smarter_csv 1.17.1 → 1.17.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 1af96fa0d5394ea752b09577f5a0a212d2dca492d78c17574bf03f46bcef5198
- data.tar.gz: 86c03a5bf89779ab84e9e8eda19b5d2c6b0d384fb99bcb1d1fa96d4e88c4afca
+ metadata.gz: 2e665f0dc98db44950aa9cbb2cac430068e91df8886062068413dfbcefc74fc3
+ data.tar.gz: def43fb66886b16ec13bd429b4fd6923b09aa1a01757a696390a38c18b59fa31
  SHA512:
- metadata.gz: ff21d28fb33ed6b6b3d77056b4e9ea302dc93947f06aa2ddb5b3d5fe5faec47f8fb315cb986987b8d4685857ad397797c8f5a97004e82515faa22fb2b69ae704
- data.tar.gz: 808646d031b77162163041b65d8269ac8d93e7fb8f4c284a2cfabface0e867e393881192769813d78c563d91ca9bdd8fc7a4903e2396f175dd3d89a5dfc0cd44
+ metadata.gz: 2fb7793ed4eca64cfef1f7dd82a417b44988832280b373c1748213d9f7c879cd0a2d17c4e3b72c82be6acedb01b0fec26b70e6daaefb645ee2c3bf64b7aedcd8
+ data.tar.gz: a0b8842d5a69d8526af81d4e2a64c31fd6a54d6d610c5ce0dbb16298dfd03c3d296546c049a4a76a710908390ec1ea1739bc7530b1ab652c7ead1ceaa02b431d
data/CHANGELOG.md CHANGED
@@ -1,8 +1,17 @@
 
  # SmarterCSV 1.x Change Log
 
+ ## 1.17.2 (2026-05-21)
 
- ## 1.17.1 (2026-05-18)
+ RSpec tests: **2,220→ 2,274** (+54 tests)
+
+ ### Bug Fix
+
+ - fixed [Issue #334](https://github.com/tilo/smarter_csv/issues/334) with escaped double quote followed by comma. Thanks to [conorg](https://github.com/conorg)
+ - fixed bug when using `headers: { except: }`
+ - added more tests
+
+ ## 1.17.1 (2026-05-17)
 
  RSpec tests: **2,210→ 2,220** (+10 tests)
 
@@ -64,7 +73,7 @@ Measured against 1.16.4 (Apple M4, Ruby 3.4.7):
 
  Per-file breakdown: [`docs/releases/1.17.0/performance_notes.md`](docs/releases/1.17.0/performance_notes.md).
 
- ## 1.16.5 (2026-05-18)
+ ## 1.16.5 (2026-05-17)
 
  ### Bug Fix
 
@@ -222,7 +231,7 @@ Measured on 19 benchmark files, Apple M1, Ruby 3.4.7. See [benchmarks](docs/rele
  * **Writer temp file** no longer hardcoded to `/tmp` (fixes Windows); properly cleaned up with `Tempfile#close!`.
  * **Writer `StringIO`**: `finalize` no longer attempts to close a caller-owned `StringIO`.
 
- ## 1.15.3 (2026-05-18)
+ ## 1.15.3 (2026-05-17)
 
  ### Bug Fix
 
data/CONTRIBUTORS.md CHANGED
@@ -1,4 +1,4 @@
- # A Big Thank You to all 63 Contributors!!
+ # A Big Thank You to all 64 Contributors!!
 
 
  A Big Thank you to everyone who filed issues, sent comments, and who contributed with pull requests:
@@ -66,3 +66,4 @@ A Big Thank you to everyone who filed issues, sent comments, and who contributed
  * [Dom Lebron](https://github.com/biglebronski)
  * [Paho Lurie-Gregg](https://github.com/paholg)
  * [Jonas Staškevičius](https://github.com/pirminis)
+ * [conorg](https://github.com/conorg)
data/README.md CHANGED
@@ -1,8 +1,8 @@
 
  # SmarterCSV
 
- ![Gem Version](https://img.shields.io/gem/v/smarter_csv) [![codecov](https://codecov.io/gh/tilo/smarter_csv/branch/main/graph/badge.svg?token=1L7OD80182)](https://codecov.io/gh/tilo/smarter_csv) [View on RubyGems](https://rubygems.org/gems/smarter_csv) [View on RubyToolbox](https://www.ruby-toolbox.com/search?q=smarter_csv)
-
+ ![Gem Version](https://img.shields.io/gem/v/smarter_csv) [![codecov](https://codecov.io/gh/tilo/smarter_csv/branch/main/graph/badge.svg?token=1L7OD80182)](https://codecov.io/gh/tilo/smarter_csv) [![Downloads](https://img.shields.io/gem/dt/smarter_csv)](https://rubygems.org/gems/smarter_csv) [![RubyGems](https://img.shields.io/badge/RubyGems-smarter__csv-brightgreen?logo=rubygems&logoColor=white)](https://rubygems.org/gems/smarter_csv) [![Ruby Toolbox](https://img.shields.io/badge/Ruby%20Toolbox-smarter__csv-brightgreen)](https://www.ruby-toolbox.com/projects/smarter_csv)
+
  SmarterCSV is a high-performance CSV ingestion and generation for Ruby, focused on fast end-to-end CSV ingestion of real-world data — no silent failures, no surprises, not just tokenization.
 
  ⭐ If SmarterCSV saved you hours of import time, please star the repo, and consider sponsoring this project.
@@ -311,7 +311,7 @@ Or install it yourself as:
  * [Examples](docs/examples.md)
  * [Real-World CSV Files](docs/real_world_csv.md)
  * [SmarterCSV over the Years](docs/history.md)
- * [Release Notes](docs/releases/1.16.0/changes.md)
+ * [Release Notes](docs/releases/1.17.0/changes.md)
 
  ## Articles
  * [Parsing CSV Files in Ruby with SmarterCSV](https://tilo-sloboda.medium.com/parsing-csv-files-in-ruby-with-smartercsv-6ce66fb6cf38)
@@ -333,7 +333,7 @@ For reporting issues, please:
  * open a pull-request adding a test that demonstrates the issue
  * mention your version of SmarterCSV, Ruby, Rails
 
- # [A Special Thanks to all 63 Contributors!](CONTRIBUTORS.md) 🎉🎉🎉
+ # [A Special Thanks to all 64 Contributors!](CONTRIBUTORS.md) 🎉🎉🎉
 
 
  ## Contributing
@@ -331,25 +331,40 @@ static VALUE rb_parse_csv_line(VALUE self, VALUE line, VALUE col_sep, VALUE quot
  if (!allow_escaped_quotes || backslash_count % 2 == 0) {
  if (__builtin_expect(quote_boundary_standard, 1)) {
  if (in_quotes) {
- // closing quote: only valid if followed by col_sep, row_sep, or end of line
- bool valid_close = (p + 1 >= endP);
- if (!valid_close) {
- valid_close = true;
- for (long j = 0; j < col_sep_len; j++) {
- if (*(p + 1 + j) != *(col_sepP + j)) { valid_close = false; break; }
+ if (p + 2 < endP && *(p + 1) == quote_char_val) {
+ /* RFC doubled quote inside a quoted field ("" → ").
+ * Give this precedence over the closing-quote check, but only
+ * when another byte follows the doubled pair.
+ *
+ * Compatibility note: we intentionally do NOT force terminal
+ * "" to be consumed here. SmarterCSV has a long-standing lenient
+ * behavior for malformed tails like ...\"" in :double_quotes mode:
+ * the final quote may still close the field instead of turning the
+ * row into an unclosed-quote error. Issue #334 needs doubled-quote
+ * precedence for ..."",... (more content follows), but we keep the
+ * historical leniency for terminal ..."". */
+ p++;
+ } else {
+ // closing quote: only valid if followed by col_sep, row_sep, or end of line
+ bool valid_close = (p + 1 >= endP);
+ if (!valid_close) {
+ valid_close = true;
+ for (long j = 0; j < col_sep_len; j++) {
+ if (*(p + 1 + j) != *(col_sepP + j)) { valid_close = false; break; }
+ }
  }
- }
- if (!valid_close && row_sep_len > 0) {
- valid_close = true;
- for (long j = 0; j < row_sep_len; j++) {
- if (*(p + 1 + j) != *(row_sepP + j)) { valid_close = false; break; }
+ if (!valid_close && row_sep_len > 0) {
+ valid_close = true;
+ for (long j = 0; j < row_sep_len; j++) {
+ if (*(p + 1 + j) != *(row_sepP + j)) { valid_close = false; break; }
+ }
  }
+ if (valid_close) {
+ in_quotes = false;
+ field_started = true;
+ }
+ // else: quote inside quoted field → literal
  }
- if (valid_close) {
- in_quotes = false;
- field_started = true;
- }
- // else: quote inside quoted field → literal (handles "" doubling)
  } else if (!field_started) {
  in_quotes = true; // opening quote at field boundary
  field_started = true;
@@ -829,6 +844,11 @@ __attribute__((hot)) static VALUE rb_parse_line_to_hash(VALUE self, VALUE line,
  * the frame stays well below 4 KB and ___chkstk_darwin never fires on ARM64 macOS.
  */
  bool *keep_bitmap = NULL;
+ /* In THIS (non-ctx) function the bitmap is alloca'd to headers_len on every call (see the alloca
+ * sites below), so keep_bitmap[] is exactly headers_len long and headers_len is the correct bound
+ * at all access sites. Do NOT mirror rb_parse_line_to_hash_ctx's keep_bitmap_len here: that variant
+ * caches its bitmap across rows (where @headers can grow), so it must use the captured length; this
+ * one rebuilds per call and does not. */
  bool keep_extra_columns = true; /* extra cols (> headers_len): keep by default */
  bool has_only = false; /* true when only_headers: filtering is active */
  long early_exit_after = -1; /* column index after which we stop; -1 = no early exit */
@@ -1147,25 +1167,40 @@ __attribute__((hot)) static VALUE rb_parse_line_to_hash(VALUE self, VALUE line,
  if (!allow_escaped_quotes || backslash_count % 2 == 0) {
  if (__builtin_expect(quote_boundary_standard, 1)) {
  if (in_quotes) {
- // closing quote: only valid if followed by col_sep, row_sep, or end of line
- bool valid_close = (p + 1 >= endP);
- if (!valid_close) {
- valid_close = true;
- for (long j = 0; j < col_sep_len; j++) {
- if (*(p + 1 + j) != *(col_sepP + j)) { valid_close = false; break; }
+ if (p + 2 < endP && *(p + 1) == quote_char_val) {
+ /* RFC doubled quote inside a quoted field ("" → ").
+ * Give this precedence over the closing-quote check, but only
+ * when another byte follows the doubled pair.
+ *
+ * Compatibility note: we intentionally do NOT force terminal
+ * "" to be consumed here. SmarterCSV has a long-standing lenient
+ * behavior for malformed tails like ...\"" in :double_quotes mode:
+ * the final quote may still close the field instead of turning the
+ * row into an unclosed-quote error. Issue #334 needs doubled-quote
+ * precedence for ..."",... (more content follows), but we keep the
+ * historical leniency for terminal ..."". */
+ p++;
+ } else {
+ // closing quote: only valid if followed by col_sep, row_sep, or end of line
+ bool valid_close = (p + 1 >= endP);
+ if (!valid_close) {
+ valid_close = true;
+ for (long j = 0; j < col_sep_len; j++) {
+ if (*(p + 1 + j) != *(col_sepP + j)) { valid_close = false; break; }
+ }
  }
- }
- if (!valid_close && row_sep_len2 > 0) {
- valid_close = true;
- for (long j = 0; j < row_sep_len2; j++) {
- if (*(p + 1 + j) != *(row_sepP2 + j)) { valid_close = false; break; }
+ if (!valid_close && row_sep_len2 > 0) {
+ valid_close = true;
+ for (long j = 0; j < row_sep_len2; j++) {
+ if (*(p + 1 + j) != *(row_sepP2 + j)) { valid_close = false; break; }
+ }
  }
+ if (valid_close) {
+ in_quotes = false;
+ field_started = true;
+ }
+ // else: quote inside quoted field → literal
  }
- if (valid_close) {
- in_quotes = false;
- field_started = true;
- }
- // else: quote inside quoted field → literal (handles "" doubling)
  } else if (!field_started) {
  in_quotes = true; // opening quote at field boundary
  field_started = true;
@@ -1495,6 +1530,14 @@ __attribute__((hot)) static VALUE rb_parse_line_to_hash_ctx(VALUE self, VALUE li
  int numeric_mode = ctx->numeric_mode;
  VALUE numeric_keys = ctx->numeric_keys;
  bool *keep_bitmap = ctx->keep_bitmap;
+ /* keep_bitmap is cached in the context (xmalloc'd once at construction, sized to the header count
+ * THEN). @headers can grow in place as undeclared extra columns appear, so the live headers_len
+ * (re-read each call below) may exceed the bitmap's length. Every keep_bitmap[] access in this
+ * function MUST be bounded by keep_bitmap_len, never headers_len — indices past the bitmap are
+ * extra columns and follow keep_extra_columns. Bounding by the grown headers_len was an
+ * out-of-bounds heap read (the bug). The sibling rb_parse_line_to_hash safely uses headers_len
+ * because it re-allocs its bitmap to headers_len on every call. */
+ long keep_bitmap_len = ctx->keep_bitmap_len;
  bool keep_extra_columns = ctx->keep_extra_columns;
  long early_exit_after = ctx->early_exit_after;
 
@@ -1596,7 +1639,7 @@ __attribute__((hot)) static VALUE rb_parse_line_to_hash_ctx(VALUE self, VALUE li
  while (trim_end >= trim_start && (*trim_end == ' ' || *trim_end == '\t')) trim_end--;
  }
  long trimmed_len = (trim_end >= trim_start) ? (trim_end - trim_start + 1) : 0;
- if (!keep_bitmap || (element_count < headers_len ? keep_bitmap[element_count] : keep_extra_columns)) {
+ if (!keep_bitmap || (element_count < keep_bitmap_len ? keep_bitmap[element_count] : keep_extra_columns)) {
  if (insert_field_into_hash(&xform, trim_start, trimmed_len, element_count, false, quote_char_val, encoding))
  all_blank = false;
  }
@@ -1617,7 +1660,7 @@ __attribute__((hot)) static VALUE rb_parse_line_to_hash_ctx(VALUE self, VALUE li
  while (trim_end >= trim_start && (*trim_end == ' ' || *trim_end == '\t')) trim_end--;
  }
  long trimmed_len = (trim_end >= trim_start) ? (trim_end - trim_start + 1) : 0;
- if (!keep_bitmap || (element_count < headers_len ? keep_bitmap[element_count] : keep_extra_columns)) {
+ if (!keep_bitmap || (element_count < keep_bitmap_len ? keep_bitmap[element_count] : keep_extra_columns)) {
  if (insert_field_into_hash(&xform, trim_start, trimmed_len, element_count, false, quote_char_val, encoding))
  all_blank = false;
  }
@@ -1680,7 +1723,7 @@ __attribute__((hot)) static VALUE rb_parse_line_to_hash_ctx(VALUE self, VALUE li
 
  bool has_embedded_quotes = quoted || (trimmed_len > 0 && memchr(trim_start, quote_char_val, trimmed_len));
 
- if (!keep_bitmap || (element_count < headers_len ? keep_bitmap[element_count] : keep_extra_columns)) {
+ if (!keep_bitmap || (element_count < keep_bitmap_len ? keep_bitmap[element_count] : keep_extra_columns)) {
  if (insert_field_into_hash(&xform, trim_start, trimmed_len, element_count, has_embedded_quotes, quote_char_val, encoding))
  all_blank = false;
  }
@@ -1714,25 +1757,40 @@ __attribute__((hot)) static VALUE rb_parse_line_to_hash_ctx(VALUE self, VALUE li
  if (!allow_escaped_quotes || backslash_count % 2 == 0) {
  if (__builtin_expect(quote_boundary_standard, 1)) {
  if (in_quotes) {
- /* closing quote: only valid if followed by col_sep, row_sep, or end */
- bool valid_close = (p + 1 >= endP);
- if (!valid_close) {
- valid_close = true;
- for (long j = 0; j < col_sep_len; j++) {
- if (*(p + 1 + j) != *(col_sepP + j)) { valid_close = false; break; }
+ if (p + 2 < endP && *(p + 1) == quote_char_val) {
+ /* RFC doubled quote inside a quoted field ("" → ").
+ * Give this precedence over the closing-quote check, but only
+ * when another byte follows the doubled pair.
+ *
+ * Compatibility note: we intentionally do NOT force terminal
+ * "" to be consumed here. SmarterCSV has a long-standing lenient
+ * behavior for malformed tails like ...\"" in :double_quotes mode:
+ * the final quote may still close the field instead of turning the
+ * row into an unclosed-quote error. Issue #334 needs doubled-quote
+ * precedence for ..."",... (more content follows), but we keep the
+ * historical leniency for terminal ..."". */
+ p++;
+ } else {
+ /* closing quote: only valid if followed by col_sep, row_sep, or end */
+ bool valid_close = (p + 1 >= endP);
+ if (!valid_close) {
+ valid_close = true;
+ for (long j = 0; j < col_sep_len; j++) {
+ if (*(p + 1 + j) != *(col_sepP + j)) { valid_close = false; break; }
+ }
  }
- }
- if (!valid_close && row_sep_len2 > 0) {
- valid_close = true;
- for (long j = 0; j < row_sep_len2; j++) {
- if (*(p + 1 + j) != *(row_sepP2 + j)) { valid_close = false; break; }
+ if (!valid_close && row_sep_len2 > 0) {
+ valid_close = true;
+ for (long j = 0; j < row_sep_len2; j++) {
+ if (*(p + 1 + j) != *(row_sepP2 + j)) { valid_close = false; break; }
+ }
  }
+ if (valid_close) {
+ in_quotes = false;
+ field_started = true;
+ }
+ /* else: quote inside quoted field → literal */
  }
- if (valid_close) {
- in_quotes = false;
- field_started = true;
- }
- /* else: quote inside quoted field → literal (handles "" doubling) */
  } else if (!field_started) {
  in_quotes = true; /* opening quote at field boundary */
  field_started = true;
@@ -1791,7 +1849,7 @@ __attribute__((hot)) static VALUE rb_parse_line_to_hash_ctx(VALUE self, VALUE li
 
  bool has_embedded_quotes = quoted || (trimmed_len > 0 && memchr(trim_start, quote_char_val, trimmed_len));
 
- if (!keep_bitmap || (element_count < headers_len ? keep_bitmap[element_count] : keep_extra_columns)) {
+ if (!keep_bitmap || (element_count < keep_bitmap_len ? keep_bitmap[element_count] : keep_extra_columns)) {
  if (insert_field_into_hash(&xform, trim_start, trimmed_len, element_count, has_embedded_quotes, quote_char_val, encoding))
  all_blank = false;
  }
@@ -1819,7 +1877,7 @@ __attribute__((hot)) static VALUE rb_parse_line_to_hash_ctx(VALUE self, VALUE li
  if (!remove_empty_values) {
  ensure_hash_allocated(&xform);
  for (long i = element_count; i < headers_len; i++) {
- if (!keep_bitmap || keep_bitmap[i]) {
+ if (!keep_bitmap || (i < keep_bitmap_len ? keep_bitmap[i] : keep_extra_columns)) {
  rb_hash_aset(xform.hash, rb_ary_entry(headers, i), Qnil);
  }
  }
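The `keep_bitmap_len` change in `rb_parse_line_to_hash_ctx` can be illustrated with a small Ruby sketch. This is not SmarterCSV's implementation: `KeepFilter` and `row_to_hash` are hypothetical names, and `:extra_1` is a made-up header for an undeclared extra column. The sketch shows one thing. A bitmap captured once keeps the length it had when it was built, so any index at or past that length has to fall back to `keep_extra_columns` instead of indexing the bitmap. In C, indexing past the end is an out-of-bounds heap read. In Ruby, `Array#[]` would quietly return `nil` for the same index, which would drop the column.

```ruby
# Hypothetical sketch (not SmarterCSV's API): a keep-bitmap captured once,
# sized to the header count at the time it was built.
KeepFilter = Struct.new(:keep_bitmap, :keep_extra_columns) do
  # Indices past the captured bitmap are extra columns and follow
  # keep_extra_columns; never bound the lookup by the live header count.
  def keep?(index)
    index < keep_bitmap.length ? keep_bitmap[index] : keep_extra_columns
  end
end

# Builds a row hash; headers may have grown since the filter was built.
def row_to_hash(headers, values, filter)
  row = {}
  values.each_with_index do |value, i|
    row[headers[i]] = value if filter.keep?(i)
  end
  # Pad missing trailing columns with nil, using the same bounded lookup.
  (values.length...headers.length).each do |i|
    row[headers[i]] = nil if filter.keep?(i)
  end
  row
end

filter  = KeepFilter.new([true, false, true], true) # built for 3 headers
headers = [:a, :b, :c]
headers << :extra_1 # headers grew after the filter was built
row = row_to_hash(headers, %w[1 2 3 4], filter)
# row[:extra_1] is "4"; :b is filtered out
```

The padding loop uses the same bounded lookup. The last C hunk makes the equivalent fix for the `element_count..headers_len` loop, which previously read `keep_bitmap[i]` unguarded.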
@@ -410,15 +410,28 @@ module SmarterCSV
  if !allow_escaped_quotes || backslash_count % 2 == 0
  if quote_boundary_standard
  if in_quotes
- # closing quote: only valid if followed by col_sep, row_sep, or end of line
  next_i = i + 1
- if next_i >= bytesize ||
- line.getbyte(next_i) == col_sep_byte ||
- (row_sep_bytesize > 0 && line.byteslice(next_i, row_sep_bytesize) == row_sep)
+ if next_i + 1 < bytesize && line.getbyte(next_i) == quote_byte
+ # RFC doubled quote inside a quoted field ("" ").
+ # Give this precedence over the closing-quote check, but only
+ # when another byte follows the doubled pair.
+ #
+ # Compatibility note: we intentionally do NOT force terminal
+ # "" to be consumed here. SmarterCSV has a long-standing lenient
+ # behavior for malformed tails like ...\"" in :double_quotes mode:
+ # the final quote may still close the field instead of turning the
+ # row into an unclosed-quote error. Issue #334 needs doubled-quote
+ # precedence for ..."",... (more content follows), but we keep the
+ # historical leniency for terminal ..."".
+ i = next_i
+ # closing quote: only valid if followed by col_sep, row_sep, or end of line
+ elsif next_i >= bytesize ||
+ line.getbyte(next_i) == col_sep_byte ||
+ (row_sep_bytesize > 0 && line.byteslice(next_i, row_sep_bytesize) == row_sep)
  in_quotes = false
  field_started = true
  end
- # else: quote inside quoted field → literal (handles "" doubling)
+ # else: quote inside quoted field → literal
  elsif !field_started # at field boundary: open quoted field
  in_quotes = true
  field_started = true
@@ -519,15 +532,28 @@ module SmarterCSV
  if !allow_escaped_quotes || backslash_count % 2 == 0
  if quote_boundary_standard
  if in_quotes
- # closing quote: only valid if followed by col_sep, row_sep, or end of line
  next_i = i + 1
- if next_i >= line_size ||
- line[next_i...next_i + col_sep_size] == col_sep ||
- (row_sep_size > 0 && line[next_i...next_i + row_sep_size] == row_sep)
+ if next_i + 1 < line_size && line[next_i] == quote
+ # RFC doubled quote inside a quoted field ("" → ").
+ # Give this precedence over the closing-quote check, but only
+ # when another character follows the doubled pair.
+ #
+ # Compatibility note: we intentionally do NOT force terminal
+ # "" to be consumed here. SmarterCSV has a long-standing lenient
+ # behavior for malformed tails like ...\"" in :double_quotes mode:
+ # the final quote may still close the field instead of turning the
+ # row into an unclosed-quote error. Issue #334 needs doubled-quote
+ # precedence for ..."",... (more content follows), but we keep the
+ # historical leniency for terminal ..."".
+ i = next_i
+ # closing quote: only valid if followed by col_sep, row_sep, or end of line
+ elsif next_i >= line_size ||
+ line[next_i...next_i + col_sep_size] == col_sep ||
+ (row_sep_size > 0 && line[next_i...next_i + row_sep_size] == row_sep)
  in_quotes = false
  field_started = true
  end
- # else: quote inside quoted field → literal (handles "" doubling)
+ # else: quote inside quoted field → literal
  elsif !field_started # at field boundary: open quoted field
  in_quotes = true
  field_started = true
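The reordered quote check in the C and Ruby hunks above can be modeled with a standalone Ruby sketch. `split_fields` is a hypothetical helper, not part of SmarterCSV. It finds field boundaries on a single line and ignores `row_sep`, backslash escapes, and non-standard quote-boundary modes. It returns the raw field text with the quotes left in place, plus a flag that is true when the line ended inside an open quote. Inside a quoted field, a doubled quote followed by at least one more character is consumed as a literal quote before the closing-quote test runs. A terminal `""` fails that length check and falls through to the closing-quote test, which preserves the lenient behavior described in the comments.

```ruby
# Hypothetical helper, not SmarterCSV's API: finds field boundaries on one
# line and returns [raw_fields, unclosed_quote]. row_sep, backslash escapes
# and quote_boundary modes are deliberately left out.
def split_fields(line, col_sep: ",", quote: '"')
  fields = []
  start = 0
  in_quotes = false
  field_started = false
  i = 0
  while i < line.size
    if line[i] == quote
      if in_quotes
        if i + 2 < line.size && line[i + 1] == quote
          # Doubled quote with more input after it: a literal quote.
          # Checked BEFORE the closing-quote test; skip the second quote.
          i += 1
        elsif i + 1 >= line.size || line[i + 1, col_sep.size] == col_sep
          # Closing quote: followed by col_sep or end of line.
          in_quotes = false
        end
        # Otherwise the quote stays literal inside the quoted field.
      elsif !field_started
        in_quotes = true # opening quote at a field boundary
      end
      field_started = true
    elsif !in_quotes && line[i, col_sep.size] == col_sep
      fields << line[start...i] # field ends at an unquoted separator
      i += col_sep.size
      start = i
      field_started = false
      next
    else
      field_started = true
    end
    i += 1
  end
  fields << line[start..]
  [fields, in_quotes]
end

fields, unclosed = split_fields('"x ""y"",z",1')
# fields is ['"x ""y"",z"', "1"]; unclosed is false
```

The example line has a `..."",...` shape. Under the old ordering, which ran the closing-quote test first, the quote directly before `,z` would have passed that test and ended the first field at `"x ""y""`.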
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module SmarterCSV
- VERSION = "1.17.1"
+ VERSION = "1.17.2"
  end
metadata CHANGED
@@ -1,13 +1,13 @@
  --- !ruby/object:Gem::Specification
  name: smarter_csv
  version: !ruby/object:Gem::Version
- version: 1.17.1
+ version: 1.17.2
  platform: ruby
  authors:
  - Tilo Sloboda
  bindir: bin
  cert_chain: []
- date: 2026-05-17 00:00:00.000000000 Z
+ date: 2026-05-21 00:00:00.000000000 Z
  dependencies: []
  description: |
  SmarterCSV is a high-performance CSV reader and writer for Ruby focused on
@@ -39,7 +39,6 @@ files:
  - LICENSE.txt
  - README.md
  - Rakefile
- - TO_DO.md
  - docs/_introduction.md
  - docs/bad_row_quarantine.md
  - docs/basic_read_api.md
data/TO_DO.md DELETED
@@ -1,109 +0,0 @@
- # SmarterCSV v2.0 TO DO List
-
- DONE:
- [X] Don't call rewind on filehandle
- [X] use Procs for validations and transformatoins [issue #118](https://github.com/tilo/smarter_csv/issues/118)
- [X] skip file opening, allow reading from CSV string, e.g. reading from S3 file [issue #120](https://github.com/tilo/smarter_csv/issues/120). Or stream large file from S3 (linked in the issue)
- [X] [2.0 BUG] convert_to_float saves Proc as @@convert_to_integer [issue #157](https://github.com/tilo/smarter_csv/issues/157)
- [X] add enumerable to speed up parallel processing [issue #66](https://github.com/tilo/smarter_csv/issues/66), [issue #32](https://github.com/tilo/smarter_csv/issues/32)
- [X] Provide an example for custom Procs for hash_transformations in the docs [issue #174](https://github.com/tilo/smarter_csv/issues/174)
- [X] Collect all Errors, before surfacing them. Avoid throwing an exception on the first error [issue #133](https://github.com/tilo/smarter_csv/issues/133)
-
-
- Partially Done:
- [ ] make @errors and @warnings work [issue #118](https://github.com/tilo/smarter_csv/issues/118)
-
- StilL TO DO:
- [ ] Replace remove_empty_values: false [issue #213](https://github.com/tilo/smarter_csv/issues/213)
-
- Arguably by design (e.g. exclude these columns from conversion and have them returned as a string)
- [ ] [2.0 BUG] :convert_values_to_numeric_unless_leading_zeros drops leading zeros [issue #151](https://github.com/tilo/smarter_csv/issues/151)
-
-
- ## Numeric conversion: align the Ruby fallback path with the C path (permissive)
-
- Context: `convert_values_to_numeric` runs in two places that currently DISAGREE on edge cases:
- - C path (`acceleration: true`, the default): `ext/smarter_csv/smarter_csv.c#try_numeric_conversion`
- uses `strtol`/`strtod` (base 10; float branch only entered when the field contains a `.`).
- - Ruby fallback (`acceleration: false`): `lib/smarter_csv/hash_transformations.rb` uses the
- strict regex `NUMERIC_REGEX = /\A[+-]?\d+(?:\.\d+)?\z/` plus `to_i` / `to_f`.
-
- Divergence (verified empirically):
- | value | C path | Ruby fallback |
- |-----------|------------------|-------------------|
- | ".5" | 0.5 (Float) | ".5" (String) |
- | "3." | 3.0 (Float) | "3." (String) |
- | "1.5e3" | 1500.0 (Float) | "1.5e3" (String) |
- | "1.0e10" | 10000000000.0 | "1.0e10" (String) |
-
- Decision: the C path's permissive behavior (corner cases + scientific notation) is the intended
- contract. Fix = make the Ruby fallback match the C path. Do NOT tighten the C path.
-
- Ruby-side changes (in `hash_transformations.rb`):
- 1. Swap NUMERIC_REGEX for a permissive one:
- /\A[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?\z/
- matches 1, 1., 1.5, .5, 1e3, 1.5e3, -3.14e-2, etc.; still rejects ".", "e3", "1.2.3",
- "1_000", "0x1F".
- 2. Add `DOT_BYTE = '.'.ord` (46) and include it in the first-byte fast-reject's allowed set
- (the C pre-check already allows a leading `.`; without this, ".5" gets rejected on byte 0).
- 3. Int-vs-float decision: `(v.include?('.') || v.include?('e') || v.include?('E')) ? v.to_f : v.to_i`
- (currently only checks for `.`).
-
- Stays a string on BOTH paths (no change needed, but worth characterization tests — there are
- currently NONE):
- - "010" => 10 (NOT octal 8 — both paths use base-10 conversion: String#to_i / strtol(.,10).
- A switch to Kernel#Integer() would break this. Lock it down with a test.)
- - "0x1F", "0b101", "0o17" => string (radix prefixes not honored by base-10 conversion)
- - "1_000" => string (underscores)
- - "1,200.00", "1.300,00" => string (thousands sep / decimal comma — strtod stops at the
- separator → not fully consumed; regex rejects. This is the only safe behavior; "1,200" is
- genuinely ambiguous. Locale-specific number formats are the caller's job via value_converters.)
-
- NOT doing: locale sniffing (read LC_NUMERIC at init and adjust the regexes). Rejected because
- the machine locale tells you nothing about the file's number format, it breaks reproducibility
- (same code + same file → different results on a US vs EU box), and `,` can't be both col_sep and
- decimal separator anyway. Note `strtod` IS locale-sensitive (LC_NUMERIC) but it's dormant — Ruby
- runs in the C/POSIX locale; don't deliberately activate it.
-
- When done: parity tests (`[true, false].each`) for the now-consistent set (.5, 3., 1.5e3, 1e3)
- plus characterization tests for the stays-a-string set above; CHANGELOG line noting the Ruby
- fallback's numeric conversion now accepts scientific notation and bare-dot forms, matching the
- accelerated path. Behavior change affects `acceleration: false` users only — and aligns them with
- the default.
-
-
- ## Warn once when the C extension didn't load on a platform that supports it
-
- Context: `acceleration: true` is the default. When the C extension fails to build / isn't loaded,
- SmarterCSV silently falls back to the Ruby parser — graceful degradation by design (so the gem
- keeps working for users with broken toolchains, JRuby, TruffleRuby, etc.). Today there is no
- signal to the user that they're not getting the C path; their CSV parsing is just slower than
- they might have expected.
-
- Idea: emit a one-time warning when:
- * the C extension is NOT loaded — `!SmarterCSV::Parser.respond_to?(:parse_csv_line_c)`, AND
- * the platform is one where it *should* be available — `RUBY_ENGINE == 'ruby'` (MRI / CRuby).
- JRuby and TruffleRuby don't load CRuby C extensions natively; nothing for the user to do.
-
- Where to fire:
- * NOT at `require 'smarter_csv'` time — Rails.logger typically isn't set up yet, so any
- "route through the warnings system" code would just fall through to `Kernel#warn` anyway,
- and the warning would land in stderr instead of the Rails log where ops would see it.
- * At first `Reader.new` / `SmarterCSV.process` call — Rails has booted, the existing
- routing-through-Rails.logger-or-Kernel#warn infra works, and the existing deduped warnings
- histogram means it fires once per process regardless of how many parse calls.
-
- Implementation sketch:
- * Add a new warning code (e.g. `:c_extension_unavailable`) alongside the existing ones
- (`:chunk_size_default`, `:header_a_method`, `:utf8_missing_binary_mode`, ...).
- * Severity `:warn`. Suppressible via the existing `verbose: :quiet`.
- * Message points at the fix — e.g. "C acceleration extension not loaded on this Ruby; using
- Ruby parser. To enable acceleration, reinstall with `gem pristine smarter_csv` and check
- the build log." Plus a link/pointer to a troubleshooting section in the docs.
-
- Bonus: add a public predicate `SmarterCSV.acceleration_available?` returning
- `Parser.respond_to?(:parse_csv_line_c)`. Zero noise, useful for scripts / CI / future spec
- files that want to branch on the environment fact rather than guess.
-
- NOT doing: a banner at `require` time (every Rails app would print it at boot, too noisy);
- warning when `acceleration: false` was explicitly chosen (the user knows what they're doing).
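Items 1 and 3 of the Ruby-side changes in the deleted TO_DO.md (the permissive regex and the int-vs-float decision) fit in a small standalone function. The sketch below is an illustration, not SmarterCSV's `hash_transformations.rb`. `PERMISSIVE_NUMERIC` and `convert_numeric` are hypothetical names. The `DOT_BYTE` fast-reject from item 2 is left out, and the sketch makes no claim about what the C path does. It reuses the regex and the ternary from the list as written.

```ruby
# Hypothetical sketch of items 1 and 3 from the deleted TO_DO.md; not
# SmarterCSV's code. The DOT_BYTE fast-reject (item 2) is omitted.

# Item 1: permissive pattern (bare-dot forms and scientific notation).
PERMISSIVE_NUMERIC = /\A[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?\z/

# Returns an Integer or Float for numeric-looking strings, else the string.
def convert_numeric(v)
  return v unless v.match?(PERMISSIVE_NUMERIC)

  # Item 3: a '.', 'e', or 'E' means Float; otherwise a base-10 Integer.
  # String#to_i is always base 10, so "010" becomes 10, not octal 8.
  (v.include?('.') || v.include?('e') || v.include?('E')) ? v.to_f : v.to_i
end

%w[.5 3. 1.5e3 010 0x1F 1_000 1,200.00].each do |s|
  puts "#{s.inspect} => #{convert_numeric(s).inspect}"
end
```

The regex gate runs before `to_i`/`to_f` on purpose. `String#to_i` accepts underscores and stops silently at the first invalid character, so `"1_000"` and `"1,200.00"` stay strings only because the pattern rejects them first.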