smarter_csv 1.15.2 → 1.16.0
- checksums.yaml +4 -4
- data/.rubocop.yml +9 -0
- data/CHANGELOG.md +68 -1
- data/CONTRIBUTORS.md +3 -1
- data/Gemfile +1 -0
- data/README.md +123 -27
- data/docs/_introduction.md +40 -24
- data/docs/bad_row_quarantine.md +285 -0
- data/docs/basic_read_api.md +151 -9
- data/docs/basic_write_api.md +474 -59
- data/docs/batch_processing.md +161 -4
- data/docs/column_selection.md +183 -0
- data/docs/data_transformations.md +162 -29
- data/docs/examples.md +339 -46
- data/docs/header_transformations.md +93 -12
- data/docs/header_validations.md +56 -18
- data/docs/history.md +117 -0
- data/docs/instrumentation.md +165 -0
- data/docs/migrating_from_csv.md +290 -0
- data/docs/options.md +150 -87
- data/docs/parsing_strategy.md +63 -1
- data/docs/real_world_csv.md +262 -0
- data/docs/releases/1.16.0/benchmarks.md +223 -0
- data/docs/releases/1.16.0/changes.md +272 -0
- data/docs/releases/1.16.0/performance_notes.md +114 -0
- data/docs/row_col_sep.md +14 -5
- data/docs/value_converters.md +193 -57
- data/ext/smarter_csv/extconf.rb +3 -0
- data/ext/smarter_csv/smarter_csv.c +1007 -71
- data/images/SmarterCSV_1.16.0_vs_RubyCSV_3.3.5_speedup.png +0 -0
- data/images/SmarterCSV_1.16.0_vs_RubyCSV_3.3.5_speedup.svg +108 -0
- data/images/SmarterCSV_1.16.0_vs_previous_C-speedup.png +0 -0
- data/images/SmarterCSV_1.16.0_vs_previous_C-speedup.svg +141 -0
- data/images/SmarterCSV_1.16.0_vs_previous_Rb-speedup.png +0 -0
- data/images/SmarterCSV_1.16.0_vs_previous_Rb-speedup.svg +139 -0
- data/lib/smarter_csv/errors.rb +8 -0
- data/lib/smarter_csv/file_io.rb +1 -1
- data/lib/smarter_csv/hash_transformations.rb +14 -13
- data/lib/smarter_csv/header_transformations.rb +21 -2
- data/lib/smarter_csv/headers.rb +2 -1
- data/lib/smarter_csv/options.rb +124 -7
- data/lib/smarter_csv/parser.rb +362 -75
- data/lib/smarter_csv/reader.rb +494 -46
- data/lib/smarter_csv/version.rb +1 -1
- data/lib/smarter_csv/writer.rb +71 -19
- data/lib/smarter_csv.rb +95 -12
- data/smarter_csv.gemspec +20 -10
- metadata +37 -80
data/docs/options.md
CHANGED
|
@@ -2,6 +2,7 @@
|
|
|
2
2
|
### Contents
|
|
3
3
|
|
|
4
4
|
* [Introduction](./_introduction.md)
|
|
5
|
+
* [Migrating from Ruby CSV](./migrating_from_csv.md)
|
|
5
6
|
* [Parsing Strategy](./parsing_strategy.md)
|
|
6
7
|
* [The Basic Read API](./basic_read_api.md)
|
|
7
8
|
* [The Basic Write API](./basic_write_api.md)
|
|
@@ -10,8 +11,15 @@
|
|
|
10
11
|
* [Row and Column Separators](./row_col_sep.md)
|
|
11
12
|
* [Header Transformations](./header_transformations.md)
|
|
12
13
|
* [Header Validations](./header_validations.md)
|
|
14
|
+
* [Column Selection](./column_selection.md)
|
|
13
15
|
* [Data Transformations](./data_transformations.md)
|
|
14
16
|
* [Value Converters](./value_converters.md)
|
|
17
|
+
* [Bad Row Quarantine](./bad_row_quarantine.md)
|
|
18
|
+
* [Instrumentation Hooks](./instrumentation.md)
|
|
19
|
+
* [Examples](./examples.md)
|
|
20
|
+
* [Real-World CSV Files](./real_world_csv.md)
|
|
21
|
+
* [SmarterCSV over the Years](./history.md)
|
|
22
|
+
* [Release Notes](./releases/1.16.0/changes.md)
|
|
15
23
|
|
|
16
24
|
--------------
|
|
17
25
|
|
|
@@ -19,96 +27,151 @@
|
|
|
19
27
|
|
|
20
28
|
## CSV Writing
|
|
21
29
|
|
|
39
|
-
| :disable_auto_quoting | false | To manually disable auto-quoting of special characters. ⚠️ Be careful with this! |
|
|
40
|
-
| :quote_headers | false | To force quoting all headers (only needed in rare cases) |
|
|
30
|
+
| Option | Default | Explanation |
|
|
31
|
+
|--------|---------|-------------|
|
|
32
|
+
| `:row_sep` | `$/` | Separates rows. Defaults to your OS row separator: `\n` on UNIX, `\r\n` on Windows. |
|
|
33
|
+
| `:col_sep` | `","` | Separates each value in a row. |
|
|
34
|
+
| `:quote_char` | `'"'` | Character used to quote CSV fields. |
|
|
35
|
+
| `:force_quotes` | `false` | Forces each individual value to be quoted. |
|
|
36
|
+
| `:headers` | `[]` | List of keys from the input to use as headers in the CSV file. ⚠️ Disables automatic header detection! |
|
|
37
|
+
| `:map_headers` | `{}` | Like `:headers`, but also maps each key to a user-specified header value. ⚠️ Disables automatic header detection! |
|
|
38
|
+
| `:value_converters` | `nil` | Lambdas to programmatically modify values — either for specific key names, or using `_all` for all fields. |
|
|
39
|
+
| `:header_converter` | `nil` | One lambda to programmatically modify the headers. |
|
|
40
|
+
| `:discover_headers` | `true` | Automatically detects all keys in the input before writing the header. Do not set to `false` manually. ⚠️ |
|
|
41
|
+
| `:disable_auto_quoting` | `false` | Manually disables auto-quoting of special characters. ⚠️ Use with care! |
|
|
42
|
+
| `:quote_headers` | `false` | Force quoting all headers (only needed in rare cases). |
|
|
43
|
+
| `:encoding` | `nil` | File encoding passed to `File.open` when writing to a path (e.g. `'UTF-8'`, `'ISO-8859-1'`). Supports Ruby's `'external:internal'` transcoding notation (e.g. `'ISO-8859-1:UTF-8'`) to automatically transcode UTF-8 strings into the target encoding. `nil` uses the system default. Ignored when an IO object is passed directly. |
|
|
44
|
+
| `:write_nil_value` | `''` | String written in place of `nil` field values. E.g. `write_nil_value: 'N/A'`. |
|
|
45
|
+
| `:write_empty_value` | `''` | String written in place of empty-string field values, including missing keys. E.g. `write_empty_value: 'EMPTY'`. |
|
|
46
|
+
| `:write_bom` | `false` | Prepends a UTF-8 BOM (`\xEF\xBB\xBF`) to the output. Use with `encoding: 'UTF-8'` for Excel compatibility. |
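The `:encoding` and `:write_bom` rows describe behavior that ultimately comes from Ruby's own `File.open`, so it can be tried without SmarterCSV. The following is a minimal sketch using only the standard library; `write_text` is a hypothetical helper and the sample data is made up. It does not call the SmarterCSV writer.

```ruby
require "tempfile"

# Hypothetical helper (not part of SmarterCSV): writes `text` to `path` using
# the same File.open encoding notation the `:encoding` row describes.
def write_text(path, text, encoding: nil, bom: false)
  open_options = encoding ? { encoding: encoding } : {}
  File.open(path, "w", **open_options) do |file|
    file.write("\uFEFF") if bom # U+FEFF is stored as the bytes EF BB BF in UTF-8
    file.write(text)
  end
end

sample = "name,city\nRenée,Orléans\n" # made-up data

Tempfile.create("encoding_demo") do |tmp|
  # 'external:internal': UTF-8 strings are transcoded to ISO-8859-1 on write,
  # so every character, including "é", occupies exactly one byte on disk.
  write_text(tmp.path, sample, encoding: "ISO-8859-1:UTF-8")
  puts File.binread(tmp.path).bytesize == sample.length

  # UTF-8 output with a leading byte order mark, as `:write_bom` describes.
  write_text(tmp.path, sample, encoding: "UTF-8", bom: true)
  p File.binread(tmp.path).bytes.first(3)
end
```

Reading the file back with `File.binread` avoids any decoding, which makes the on-disk byte layout easy to inspect.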
|
|
41
47
|
|
|
42
48
|
|
|
43
49
|
## CSV Reading
|
|
44
50
|
|
|
51
|
+
### File Input & Encoding
|
|
52
|
+
|
|
53
|
+
| Option | Default | Explanation |
|
|
54
|
+
|--------|---------|-------------|
|
|
55
|
+
| `:file_encoding` | `utf-8` | Set the file encoding, e.g. `'windows-1252'` or `'iso-8859-1'`. |
|
|
56
|
+
| `:invalid_byte_sequence` | `''` | What to replace invalid byte sequences with. |
|
|
57
|
+
| `:force_utf8` | `false` | Force UTF-8 encoding of all lines (including headers) in the CSV file. |
|
|
58
|
+
|
|
59
|
+
### File Layout
|
|
60
|
+
|
|
61
|
+
| Option | Default | Explanation |
|
|
62
|
+
|--------|---------|-------------|
|
|
63
|
+
| `:skip_lines` | `nil` | How many lines to skip before the first line or header line is processed. |
|
|
64
|
+
| `:comment_regexp` | `nil` | Regular expression to ignore comment lines (e.g. `/\A#/`). See NOTE on CSV header. |
|
|
65
|
+
| `:chunk_size` | `nil` | If set, data is yielded in chunks of this many rows instead of all at once. Use with `SmarterCSV.each_chunk` for memory-efficient batch processing. |
|
|
66
|
+
|
|
67
|
+
### Separators
|
|
68
|
+
|
|
69
|
+
| Option | Default | Explanation |
|
|
70
|
+
|--------|---------|-------------|
|
|
71
|
+
| `:col_sep` | `:auto` | Column separator. `:auto` detects from file content (previous default was `','`). |
|
|
72
|
+
| `:row_sep` | `:auto` | Row / record separator. `:auto` detects from file content. Manual detection reads the whole file first (slow on large files). |
|
|
73
|
+
| `:auto_row_sep_chars` | `500` | How many characters to analyze when using `:row_sep => :auto`. `nil` or `0` means whole file. |
|
|
74
|
+
|
|
75
|
+
### Quoting
|
|
76
|
+
|
|
77
|
+
See [Parsing Strategy](./parsing_strategy.md) for full details on quote handling.
|
|
78
|
+
|
|
79
|
+
| Option | Default | Explanation |
|
|
80
|
+
|--------|---------|-------------|
|
|
81
|
+
| `:quote_char` | `'"'` | Quotation character. Must be a single byte. |
|
|
82
|
+
| `:quote_escaping` | `:auto` | How quotes are escaped inside quoted fields. `:auto` (default): tries backslash-escape first, falls back to RFC 4180. `:double_quotes` (RFC 4180): only `""` escapes a quote; backslash is literal. `:backslash` (MySQL/Unix): `\"` also escapes a quote. |
|
|
83
|
+
| `:quote_boundary` | `:standard` | Where quote characters are recognized as field delimiters. `:standard` (default): a quote only opens a field at a field boundary (first character of the field); mid-field quotes are literal. `:legacy`: any quote toggles quoted state regardless of position (old behavior). |
|
|
84
|
+
|
|
85
|
+
### Headers
|
|
86
|
+
|
|
87
|
+
| Option | Default | Explanation |
|
|
88
|
+
|--------|---------|-------------|
|
|
89
|
+
| `:headers_in_file` | `true` ¹ | Whether the file contains headers as the first line. ¹ If `user_provided_headers` is given, default becomes `false` unless explicitly set to `true`. |
|
|
90
|
+
| `:user_provided_headers` | `nil` | *Careful!* User-provided Array of header strings or symbols, overriding any in-file headers. Cannot be combined with `:key_mapping`. |
|
|
91
|
+
| `:duplicate_header_suffix` | `''` | Appends a number to duplicated headers, separated by this suffix. Set to `nil` to raise `DuplicateHeaders` error instead (previous behavior). |
|
|
92
|
+
| `:downcase_header` | `true` | Downcase all column headers. |
|
|
93
|
+
| `:strings_as_keys` | `false` | Use strings instead of symbols as keys in the result hashes. |
|
|
94
|
+
| `:keep_original_headers` | `false` | Keep the original headers from the CSV file as-is. Disables other flags that manipulate header fields. |
|
|
95
|
+
| `:strip_chars_from_headers` | `nil` | RegExp to remove extraneous characters from the header line (e.g. if headers are quoted). |
|
|
96
|
+
| `:missing_header_prefix` | `column_` | Prefix for auto-generated column names when extra columns are found. |
|
|
97
|
+
| `:missing_headers` | `:auto` | Behavior when a data row has more columns than the header row. `:auto` (default): auto-name extra columns using `missing_header_prefix`. `:raise`: raise `HeaderSizeMismatch` on the first row with extra columns. |
|
|
98
|
+
|
|
99
|
+
### Header Mapping & Validation
|
|
100
|
+
|
|
101
|
+
| Option | Default | Explanation |
|
|
102
|
+
|--------|---------|-------------|
|
|
103
|
+
| `:key_mapping` | `nil` | A hash mapping CSV headers to keys in the result hash. |
|
|
104
|
+
| `:silence_missing_keys` | `false` | Ignore missing keys in `key_mapping`. `true` makes all mapped keys optional; an Array makes only the listed keys optional. |
|
|
105
|
+
| `:remove_unmapped_keys` | `false` | When using `key_mapping`, remove columns that have no mapping. |
|
|
106
|
+
| `:required_keys` | `nil` | Array of key names (after header transformation) that must be present. Raises an exception if any required key is missing. No validation if `nil`. |
|
|
107
|
+
|
|
108
|
+
### Column Selection
|
|
109
|
+
|
|
110
|
+
| Option | Default | Explanation |
|
|
111
|
+
|--------|---------|-------------|
|
|
112
|
+
| `headers: { only: }` | `nil` | Keep only the listed columns in each result hash. See [Column Selection](./column_selection.md). Accepts a symbol, string, or array of either (normalized to symbols). Uses post-mapping names (after `key_mapping:` is applied). Cannot be combined with `headers: { except: }`. |
|
|
113
|
+
| `headers: { except: }` | `nil` | Remove the listed columns from each result hash. See [Column Selection](./column_selection.md). Accepts a symbol, string, or array of either (normalized to symbols). Uses post-mapping names (after `key_mapping:` is applied). Cannot be combined with `headers: { only: }`. |
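The selection semantics in the two rows above (normalize to symbols, then keep or drop keys) can be sketched in plain Ruby. Both helpers below are hypothetical and only illustrate the described behavior; they are not SmarterCSV's internal code, and the `ArgumentError` for the combined case is an assumption about how the conflict might be reported.

```ruby
# Accepts a symbol, a string, or an array of either; returns an array of symbols.
def normalize_columns(selection)
  Array(selection).map(&:to_sym)
end

# Applies an only/except selection to one result hash (keys are post-mapping names).
def select_columns(row, only: nil, except: nil)
  raise ArgumentError, "only: and except: cannot be combined" if only && except

  if only
    keep = normalize_columns(only)
    row.select { |key, _value| keep.include?(key) }
  elsif except
    drop = normalize_columns(except)
    row.reject { |key, _value| drop.include?(key) }
  else
    row
  end
end

row = { id: 1, name: "Ada", email: "ada@example.com" } # made-up row
select_columns(row, only: "name")            # keeps only :name
select_columns(row, except: [:email, "id"])  # drops :email and :id
```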
|
|
114
|
+
|
|
115
|
+
### Value Transformations
|
|
116
|
+
|
|
117
|
+
| Option | Default | Explanation |
|
|
118
|
+
|--------|---------|-------------|
|
|
119
|
+
| `:strip_whitespace` | `true` | Remove whitespace before/after values and headers. |
|
|
120
|
+
| `:convert_values_to_numeric` | `true` | Convert strings containing integers or floats to the appropriate numeric type. Accepts `{except: [:key1, :key2]}` or `{only: :key3}` to limit which columns. |
|
|
121
|
+
| `:value_converters` | `nil` | Hash of `:header => ClassName`; each class must implement `self.convert(value)`. See [Value Converters](./value_converters.md). |
|
|
122
|
+
| `:remove_empty_values` | `true` | Remove key/value pairs where the value is `nil` or an empty string. |
|
|
123
|
+
| `:remove_zero_values` | `false` | Remove key/value pairs where the numeric value equals zero. |
|
|
124
|
+
| `:nil_values_matching` | `nil` | Set matching values to `nil`. Accepts a regular expression matched against the string representation of each value (e.g. `/\ANAN\z/` for NaN, `/\A#VALUE!\z/` for Excel errors). With `remove_empty_values: true` (default), nil-ified values are then removed. With `remove_empty_values: false`, the key is retained with a `nil` value. |
|
|
125
|
+
| `:remove_empty_hashes` | `true` | Remove result hashes that have no key/value pairs or all-empty values. |
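The interaction between `:nil_values_matching` and `:remove_empty_values` described above can be illustrated with a small stand-alone sketch. `apply_nil_values_matching` is a hypothetical helper written for this illustration; it mirrors the documented behavior but is not the library's implementation.

```ruby
# Sets values whose string form matches `pattern` to nil, then optionally
# drops nil and empty-string values, as the table rows describe.
def apply_nil_values_matching(row, pattern, remove_empty_values: true)
  nilified = row.transform_values do |value|
    pattern.match?(value.to_s) ? nil : value
  end
  return nilified unless remove_empty_values

  nilified.reject { |_key, value| value.nil? || value == "" }
end

row = { price: "NAN", qty: 3, total: "#VALUE!" } # made-up row
pattern = /\A(?:NAN|#VALUE!)\z/

apply_nil_values_matching(row, pattern)
# only :qty remains

apply_nil_values_matching(row, pattern, remove_empty_values: false)
# :price and :total are kept with nil values
```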
|
|
126
|
+
|
|
127
|
+
### Error Handling
|
|
128
|
+
|
|
129
|
+
See [Bad Row Quarantine](./bad_row_quarantine.md) for full details.
|
|
130
|
+
|
|
131
|
+
| Option | Default | Explanation |
|
|
132
|
+
|--------|---------|-------------|
|
|
133
|
+
| `:on_bad_row` | `:raise` | Behavior when a row raises a parse error. `:raise` (default): re-raise, stopping processing. `:skip`: skip the bad row and continue. `:collect`: skip and append an error record to `reader.errors[:bad_rows]`. callable: called with the error record per bad row; processing continues. |
|
|
134
|
+
| `:collect_raw_lines` | `true` | When collecting bad rows, include the raw stitched line in the error record. |
|
|
135
|
+
| `:bad_row_limit` | `nil` | If set, raises `SmarterCSV::TooManyBadRows` after this many bad rows. |
|
|
136
|
+
| `:field_size_limit` | `nil` | Maximum size of any extracted field in bytes. `nil` means no limit. Raises `SmarterCSV::FieldSizeLimitExceeded` (handled by `on_bad_row`) if a field or accumulating multiline buffer exceeds this size. Prevents DoS from runaway quoted fields or huge inline payloads. See [Bad Row Quarantine](./bad_row_quarantine.md#limiting-field-size-field_size_limit). |
|
|
137
|
+
|
|
138
|
+
### Output & Diagnostics
|
|
139
|
+
|
|
140
|
+
| Option | Default | Explanation |
|
|
141
|
+
|--------|---------|-------------|
|
|
142
|
+
| `:with_line_numbers` | `false` | Add `:csv_line_number` to each result hash. |
|
|
143
|
+
| `:verbose` | `:normal` | Controls warning and diagnostic output. Accepted values:<br>• `:quiet` — suppress all warnings and notices (recommended for production)<br>• `:normal` — show behavioral warnings, e.g. auto-configuration notices **(default)**<br>• `:debug` — `:normal` + print computed options and per-row diagnostics to stderr<br>`nil` is silently treated as `:normal`. Passing `true` or `false` still works but is deprecated — see below. |
|
|
144
|
+
|
|
145
|
+
### Instrumentation Hooks
|
|
146
|
+
|
|
147
|
+
See [Instrumentation Hooks](./instrumentation.md) for full details and payload reference.
|
|
148
|
+
|
|
149
|
+
| Option | Default | Explanation |
|
|
150
|
+
|--------|---------|-------------|
|
|
151
|
+
| `:on_start` | `nil` | Callable invoked once before the first row is parsed. Receives a payload hash with `:input`, `:file_size`, `:col_sep`, `:row_sep`. |
|
|
152
|
+
| `:on_chunk` | `nil` | Callable invoked after each chunk is parsed (only when `chunk_size` is set). Receives `:chunk_number`, `:rows_in_chunk`, `:total_rows_so_far`. |
|
|
153
|
+
| `:on_complete` | `nil` | Callable invoked once after the entire file is exhausted. Receives `:total_rows`, `:total_chunks`, `:duration`, `:bad_rows`. |
|
|
154
|
+
|
|
155
|
+
### Performance
|
|
156
|
+
|
|
157
|
+
| Option | Default | Explanation |
|
|
158
|
+
|--------|---------|-------------|
|
|
159
|
+
| `:acceleration` | `true` | Use the C extension for parsing (MRI Ruby only). Set to `false` to force the pure-Ruby fallback (always used on JRuby/TruffleRuby). |
|
|
160
|
+
|
|
161
|
+
---
|
|
162
|
+
|
|
163
|
+
## Deprecated Options
|
|
164
|
+
|
|
165
|
+
These options are still accepted but emit a deprecation warning. They will be removed in a future version.
|
|
166
|
+
|
|
167
|
+
| Option | Default | Replacement |
|
|
168
|
+
|--------|---------|-------------|
|
|
169
|
+
| `:strict` | `false` | Use `missing_headers: :raise` instead of `strict: true`, or `missing_headers: :auto` instead of `strict: false`. |
|
|
170
|
+
| `:required_headers` | `nil` | Renamed to `:required_keys`. Use `required_keys:` instead. |
|
|
171
|
+
| `:remove_values_matching` | `nil` | Renamed to `:nil_values_matching`. Use `nil_values_matching:` instead. |
|
|
172
|
+
| `verbose: true` | — | Use `verbose: :debug` instead. |
|
|
173
|
+
| `verbose: false` | — | Use `verbose: :normal` (or omit — it is the default) instead. |
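For codebases that still pass the deprecated options, the replacements in the table can be applied mechanically. The helper below is a hypothetical sketch, not part of SmarterCSV; it assumes that when both an old and a new key are present, the new key should win.

```ruby
# Rewrites deprecated option keys to their documented replacements.
def migrate_deprecated_options(options)
  migrated = options.dup

  # strict: true  -> missing_headers: :raise
  # strict: false -> missing_headers: :auto
  if migrated.key?(:strict)
    strict = migrated.delete(:strict)
    migrated[:missing_headers] ||= strict ? :raise : :auto
  end

  renames = {
    required_headers: :required_keys,
    remove_values_matching: :nil_values_matching
  }
  renames.each do |old_key, new_key|
    next unless migrated.key?(old_key)

    value = migrated.delete(old_key)
    migrated[new_key] = value unless migrated.key?(new_key)
  end

  # Boolean verbose values map to the symbolic levels.
  case migrated[:verbose]
  when true  then migrated[:verbose] = :debug
  when false then migrated[:verbose] = :normal
  end

  migrated
end

migrate_deprecated_options({ strict: true, required_headers: [:id], verbose: true })
# yields missing_headers: :raise, required_keys: [:id], verbose: :debug
```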
|
|
112
174
|
|
|
113
175
|
-------------
|
|
114
|
-
|
|
176
|
+
|
|
177
|
+
PREVIOUS: [Batch Processing](./batch_processing.md) | NEXT: [Row and Column Separators](./row_col_sep.md) | UP: [README](../README.md)
|
data/docs/parsing_strategy.md
CHANGED
|
@@ -2,6 +2,7 @@
|
|
|
2
2
|
### Contents
|
|
3
3
|
|
|
4
4
|
* [Introduction](./_introduction.md)
|
|
5
|
+
* [Migrating from Ruby CSV](./migrating_from_csv.md)
|
|
5
6
|
* [**Parsing Strategy**](./parsing_strategy.md)
|
|
6
7
|
* [The Basic Read API](./basic_read_api.md)
|
|
7
8
|
* [The Basic Write API](./basic_write_api.md)
|
|
@@ -10,8 +11,15 @@
|
|
|
10
11
|
* [Row and Column Separators](./row_col_sep.md)
|
|
11
12
|
* [Header Transformations](./header_transformations.md)
|
|
12
13
|
* [Header Validations](./header_validations.md)
|
|
14
|
+
* [Column Selection](./column_selection.md)
|
|
13
15
|
* [Data Transformations](./data_transformations.md)
|
|
14
16
|
* [Value Converters](./value_converters.md)
|
|
17
|
+
* [Bad Row Quarantine](./bad_row_quarantine.md)
|
|
18
|
+
* [Instrumentation Hooks](./instrumentation.md)
|
|
19
|
+
* [Examples](./examples.md)
|
|
20
|
+
* [Real-World CSV Files](./real_world_csv.md)
|
|
21
|
+
* [SmarterCSV over the Years](./history.md)
|
|
22
|
+
* [Release Notes](./releases/1.16.0/changes.md)
|
|
15
23
|
|
|
16
24
|
--------------
|
|
17
25
|
|
|
@@ -95,5 +103,59 @@ SmarterCSV.process("file.csv", quote_escaping: :backslash)
|
|
|
95
103
|
|
|
96
104
|
**Note:** In `:backslash` mode, a field like `"abc\"` will raise `MalformedCSV` because the closing quote is escaped, leaving the field unclosed.
|
|
97
105
|
|
|
106
|
+
## Quote Boundary: The `quote_boundary` Option
|
|
107
|
+
|
|
108
|
+
Real-world CSV files sometimes contain quote characters in the middle of an unquoted field — for example, a measurement like `6'2"`, a product name like `Intel Core i5 "Raptor Lake"`, or a field with an apostrophe in a poorly-exported file. Under a naive quote parser, any `"` would toggle quoted state, causing the field to be misread and subsequent fields to be garbled.
|
|
109
|
+
|
|
110
|
+
The `quote_boundary` option controls where SmarterCSV recognizes a quote as a field delimiter.
|
|
111
|
+
|
|
112
|
+
### `:standard` (default)
|
|
113
|
+
|
|
114
|
+
In `:standard` mode, two rules apply:
|
|
115
|
+
|
|
116
|
+
- **Rule 1 — Opening**: a quote only opens a quoted field when it appears at the very start of the field (immediately after the column separator, or at the start of a line). A quote encountered after any other content is treated as a literal character.
|
|
117
|
+
- **Rule 2 — Closing**: a quote only closes a quoted field when it is immediately followed by a column separator, a row separator, or end of input. A quote in any other position inside a quoted field is treated as content (enabling RFC 4180 `""` doubled-quote escaping).
|
|
118
|
+
|
|
119
|
+
```ruby
|
|
120
|
+
# Mid-field quote is a literal character — no state change
|
|
121
|
+
csv = "product,size\nCore i5 \"Raptor Lake\",medium\n"
|
|
122
|
+
SmarterCSV.process(StringIO.new(csv))
|
|
123
|
+
# => [{product: 'Core i5 "Raptor Lake"', size: "medium"}]
|
|
124
|
+
|
|
125
|
+
# Quote at field start opens quoted mode normally
|
|
126
|
+
csv = "first,second\n\"hello, world\",other\n"
|
|
127
|
+
SmarterCSV.process(StringIO.new(csv))
|
|
128
|
+
# => [{first: "hello, world", second: "other"}]
|
|
129
|
+
|
|
130
|
+
# RFC 4180 doubled quotes work inside a properly opened quoted field
|
|
131
|
+
csv = "name\n\"She said \"\"hello\"\"\"\n"
|
|
132
|
+
SmarterCSV.process(StringIO.new(csv))
|
|
133
|
+
# => [{name: 'She said "hello"'}]
|
|
134
|
+
```
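Rule 1 and Rule 2 can also be read as a small state machine. The sketch below is a hypothetical, simplified line splitter written only to illustrate the two rules; it assumes a single line of input, a single-character separator and quote, and RFC 4180 doubled-quote escaping only. It is not SmarterCSV's parser.

```ruby
def split_standard(line, col_sep: ",", quote: '"')
  fields = []
  field = +""
  in_quotes = false
  at_field_start = true
  chars = line.chomp.chars
  i = 0

  while i < chars.length
    char = chars[i]
    following = chars[i + 1]

    if in_quotes
      if char == quote && following == quote
        field << quote           # doubled quote becomes one literal quote
        i += 1                   # skip the second quote of the pair
      elsif char == quote && (following.nil? || following == col_sep)
        in_quotes = false        # Rule 2: closes only before a separator or end of input
      else
        field << char            # any other character, including a lone quote, is content
      end
    elsif char == col_sep
      fields << field
      field = +""
      at_field_start = true
      i += 1
      next
    elsif char == quote && at_field_start
      in_quotes = true           # Rule 1: opens only at the very start of a field
    else
      field << char              # a mid-field quote is a literal character
    end

    at_field_start = false
    i += 1
  end

  raise ArgumentError, "unclosed quoted field" if in_quotes

  fields << field
end

split_standard('Core i5 "Raptor Lake",medium')
# => ['Core i5 "Raptor Lake"', 'medium']
split_standard('"hello, world",other')
# => ['hello, world', 'other']
split_standard('"She said ""hello"""')
# => ['She said "hello"']
```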
|
|
135
|
+
|
|
136
|
+
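`:standard` is the default because treating mid-field quotes as literals matches how most modern CSV parsers handle malformed-but-common real-world data. Ruby's built-in `CSV` library behaves the same way when `liberal_parsing: true` is set; in its default strict mode it raises `CSV::MalformedCSVError` on such input instead.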
|
|
137
|
+
|
|
138
|
+
### `:legacy`
|
|
139
|
+
|
|
140
|
+
In `:legacy` mode, any quote character toggles quoted state regardless of its position in the field. This was the only behavior available before SmarterCSV 1.16.0.
|
|
141
|
+
|
|
142
|
+
Use `:legacy` only if your files were specifically produced to rely on mid-field quote toggling and you cannot change the source. Note that under `:legacy` mode, mid-field quotes that bring the total quote count to an odd number leave the field unclosed and result in a `MalformedCSV` error.
|
|
143
|
+
|
|
144
|
+
```ruby
|
|
145
|
+
SmarterCSV.process("file.csv", quote_boundary: :legacy)
|
|
146
|
+
```
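The toggling behavior, and why an odd quote count fails, can be seen in a few lines of plain Ruby. This is a hypothetical sketch, not SmarterCSV's parser: it handles a single line, keeps quote characters in the field text for simplicity, and uses `ArgumentError` as a stand-in for `MalformedCSV`.

```ruby
def split_legacy(line, col_sep: ",", quote: '"')
  fields = []
  field = +""
  in_quotes = false

  line.chomp.each_char do |char|
    if char == quote
      in_quotes = !in_quotes     # every quote flips the state, wherever it appears
      field << char              # simplification: quotes stay in the field text
    elsif char == col_sep && !in_quotes
      fields << field
      field = +""
    else
      field << char
    end
  end

  # An odd number of quotes leaves the state "open" at end of line.
  raise ArgumentError, "unclosed quoted field" if in_quotes

  fields << field
end

split_legacy('say "a,b" now,x')
# => ['say "a,b" now', 'x']   (the comma between the quotes does not split)

begin
  split_legacy(%q(6'2",tall))  # a single quote toggles the state and never closes it
rescue ArgumentError => e
  warn e.message
end
```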
|
|
147
|
+
|
|
148
|
+
### Interaction with `quote_escaping`
|
|
149
|
+
|
|
150
|
+
Both options apply simultaneously. `quote_boundary` governs *where* a quote is recognized as a delimiter; `quote_escaping` governs *how* a literal quote is represented *inside* a quoted field. They are independent:
|
|
151
|
+
|
|
152
|
+
| `quote_boundary` | `quote_escaping` | Effect |
|
|
153
|
+
|---|---|---|
|
|
154
|
+
| `:standard` | `:auto` (default) | Standard field boundaries + auto-detect escaping style |
|
|
155
|
+
| `:standard` | `:double_quotes` | Standard field boundaries + RFC 4180 only |
|
|
156
|
+
| `:standard` | `:backslash` | Standard field boundaries + backslash escaping |
|
|
157
|
+
| `:legacy` | `:auto` | Old toggle behavior + auto-detect escaping style |
|
|
158
|
+
|
|
98
159
|
--------------
|
|
99
|
-
|
|
160
|
+
|
|
161
|
+
PREVIOUS: [Migrating from Ruby CSV](./migrating_from_csv.md) | NEXT: [The Basic Read API](./basic_read_api.md) | UP: [README](../README.md)
|