smarter_csv 1.18.0 → 1.18.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 3335e39a1c0792f01df9e95401c7f3885c49a0d64eeb9c76e5c20e25d01a62f5
4
- data.tar.gz: e43f00228777b56fc1ee0814a74acaa6a23c51fe8da6f64e42ad92fe1b54002f
3
+ metadata.gz: 8b820fa45fcde042d9a96230a53d4065b104db91e50d48629b9d43cb099a6dca
4
+ data.tar.gz: 8bff7d0845fe3b6c3eecc71c58065364809fafb87bc25f238b53c35cf5b49519
5
5
  SHA512:
6
- metadata.gz: 2abcd136f30d284c3c27cbd2b6c9782aec4235ec62cc27ea7620380ae9efc889f9f1c05c10ad957d6e7c84d65be75d35d96d4ac59ba5780dc5e65e0151c661e6
7
- data.tar.gz: 6c61062c08d0a89dea2c91a7faafd88f6d09c88015ad8c7f4facb2eb474b44b18e0c5eaffc36fa5e2885f0b2c02487497f2460d66ce2433f62c09bf42d92a4ce
6
+ metadata.gz: c8b4a0d5cb0620b542617c8426eb3004dbb7a89602b38174280636194c634d1ee2e9951eac727a1972761d5434b66bb1f1bf9c90ec2659790405cc8c34159062
7
+ data.tar.gz: f6e042464354f8714aa5504eb04c16f137806766057bc22a6db0677d65f49eb4d57f5d54d1378f106fa0899c61408c9cb890d5e20d20f1802a1225338a7b9ba8
data/CHANGELOG.md CHANGED
@@ -4,6 +4,29 @@
4
4
  > [!TIP]
5
5
  > **Upgrading?** The [SmarterCSV Upgrade Wizard](https://tilo.github.io/smarter_csv/upgrade_wizard.html) walks you through what (if anything) you need to change for your specific version. Most steps do not require any changes.
6
6
 
7
+ ## 1.18.1 (2026-06-30)
8
+
9
+ ### Bug Fixes
10
+
11
+ - **Portable builds by default — fixes the "Illegal instruction" crash on heterogeneous CPUs ([#343](https://github.com/tilo/smarter_csv/issues/343)).**
12
+
13
+ Since 1.14.3, the C extension was compiled with `-march=native` on every platform except Apple Silicon, baking in the build host's CPU instructions (e.g. AVX-512).
14
+ A binary built on one machine could then crash with `Illegal instruction` when run on a CPU lacking those instructions, which is common when the build host differs from the run host (CI/build servers, Docker images, mixed-hardware fleets).
15
+
16
+ The C extension is now built **portable** by default (no host-specific instructions). Thanks to [@paholg](https://github.com/paholg) for the report.
17
+
18
+ ### New Features
19
+
20
+ - **`SMARTER_CSV_PERFORMANCE` build option** (`portable` default, `tuned`, or `max`)
21
+
22
+ | Level | Flags added | Portable? | Use when |
23
+ |----------------------|-------------------------------------------|----------------------------------|---------------------------------------|
24
+ | `portable` (default) | none | Yes, any CPU of the arch | Build host may differ from run host |
25
+ | `tuned` | `-mtune=native` | Yes, instruction scheduling only | Build and run hosts share a microarch |
26
+ | `max` | `-march=native`, or `-mcpu=native` on ARM | No, host-specific instructions   | Build host and run host are the same |
27
+
28
+ See the [Introduction](docs/_introduction.md#build-time-performance-tuning-smarter_csv_performance) for details.
29
+
7
30
  ## 1.18.0 (2026-06-17)
8
31
 
9
32
  This release focuses on performance and on automatic conversion of decimals to big_decimal or float, which preserves precision and also supports scientific notation.
data/README.md CHANGED
@@ -297,6 +297,27 @@ Or install it yourself as:
297
297
  $ gem install smarter_csv
298
298
  ```
299
299
 
300
+ ### CPU Optimization (`SMARTER_CSV_PERFORMANCE`)
301
+
302
+ The C extension is compiled when the gem is installed. By default it is built **portable**: it uses no CPU-specific instructions, so a binary built on one machine runs on any other CPU of the same architecture. Set `SMARTER_CSV_PERFORMANCE` at install time to trade portability for speed:
303
+
304
+
305
+ | Level | Flags added | Portable? | Use when |
306
+ |----------------------|-------------------------------------------|----------------------------------|---------------------------------------|
307
+ | `portable` (default) | none | Yes, any CPU of the arch | Build host may differ from run host |
308
+ | `tuned` | `-mtune=native` | Yes, instruction scheduling only | Build and run hosts share a microarch |
309
+ | `max` | `-march=native`, or `-mcpu=native` on ARM | No, host-specific instructions   | Build host and run host are the same |
310
+
311
+ `max` enables host-specific instructions, so a binary built with it can crash with `Illegal instruction` if it later runs on a CPU that lacks them (for example, built on an AVX-512 machine and run on one without). `tuned` only changes instruction scheduling, never the instruction set, so it stays portable. Every flag is probed against your compiler at build time and skipped if unsupported, so an unavailable flag never breaks the build.
312
+
313
+ ```bash
314
+ SMARTER_CSV_PERFORMANCE=tuned gem install smarter_csv # portable, tuned for this machine's microarchitecture
315
+ SMARTER_CSV_PERFORMANCE=max gem install smarter_csv # fastest, NOT portable — only when you build on the machine you run on
316
+ SMARTER_CSV_PERFORMANCE=tuned bundle install # same, under Bundler
317
+ ```
318
+
319
+ For a fixed baseline instead of `native` (e.g. a portable-but-newer instruction set), pass flags directly via `CFLAGS`, which the build also honors: `CFLAGS="-march=x86-64-v2" gem install smarter_csv`.
320
+
300
321
  ## Documentation
301
322
 
302
323
  * [Introduction](docs/_introduction.md)
data/UPGRADING.md CHANGED
@@ -2,7 +2,7 @@
2
2
 
3
3
  > [!TIP]
4
4
  > Prefer the interactive [Upgrade Wizard](https://tilo.github.io/smarter_csv/upgrade_wizard.html) for a guided walk-through with Yes/No questions.
5
- > This document is auto-generated from `CHANGELOG.md` and `docs/upgrade_path.json` by `bin/gen-upgrading-md`.
5
+ > This document is auto-generated from `CHANGELOG.md` and `docs/upgrade_path.json` by `bin/generate-upgrading-md`.
6
6
 
7
7
  ## How to use this guide
8
8
 
@@ -12,25 +12,40 @@
12
12
 
13
13
  Prefer an interactive walk-through? The [Upgrade Wizard](https://tilo.github.io/smarter_csv/upgrade_wizard.html) asks one question at a time and only shows the migration steps that apply to your code.
14
14
 
15
- **Latest release:** `1.17.3` (in the `1.17.x` series).
15
+ **Latest release:** `1.18.1` (in the `1.18.x` series).
16
16
 
17
17
  ---
18
18
 
19
- ## 1.17.x — latest series
19
+ ## 1.18.x — latest series
20
20
 
21
21
  **Versions in this series:**
22
- [1.17.0, 1.17.1, 1.17.2, 1.17.3]
22
+ [1.18.0, 1.18.1]
23
23
 
24
- **Latest release:** `1.17.3`
24
+ > ⚠️ **In-series notes** worth checking:
25
+ > - **1.18.0:** This version is particularly interesting if you have geolocation, scientific, or high-precision data.
26
+
27
+ **Latest release:** `1.18.1`
25
28
 
26
29
  Update your Gemfile to:
27
30
 
28
31
  ```ruby
29
- gem 'smarter_csv', '~> 1.17.0'
32
+ gem 'smarter_csv', '~> 1.18.0'
30
33
  ```
31
34
 
32
35
  Then run `bundle update smarter_csv`.
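The `~> 1.18.0` constraint in the Gemfile line above is RubyGems' pessimistic operator: it accepts any `1.18.x` release but stops short of `1.19`. A minimal sketch for checking this locally, using the standard `Gem::Requirement` and `Gem::Version` classes; the list of versions is only illustrative:

```ruby
require 'rubygems' # Gem::Requirement / Gem::Version (already loaded in a normal Ruby session)

# "~> 1.18.0" is equivalent to ">= 1.18.0" and "< 1.19".
requirement = Gem::Requirement.new('~> 1.18.0')

%w[1.17.4 1.18.0 1.18.1 1.19.0].each do |version|
  allowed = requirement.satisfied_by?(Gem::Version.new(version))
  puts "#{version}: #{allowed ? 'allowed' : 'excluded'}"
end
# 1.17.4: excluded
# 1.18.0: allowed
# 1.18.1: allowed
# 1.19.0: excluded
```

With this constraint, `bundle update smarter_csv` picks up patch releases such as 1.18.1 without moving to a later minor series.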
33
36
 
37
+ ## Series 1.17 → 1.18
38
+
39
+ **Coming from any 1.17 version:**
40
+ [1.17.0, 1.17.1, 1.17.2, 1.17.3, 1.17.4]
41
+
42
+ **Upgrading to 1.18.x** (latest: `1.18.1`):
43
+
44
+ - **If** you build and run the gem on the same machine (or a fleet of identical CPUs) and want the previous native-optimized build for maximum speed:
45
+ → set `SMARTER_CSV_PERFORMANCE=max` (or `tuned`) at install time — 1.18.1 builds <strong>portable</strong> by default (no host-specific CPU instructions) to fix an `Illegal instruction` crash when a binary built on one CPU is run on another (<a href="https://github.com/tilo/smarter_csv/issues/343">#343</a>). The default is safe everywhere; the env var opts back into host optimization.
46
+
47
+ ---
48
+
34
49
  ## Series 1.16 → 1.17
35
50
 
36
51
  **Coming from any 1.16 version:**
@@ -40,7 +55,7 @@ Then run `bundle update smarter_csv`.
40
55
  > - **1.16.1:** **Fibers:** `SmarterCSV.errors` uses `Thread.current` for storage, which is **shared across all fibers running in the same thread**. If you process CSV files concurrently in fibers (e.g. with `Async`, `Falcon`, or manual `Fiber` scheduling), `SmarterCSV.errors` may return stale or wrong results. **Use `SmarterCSV::Reader` directly** — errors are scoped to the reader instance and are always correct regardless of fiber context.
41
56
  > - **1.16.2:** If your code references auto-generated keys for blank headers, update those to use the absolute column position.
42
57
 
43
- **Upgrading to 1.17.x** (latest: `1.17.3`): you can upgrade all the way — no code changes needed.
58
+ **Upgrading to 1.17.x** (latest: `1.17.4`): you can upgrade all the way — no code changes needed.
44
59
 
45
60
  ---
46
61
 
data/docs/_introduction.md CHANGED
@@ -75,6 +75,28 @@ SmarterCSV was created to solve exactly these problems: nightly imports of large
75
75
  * **CSV writing:**
76
76
  `SmarterCSV.generate` writes arrays of hashes to CSV, with support for header renaming and value converters on output. See [The Basic Write API](./basic_write_api.md).
77
77
 
78
+ ## Build-Time Performance Tuning (`SMARTER_CSV_PERFORMANCE`)
79
+
80
+ The C extension is compiled when the gem is installed. By default it is built **portable**: it uses no CPU-specific instructions, so a binary compiled on one machine runs on any other CPU of the same architecture. This matters whenever the machine that builds the gem differs from the machine that runs it — a CI or build server, a Docker image moved between hosts, or a mixed-hardware fleet. A build that bakes in instructions the run host lacks (such as AVX-512) would otherwise crash with `Illegal instruction`.
81
+
82
+ Set `SMARTER_CSV_PERFORMANCE` at install time to trade portability for speed:
83
+
84
+ | Level | Flags added | Portable? | Use when |
85
+ |----------------------|-------------------------------------------|----------------------------------|---------------------------------------|
86
+ | `portable` (default) | none | Yes, any CPU of the arch | Build host may differ from run host |
87
+ | `tuned` | `-mtune=native` | Yes, instruction scheduling only | Build and run hosts share a microarch |
88
+ | `max` | `-march=native`, or `-mcpu=native` on ARM | No, host-specific instructions   | Build host and run host are the same |
89
+
90
+ `tuned` only changes instruction scheduling, never the instruction set, so it stays portable — and it pays off when the build and run hosts share a microarchitecture (the same chip, or a fleet of identical instances). `max` enables host-specific instructions and is the fastest, but a binary built with it can crash on a different CPU. Every flag is probed against your compiler at build time and skipped if unsupported, so an unavailable flag never breaks the build.
91
+
92
+ ```bash
93
+ SMARTER_CSV_PERFORMANCE=tuned gem install smarter_csv # portable, tuned for this machine's microarchitecture
94
+ SMARTER_CSV_PERFORMANCE=max gem install smarter_csv # fastest, NOT portable — only when you build on the machine you run on
95
+ SMARTER_CSV_PERFORMANCE=tuned bundle install # same, under Bundler
96
+ ```
97
+
98
+ For a fixed baseline instead of `native` (e.g. a portable-but-newer instruction set), pass flags directly via `CFLAGS`, which the build also honors: `CFLAGS="-march=x86-64-v2" gem install smarter_csv`.
99
+
78
100
  ---------------
79
101
 
80
102
  NEXT: [Migrating from Ruby CSV](./migrating_from_csv.md) | UP: [README](../README.md)
data/docs/upgrade_path.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
- "latest": "1.17",
3
- "latest_release": "1.17.2",
2
+ "latest": "1.18",
3
+ "latest_release": "1.18.0",
4
4
  "path": {
5
5
  "1.0": {
6
6
  "to": "1.1",
@@ -170,6 +170,17 @@
170
170
  "to": "1.17",
171
171
  "latest_release": "1.16.6",
172
172
  "actions": []
173
+ },
174
+ "1.17": {
175
+ "to": "1.18",
176
+ "latest_release": "1.17.4",
177
+ "actions": [
178
+ {
179
+ "if": "you build and run the gem on the same machine (or a fleet of identical CPUs) and want the previous native-optimized build for maximum speed",
180
+ "then": "set <code>SMARTER_CSV_PERFORMANCE=max</code> (or <code>tuned</code>) at install time &mdash; 1.18.1 builds <strong>portable</strong> by default (no host-specific CPU instructions) to fix an <code>Illegal instruction</code> crash when a binary built on one CPU is run on another (<a href=\"https://github.com/tilo/smarter_csv/issues/343\">#343</a>). The default is safe everywhere; the env var opts back into host optimization."
181
+ }
182
+ ],
183
+ "note": "<strong>Bonus:</strong> 1.18 automatically converts scientific notation (e.g. <code>1e3</code>, <code>6.022e23</code>) to numbers, and preserves full precision on long decimals by returning a <code>BigDecimal</code> when a value carries more than 16 significant digits (<code>decimal_precision: :auto</code>, the default). Nothing to change &mdash; this just works."
173
184
  }
174
185
  }
175
186
  }
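The JSON above describes each upgrade step as a hop from one series to the next, with optional `actions` (`if`/`then` pairs). As a rough sketch of how such a file could be consumed, the snippet below walks the hops from a starting series to the latest one. The sample data is abbreviated from the diff (the `if`/`then` strings are shortened), and `upgrade_hops` is a hypothetical helper, not part of the gem or its wizard:

```ruby
require 'json'

# Abbreviated sample in the shape shown in the diff; strings shortened for readability.
SAMPLE = <<~JSON
  {
    "latest": "1.18",
    "latest_release": "1.18.0",
    "path": {
      "1.16": { "to": "1.17", "latest_release": "1.16.6", "actions": [] },
      "1.17": {
        "to": "1.18",
        "latest_release": "1.17.4",
        "actions": [
          { "if": "you want the previous native-optimized build",
            "then": "set SMARTER_CSV_PERFORMANCE=max (or tuned) at install time" }
        ]
      }
    }
  }
JSON

# Hypothetical helper: collects every hop from `series` up to the latest series.
def upgrade_hops(data, series)
  hops = []
  while series != data['latest']
    hop = data['path'].fetch(series) # raises KeyError for an unknown series
    hops << { from: series, to: hop['to'], actions: hop['actions'] }
    series = hop['to']
  end
  hops
end

data = JSON.parse(SAMPLE)
upgrade_hops(data, '1.16').each do |hop|
  label = "#{hop[:from]} -> #{hop[:to]}"
  if hop[:actions].empty?
    puts "#{label}: no changes needed"
  else
    hop[:actions].each { |action| puts "#{label}: if #{action['if']}, #{action['then']}" }
  end
end
```

Starting from `1.16`, this reports a drop-in hop to `1.17` followed by the single conditional action for `1.17` to `1.18`.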
data/docs/upgrade_wizard.html CHANGED
@@ -94,6 +94,15 @@ input[type="text"] {
94
94
  color: var(--green);
95
95
  font-weight: 600;
96
96
  }
97
+ .benefit {
98
+ background: var(--green-soft);
99
+ border: 1px solid #b6dab6;
100
+ border-radius: 8px;
101
+ padding: 1em 1.25em;
102
+ margin-bottom: 1em;
103
+ color: var(--fg);
104
+ }
105
+ .benefit strong { color: var(--green); }
97
106
  .check {
98
107
  padding: 0.85em 0;
99
108
  border-top: 1px solid var(--border);
@@ -229,7 +238,7 @@ function renderHop(series, originalVersion) {
229
238
  if (hop.actions.length === 0) {
230
239
  body = `<div class="hop dropin">
231
240
  You can upgrade directly to version ${targetRelease}. No changes needed.
232
- </div>`;
241
+ </div>${hop.note ? `<div class="benefit">${hop.note}</div>` : ""}`;
233
242
  } else {
234
243
  body = `<div class="hop">
235
244
  ${hop.actions.map((a, i) => `
@@ -324,6 +333,7 @@ function renderHop(series, originalVersion) {
324
333
  from: series,
325
334
  to: hop.to,
326
335
  dropIn: hop.actions.length === 0,
336
+ note: hop.note || null,
327
337
  matched
328
338
  });
329
339
 
@@ -377,7 +387,8 @@ function renderSummary() {
377
387
  const targetRelease = latestReleaseFor(d.to);
378
388
  const heading = `<p class="summary-hop-heading"><strong>${d.from}.x &rarr; ${targetRelease}</strong></p>`;
379
389
  if (d.dropIn) {
380
- return `<div class="summary-hop">${heading}<p class="muted">No code changes needed for this step.</p></div>`;
390
+ const noteHTML = d.note ? `<p>${d.note}</p>` : "";
391
+ return `<div class="summary-hop">${heading}<p class="muted">No code changes needed for this step.</p>${noteHTML}</div>`;
381
392
  }
382
393
  if (d.matched.length === 0) {
383
394
  return `<div class="summary-hop">${heading}<p class="muted">None of the conditions in this step applied to your code.</p></div>`;
data/ext/smarter_csv/cpu_flags.rb ADDED
@@ -0,0 +1,55 @@
1
+ # frozen_string_literal: true
2
+
3
+ module SmarterCSV
4
+ # Pure (mkmf-free) selection of CPU-optimization flags from the
5
+ # SMARTER_CSV_PERFORMANCE environment variable. Kept separate from extconf.rb
6
+ # so the logic can be unit-tested without invoking a compiler.
7
+ #
8
+ # Levels:
9
+ # portable - no host-specific flags. The binary runs on any CPU of the same
10
+ # architecture. The safe default: a binary built here will not
11
+ # crash with "Illegal instruction" on an older/different CPU.
12
+ # tuned - -mtune=native: tunes instruction scheduling for the build host's
13
+ # microarchitecture WITHOUT changing the instruction set, so the
14
+ # binary stays portable. A real win when build and run hosts share
15
+ # a microarchitecture (same chip or a homogeneous fleet).
16
+ # max - host-specific instructions: -march=native, or -mcpu=native on
17
+ # ARM/Clang where -march=native is rejected. Fastest, but NOT
18
+ # portable -- may crash on a CPU lacking the build host's
19
+ # instructions. Use only when build host and run host match.
20
+ #
21
+ # `accepts` is a predicate (in the real build, a wrapper over mkmf's
22
+ # try_compile) returning true when the compiler accepts a given flag; each
23
+ # candidate is probed so an unsupported flag is skipped rather than breaking
24
+ # the build.
25
+ module CpuFlags
26
+ LEVELS = %w[portable tuned max].freeze
27
+
28
+ # Candidate flags per level, in preference order. The first one the compiler
29
+ # accepts wins. `max` degrades march -> mcpu -> mtune; tuned only ever
30
+ # considers -mtune=native (never an instruction-set flag).
31
+ CANDIDATES = {
32
+ 'portable' => [].freeze,
33
+ 'tuned' => ['-mtune=native'].freeze,
34
+ 'max' => ['-march=native', '-mcpu=native', '-mtune=native'].freeze,
35
+ }.freeze
36
+
37
+ # Returns a Hash: { level: String, flags: Array<String>, warning: String|nil }.
38
+ def self.select(raw_level, accepts:)
39
+ level, warning = normalize(raw_level)
40
+ chosen = CANDIDATES[level].find { |flag| accepts.call(flag) }
41
+ { level: level, flags: chosen ? [chosen] : [], warning: warning }
42
+ end
43
+
44
+ # Normalizes the env value to a known level. Unknown values fall back to
45
+ # 'portable' (a typo can then only ever be slower, never non-portable) and
46
+ # return a warning naming the bad value and the fallback.
47
+ def self.normalize(raw_level)
48
+ value = raw_level.to_s.strip.downcase
49
+ return ['portable', nil] if value.empty?
50
+ return [value, nil] if LEVELS.include?(value)
51
+
52
+ ['portable', "SMARTER_CSV_PERFORMANCE=#{raw_level.inspect} is not one of #{LEVELS.join('|')}; using 'portable'."]
53
+ end
54
+ end
55
+ end
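Because `CpuFlags.select` takes the compiler probe as an injected predicate, its behavior can be exercised without a compiler. The sketch below repeats a condensed copy of the module from the diff (comments dropped) so it runs standalone, then calls it with two hypothetical stand-in predicates: one that accepts every flag, and one that rejects `-march=native` the way the comments describe for Clang on ARM.

```ruby
# frozen_string_literal: true

# Condensed copy of the module added in this diff, so the sketch is self-contained.
module SmarterCSV
  module CpuFlags
    LEVELS = %w[portable tuned max].freeze
    CANDIDATES = {
      'portable' => [].freeze,
      'tuned' => ['-mtune=native'].freeze,
      'max' => ['-march=native', '-mcpu=native', '-mtune=native'].freeze,
    }.freeze

    def self.select(raw_level, accepts:)
      level, warning = normalize(raw_level)
      chosen = CANDIDATES[level].find { |flag| accepts.call(flag) }
      { level: level, flags: chosen ? [chosen] : [], warning: warning }
    end

    def self.normalize(raw_level)
      value = raw_level.to_s.strip.downcase
      return ['portable', nil] if value.empty?
      return [value, nil] if LEVELS.include?(value)

      ['portable', "SMARTER_CSV_PERFORMANCE=#{raw_level.inspect} is not one of #{LEVELS.join('|')}; using 'portable'."]
    end
  end
end

# Hypothetical stand-ins for the real probe (which wraps mkmf's try_compile).
accepts_everything = ->(_flag) { true }
rejects_march      = ->(flag) { flag != '-march=native' }

unset   = SmarterCSV::CpuFlags.select(nil, accepts: accepts_everything)
max_all = SmarterCSV::CpuFlags.select('max', accepts: accepts_everything)
max_arm = SmarterCSV::CpuFlags.select(' MAX ', accepts: rejects_march)
typo    = SmarterCSV::CpuFlags.select('maxx', accepts: accepts_everything)

p unset[:flags]    # => []                 (unset variable means portable)
p max_all[:flags]  # => ["-march=native"]  (first accepted candidate wins)
p max_arm[:flags]  # => ["-mcpu=native"]   (falls through when -march=native is rejected)
p typo[:level]     # => "portable"         (unknown value falls back to portable)
puts typo[:warning]
```

The last call shows the safety property noted in the comments: a mistyped level can only produce a slower build, never a non-portable one, and the returned warning names the bad value.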
data/ext/smarter_csv/extconf.rb CHANGED
@@ -2,6 +2,7 @@
2
2
 
3
3
  require 'mkmf'
4
4
  require "rbconfig"
5
+ require_relative 'cpu_flags'
5
6
 
6
7
  if RbConfig::MAKEFILE_CONFIG["CFLAGS"].include?("-g -O3")
7
8
  fixed_CFLAGS = RbConfig::MAKEFILE_CONFIG["CFLAGS"].sub("-g -O3", "-O3 $(cflags)")
@@ -9,11 +10,31 @@ if RbConfig::MAKEFILE_CONFIG["CFLAGS"].include?("-g -O3")
9
10
  RbConfig::MAKEFILE_CONFIG["CFLAGS"] = fixed_CFLAGS
10
11
  end
11
12
 
13
+ # Probe whether the compiler accepts a flag by compiling a trivial program with
14
+ # it. Lets us skip flags the toolchain rejects (e.g. -march=native on Clang/ARM,
15
+ # or GCC-only flags on MSVC) instead of breaking the build. Replaces the old
16
+ # RUBY_PLATFORM string guesses: ask the actual compiler, don't infer from the OS.
17
+ def compiler_accepts?(flag)
18
+ try_compile("int main(void){return 0;}", flag)
19
+ end
20
+
12
21
  optflags = "-O3 -flto -fomit-frame-pointer -DNDEBUG".dup
13
- optflags << " -march=native" unless RUBY_PLATFORM.start_with?("arm64-darwin")
22
+
23
+ # CPU optimization level, set via SMARTER_CSV_PERFORMANCE (default: portable).
24
+ # See cpu_flags.rb for the full description of each level.
25
+ #
26
+ # portable (default) - no host-specific flags; runs on any CPU of the same arch.
27
+ # tuned - -mtune=native; host scheduling tuning, still portable.
28
+ # max - host instruction set (-march/-mcpu native); fastest, but
29
+ # NOT portable -- may crash on a CPU lacking those instructions.
30
+ cpu = SmarterCSV::CpuFlags.select(ENV["SMARTER_CSV_PERFORMANCE"], accepts: method(:compiler_accepts?))
31
+ warn(cpu[:warning]) if cpu[:warning]
32
+ cpu[:flags].each { |flag| optflags << " #{flag}" }
33
+ puts("SmarterCSV performance level: #{cpu[:level]} -- optflags: #{optflags}")
34
+
14
35
  # -fno-semantic-interposition: GCC/Clang only (not MSVC). Allows intra-library
15
36
  # calls to bypass the PLT on Linux and enables more aggressive LTO inlining.
16
- optflags << " -fno-semantic-interposition" unless RUBY_PLATFORM.include?("mswin")
37
+ optflags << " -fno-semantic-interposition" if compiler_accepts?("-fno-semantic-interposition")
17
38
 
18
39
  append_cflags('-Wno-compound-token-split-by-macro')
19
40
 
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module SmarterCSV
4
- VERSION = "1.18.0"
4
+ VERSION = "1.18.1"
5
5
  end
metadata CHANGED
@@ -1,13 +1,13 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: smarter_csv
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.18.0
4
+ version: 1.18.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Tilo Sloboda
8
8
  bindir: bin
9
9
  cert_chain: []
10
- date: 2026-06-19 00:00:00.000000000 Z
10
+ date: 2026-07-01 00:00:00.000000000 Z
11
11
  dependencies:
12
12
  - !ruby/object:Gem::Dependency
13
13
  name: bigdecimal
@@ -83,6 +83,7 @@ files:
83
83
  - docs/upgrade_wizard.html
84
84
  - docs/value_converters.md
85
85
  - docs/warnings.md
86
+ - ext/smarter_csv/cpu_flags.rb
86
87
  - ext/smarter_csv/extconf.rb
87
88
  - ext/smarter_csv/smarter_csv.c
88
89
  - ext/smarter_csv/vendor/LICENSE-fast_float-MIT
@@ -140,7 +141,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
140
141
  - !ruby/object:Gem::Version
141
142
  version: '0'
142
143
  requirements: []
143
- rubygems_version: 3.6.9
144
+ rubygems_version: 4.0.15
144
145
  specification_version: 4
145
146
  summary: Fastest end-to-end CSV ingestion for Ruby with smart defaults and Rails-ready
146
147
  hash output