RubyGems - smarter_csv - Versions diffs - 1.15.2 → 1.16.1 - Mend

smarter_csv 1.15.2 → 1.16.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (50)

checksums.yaml +4 -4
data/.rspec +2 -0
data/.rubocop.yml +9 -0
data/CHANGELOG.md +112 -1
data/CONTRIBUTORS.md +4 -1
data/Gemfile +1 -0
data/README.md +129 -27
data/docs/_introduction.md +45 -24
data/docs/bad_row_quarantine.md +342 -0
data/docs/basic_read_api.md +152 -9
data/docs/basic_write_api.md +475 -59
data/docs/batch_processing.md +162 -4
data/docs/column_selection.md +184 -0
data/docs/data_transformations.md +163 -29
data/docs/examples.md +340 -46
data/docs/header_transformations.md +94 -12
data/docs/header_validations.md +57 -18
data/docs/history.md +119 -0
data/docs/instrumentation.md +166 -0
data/docs/migrating_from_csv.md +565 -0
data/docs/options.md +151 -87
data/docs/parsing_strategy.md +64 -1
data/docs/real_world_csv.md +263 -0
data/docs/releases/1.16.0/benchmarks.md +223 -0
data/docs/releases/1.16.0/changes.md +273 -0
data/docs/releases/1.16.0/performance_notes.md +114 -0
data/docs/row_col_sep.md +15 -5
data/docs/ruby_csv_pitfalls.md +514 -0
data/docs/value_converters.md +194 -57
data/ext/smarter_csv/extconf.rb +3 -0
data/ext/smarter_csv/smarter_csv.c +1017 -82
data/images/SmarterCSV_1.16.0_vs_RubyCSV_3.3.5_speedup.png +0 -0
data/images/SmarterCSV_1.16.0_vs_RubyCSV_3.3.5_speedup.svg +108 -0
data/images/SmarterCSV_1.16.0_vs_previous_C-speedup.png +0 -0
data/images/SmarterCSV_1.16.0_vs_previous_C-speedup.svg +141 -0
data/images/SmarterCSV_1.16.0_vs_previous_Rb-speedup.png +0 -0
data/images/SmarterCSV_1.16.0_vs_previous_Rb-speedup.svg +139 -0
data/lib/smarter_csv/errors.rb +8 -0
data/lib/smarter_csv/file_io.rb +1 -1
data/lib/smarter_csv/hash_transformations.rb +14 -13
data/lib/smarter_csv/header_transformations.rb +21 -2
data/lib/smarter_csv/headers.rb +2 -1
data/lib/smarter_csv/options.rb +124 -7
data/lib/smarter_csv/parser.rb +358 -74
data/lib/smarter_csv/reader.rb +494 -46
data/lib/smarter_csv/version.rb +1 -1
data/lib/smarter_csv/writer.rb +71 -19
data/lib/smarter_csv.rb +134 -13
data/smarter_csv.gemspec +20 -10
metadata +38 -80

data/docs/examples.md CHANGED

@@ -2,6 +2,8 @@
 ### Contents
   * [Introduction](./_introduction.md)
+  * [Migrating from Ruby CSV](./migrating_from_csv.md)
+  * [Ruby CSV Pitfalls](./ruby_csv_pitfalls.md)
   * [Parsing Strategy](./parsing_strategy.md)
   * [The Basic Read API](./basic_read_api.md)
   * [The Basic Write API](./basic_write_api.md)
@@ -10,70 +12,362 @@
   * [Row and Column Separators](./row_col_sep.md)
   * [Header Transformations](./header_transformations.md)
   * [Header Validations](./header_validations.md)
+  * [Column Selection](./column_selection.md)
   * [Data Transformations](./data_transformations.md)
   * [Value Converters](./value_converters.md)
---------------
+  * [Bad Row Quarantine](./bad_row_quarantine.md)
+  * [Instrumentation Hooks](./instrumentation.md)
+  * [**Examples**](./examples.md)
+  * [Real-World CSV Files](./real_world_csv.md)
+  * [SmarterCSV over the Years](./history.md)
+  * [Release Notes](./releases/1.16.0/changes.md)
+--------------
 # Examples
-Here are some examples to demonstrate the versatility of SmarterCSV.
+**Rescue from `SmarterCSV::Error` (recommended):** SmarterCSV auto-detects row and column separators. In rare cases detection fails and raises an exception (e.g. `NoColSepDetected`). Rescuing from `SmarterCSV::Error` ensures your application handles unexpected CSV formats gracefully.
+---
-**It is generally recommended to rescue `SmarterCSV::Error` or it's sub-classes.**
+1. [CSV → Array of Hashes](#example-1-csv--array-of-hashes)
+2. [Parsing a CSV String](#example-2-parsing-a-csv-string)
+3. [Key Mapping and Column Selection](#example-3-key-mapping-and-column-selection)
+4. [Encoding and Preamble Skip](#example-4-encoding-and-preamble-skip)
+5. [Value Converters](#example-5-value-converters)
+6. [Header Validation](#example-6-header-validation)
+7. [Bad Row Handling](#example-7-bad-row-handling)
+8. [Writing CSV](#example-8-writing-csv)
+9. [Using `each` and `each_chunk` Enumerators](#example-9-using-each-and-each_chunk-enumerators)
+10. [Importing into a Database](#example-10-importing-into-a-database)
+11. [Batch Processing with Sidekiq](#example-11-batch-processing-with-sidekiq)
+12. [Resumable CSV Import with Rails ActiveJob](#example-12-resumable-csv-import-with-rails-activejob-rails-81)
+13. [Instrumentation](#example-13-instrumentation)
-By default SmarterCSV determines the `row_sep` and `col_sep` values automatically. In cases where the automatic detection fails, an exception will be raised, e.g. `NoColSepDetected`. Rescuing from these exceptions will make sure that you don't miss processing CSV files, in case users upload CSV files with unexpected formats.
+---
-In rare cases you may have to manually set these values, after going through the troubleshooting procedure described above.
+## Example 1: CSV → Array of Hashes
-## Example 1a: How SmarterCSV processes CSV-files as array of hashes:
-Please note how each hash contains only the keys for columns with non-null values.
+Each hash only contains keys for columns with non-nil, non-empty values — columns with blank entries are omitted automatically:
 ```ruby
-     $ cat pets.csv
-     first name,last name,dogs,cats,birds,fish
-     Dan,McAllister,2,,,
-     Lucy,Laweless,,5,,
-     Miles,O'Brian,,,,21
-     Nancy,Homes,2,,1,
-     $ irb
-     > require 'smarter_csv'
-      => true
-     > pets_by_owner = SmarterCSV.process('/tmp/pets.csv')
-      => [ {:first_name=>"Dan", :last_name=>"McAllister", :dogs=>"2"},
-           {:first_name=>"Lucy", :last_name=>"Laweless", :cats=>"5"},
-           {:first_name=>"Miles", :last_name=>"O'Brian", :fish=>"21"},
-           {:first_name=>"Nancy", :last_name=>"Homes", :dogs=>"2", :birds=>"1"}
-         ]
+$ cat pets.csv
+first name,last name,dogs,cats,birds,fish
+Dan,McAllister,2,,,
+Lucy,Laweless,,5,,
+Miles,O'Brian,,,,21
+Nancy,Homes,2,,1,
+$ irb
+> require 'smarter_csv'
+> pets_by_owner = SmarterCSV.process('pets.csv')
+ => [ {first_name: "Dan",   last_name: "McAllister", dogs: 2},
+      {first_name: "Lucy",  last_name: "Laweless",   cats: 5},
+      {first_name: "Miles", last_name: "O'Brian",    fish: 21},
+      {first_name: "Nancy", last_name: "Homes",      dogs: 2, birds: 1}
+    ]
 ```
+---
+## Example 2: Parsing a CSV String
+Use `SmarterCSV.parse` to parse a CSV string directly — no file needed. Useful in tests, API responses, or when the CSV arrives as a string in memory:
-## Example 3: Populate a MySQL or MongoDB Database with SmarterCSV:
 ```ruby
-    # without using chunks:
-    filename = '/tmp/some.csv'
-    options = {:key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}
-    n = SmarterCSV.process(filename, options) do |array|
-          # we're passing a block in, to process each resulting hash / =row (the block takes array of hashes)
-          # when chunking is not enabled, there is only one hash in each array
-          MyModel.create( array.first )
-    end
+csv_string = <<~CSV
+  name,age,city
+  Alice,30,New York
+  Bob,25,Chicago
+CSV
-     => returns number of chunks / rows we processed
+data = SmarterCSV.parse(csv_string)
+# => [{name: "Alice", age: 30, city: "New York"}, {name: "Bob", age: 25, city: "Chicago"}]
 ```
-## Example 4: Processing a CSV File, and inserting batch jobs in Sidekiq:
-The block receives an optional second parameter `chunk_index` (0-based) for progress tracking:
+See [The Basic Read API](./basic_read_api.md) and [Migrating from Ruby CSV](./migrating_from_csv.md).
+---
+## Example 3: Key Mapping and Column Selection
+Rename headers and drop unwanted columns in one pass:
+```ruby
+options = {
+  key_mapping: {
+    first_name: :fname,
+    last_name:  :lname,
+    dob:        :birth_date,
+    ssn:        nil,          # drop this column entirely
+  },
+}
+data = SmarterCSV.process('people.csv', options)
+# => [{fname: "Alice", lname: "Smith", birth_date: "1990-05-14"}, ...]
+#  ↑ :ssn is gone; original CSV headers remapped to your domain names
+```
+Keep only specific columns using `headers: { only: }`:
+```ruby
+data = SmarterCSV.process('people.csv', headers: { only: [:name, :email] })
+# => [{name: "Alice", email: "alice@example.com"}, ...]
+```
+See [Header Transformations](./header_transformations.md) and [Column Selection](./column_selection.md).
+---
+## Example 4: Encoding and Preamble Skip
+Handle non-UTF-8 files and metadata rows before the header:
+```ruby
+# Bank statement export: Windows-1252, 3 preamble rows, then header
+data = SmarterCSV.process('statement.csv',
+  file_encoding: 'windows-1252',
+  skip_lines:    3)
+# European lab instrument export: semicolon-separated, Latin-1
+data = SmarterCSV.process('results.csv',
+  file_encoding: 'iso-8859-1',
+  col_sep:       :auto)   # :auto detects the semicolon
+```
+See [Row and Column Separators](./row_col_sep.md) and [Real-World CSV Files](./real_world_csv.md).
+---
+## Example 5: Value Converters
+Transform raw strings into typed values — dates, booleans, currency:
 ```ruby
-    filename = '/tmp/input.csv' # CSV file containing ids or data to process
-    options = { :chunk_size => 100 }
-    n = SmarterCSV.process(filename, options) do |chunk, chunk_index|
-      puts "Queueing chunk #{chunk_index} with #{chunk.size} records..."
-      Sidekiq::Client.push_bulk(
-        'class' => SidekiqIndividualWorkerClass,
-        'args' => chunk,
-      )
-      # OR:
-      # SidekiqBatchWorkerClass.process_async(chunk) # pass an array of hashes to Sidekiq workers for parallel processing
+require 'date'
+data = SmarterCSV.process('records.csv',
+  value_converters: {
+    # Parse US date format
+    dob:    ->(v) { v ? Date.strptime(v, '%m/%d/%Y') : nil },
+    # Strip currency symbol and convert to Float
+    price:  ->(v) { v&.delete('$,')&.to_f },
+    # Boolean from various representations
+    active: ->(v) { v&.match?(/\Atrue\z/i) },
+  })
+data.first[:dob]    # => #<Date: 1990-05-14>
+data.first[:price]  # => 44.5
+data.first[:active] # => true
+```
+Combining with `nil_values_matching` to clean sentinel values before conversion:
+```ruby
+data = SmarterCSV.process('export.csv',
+  nil_values_matching: /\A(N\/A|NULL|#N\/A)\z/i,
+  value_converters: {
+    score: ->(v) { v&.to_f },   # v is nil for N/A rows — guard with &.
+  })
+```
+See [Value Converters](./value_converters.md).
+---
+## Example 6: Header Validation
+Raise early if required columns are missing, before processing any data rows:
+```ruby
+begin
+  data = SmarterCSV.process('transactions.csv',
+    required_keys: [:account_id, :amount, :currency])
+rescue SmarterCSV::MissingKeys => e
+  puts "CSV is missing required columns: #{e.keys.join(', ')}"
+  # => "CSV is missing required columns: currency"
+end
+```
+See [Header Validations](./header_validations.md).
+---
+## Example 7: Bad Row Handling
+Collect parse errors without stopping the import:
+```ruby
+reader = SmarterCSV::Reader.new('data.csv', on_bad_row: :collect)
+good_rows = reader.process
+bad = reader.errors[:bad_rows]
+puts "Imported #{good_rows.size} rows, #{bad.size} bad rows"
+bad.each do |rec|
+  puts "Line #{rec[:file_line_number]}: #{rec[:error_message]}"
+  puts "  Raw: #{rec[:raw_line]}"
+end
+```
+Cap the number of tolerated bad rows and limit field sizes to guard against malformed input:
+```ruby
+SmarterCSV.process('untrusted.csv',
+  on_bad_row:       :skip,
+  bad_row_limit:    10,
+  field_size_limit: 4096)
+```
+See [Bad Row Quarantine](./bad_row_quarantine.md).
+---
+## Example 8: Writing CSV
+```ruby
+records = [
+  { name: "Alice", age: 30, city: "New York" },
+  { name: "Bob",   age: 25, city: "Chicago"  },
+]
+SmarterCSV.generate('output.csv') do |csv|
+  records.each { |r| csv << r }
+end
+# output.csv:
+# name,age,city
+# Alice,30,New York
+# Bob,25,Chicago
+```
+Writing with header renaming and value converters:
+```ruby
+require 'date'
+SmarterCSV.generate('report.csv',
+  map_headers:      { name: 'Full Name', dob: 'Date of Birth' },
+  value_converters: { dob: ->(v) { v&.strftime('%m/%d/%Y') } },
+) do |csv|
+  User.find_each { |u| csv << { name: u.full_name, dob: u.dob } }
+end
+```
+See [The Basic Write API](./basic_write_api.md).
+---
+## Example 9: Using `each` and `each_chunk` Enumerators
+The modern API gives you full Enumerable power without loading the whole file:
+```ruby
+# each — one hash per row
+reader = SmarterCSV::Reader.new('data.csv')
+reader.each { |hash| MyModel.upsert(hash) }
+puts reader.headers.inspect   # accessible after processing
+# Enumerable methods
+active_users = reader.select { |h| h[:status] == 'active' }
+names        = reader.map    { |h| h[:name] }
+# Lazy — stop early without reading the whole file
+first_ten_active = reader.lazy.select { |h| h[:active] }.first(10)
+# each_slice — manual batching without chunk_size
+reader.each_slice(500) { |batch| MyModel.insert_all(batch) }
+```
+See [Batch Processing](./batch_processing.md) and [The Basic Read API](./basic_read_api.md).
+---
+## Example 10: Importing into a Database
+```ruby
+filename = '/tmp/some.csv'
+options = { key_mapping: { unwanted_row: nil, old_row_name: :new_name } }
+n = SmarterCSV.process(filename, options) do |array|
+  MyModel.create(array.first)
+end
+# => returns number of rows processed
+```
+---
+## Example 11: Batch Processing with Sidekiq
+Processing in chunks reduces memory usage and enables parallel processing. The block receives the 0-based `chunk_index` as an optional second parameter:
+```ruby
+filename = '/tmp/input.csv'
+n = SmarterCSV.process(filename, chunk_size: 100) do |chunk, chunk_index|
+  puts "Queueing chunk #{chunk_index} with #{chunk.size} records..."
+  Sidekiq::Client.push_bulk(
+    'class' => SidekiqWorkerClass,
+    'args'  => chunk,
+  )
+end
+# => returns number of chunks
+```
+See [Batch Processing](./batch_processing.md).
+---
+## Example 12: Resumable CSV Import with Rails ActiveJob (Rails 8.1+)
+Rails 8.1 introduced `ActiveJob::Continuable`, which lets a job pause and resume from exactly where it stopped — for example during a deployment or queue drain.
+```ruby
+# app/jobs/import_csv_job.rb
+class ImportCsvJob < ApplicationJob
+  include ActiveJob::Continuable
+  def perform(file_path)
+    step :import_rows do |step|
+      SmarterCSV.process(file_path, chunk_size: 500) do |chunk, chunk_index|
+        next if chunk_index < step.cursor.to_i  # skip already-processed chunks on resume
+        MyModel.import!(chunk)
+        step.set! chunk_index + 1
+      end
     end
-    => returns number of chunks
+  end
+end
+```
+- `step.cursor` starts as `nil` (→ `0`), so the first run processes all chunks.
+- If interrupted after chunk 7, Rails persists the cursor as `8`.
+- On the next run chunks 0–7 are skipped quickly via `next`; processing resumes from chunk 8.
+> Requires Rails 8.1+ and a queue adapter that supports graceful shutdown (Sidekiq, Solid Queue).
+---
+## Example 13: Instrumentation
+```ruby
+SmarterCSV.process('large_import.csv',
+  chunk_size: 1000,
+  on_start: ->(info) {
+    Rails.logger.info "Import started: #{info[:input]} (#{info[:file_size]} bytes)"
+  },
+  on_chunk: ->(info) {
+    Rails.logger.debug "Chunk #{info[:chunk_number]}: #{info[:rows_in_chunk]} rows"
+  },
+  on_complete: ->(stats) {
+    Rails.logger.info "Done: #{stats[:total_rows]} rows in #{stats[:duration].round(2)}s"
+  },
+) { |chunk| MyModel.insert_all(chunk) }
 ```
+See [Instrumentation Hooks](./instrumentation.md).
+--------------------
+PREVIOUS: [Instrumentation Hooks](./instrumentation.md) | NEXT: [Real-World CSV Files](./real_world_csv.md) | UP: [README](../README.md)
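
The resume behaviour described in the bullets under Example 12 can be checked without Rails or SmarterCSV. The sketch below is a plain-Ruby simulation only: `run_import` and `stop_after` are hypothetical names invented for illustration, the `cursor` argument stands in for `step.cursor`, and an array of slices stands in for the chunks that `SmarterCSV.process` would yield.

```ruby
# Hypothetical simulation of the cursor logic in Example 12.
# `run_import` and `stop_after` are illustrative names, not SmarterCSV or Rails APIs.
def run_import(chunks, cursor:, stop_after: nil)
  processed = []
  chunks.each_with_index do |_chunk, chunk_index|
    next if chunk_index < cursor.to_i    # skip chunks finished in an earlier run (nil.to_i is 0)
    processed << chunk_index             # stands in for MyModel.import!(chunk)
    cursor = chunk_index + 1             # stands in for step.set!(chunk_index + 1)
    break if stop_after == chunk_index   # simulate an interruption right after this chunk
  end
  [processed, cursor]
end

chunks = (1..50).each_slice(5).to_a      # 10 chunks of 5 rows each

first_run, saved_cursor  = run_import(chunks, cursor: nil, stop_after: 7)
second_run, final_cursor = run_import(chunks, cursor: saved_cursor)

p first_run     # => [0, 1, 2, 3, 4, 5, 6, 7]
p saved_cursor  # => 8
p second_run    # => [8, 9]
p final_cursor  # => 10
```

Note that in the real job the skipped chunks are still read and parsed before `next` discards them, so resuming costs parse time but no repeated writes.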

data/docs/header_transformations.md CHANGED

@@ -2,6 +2,8 @@
 ### Contents
   * [Introduction](./_introduction.md)
+  * [Migrating from Ruby CSV](./migrating_from_csv.md)
+  * [Ruby CSV Pitfalls](./ruby_csv_pitfalls.md)
   * [Parsing Strategy](./parsing_strategy.md)
   * [The Basic Read API](./basic_read_api.md)
   * [The Basic Write API](./basic_write_api.md)
@@ -10,15 +12,55 @@
   * [Row and Column Separators](./row_col_sep.md)
   * [**Header Transformations**](./header_transformations.md)
   * [Header Validations](./header_validations.md)
+  * [Column Selection](./column_selection.md)
   * [Data Transformations](./data_transformations.md)
   * [Value Converters](./value_converters.md)
---------------
+  * [Bad Row Quarantine](./bad_row_quarantine.md)
+  * [Instrumentation Hooks](./instrumentation.md)
+  * [Examples](./examples.md)
+  * [Real-World CSV Files](./real_world_csv.md)
+  * [SmarterCSV over the Years](./history.md)
+  * [Release Notes](./releases/1.16.0/changes.md)
+--------------
 # Header Transformations
 By default SmarterCSV assumes that a CSV file has headers, and it automatically normalizes the headers and transforms them into Ruby symbols. You can completely customize or override this (see below).
+## Header Transformation Pipeline
+When a CSV file is opened, the header line passes through the following steps in order:
+```
+[user_provided_headers] ──► skips steps below; uses your array directly
+         │
+         ▼ (when headers come from the file)
+comment_regexp ──► strip_chars_from_headers ──► split on col_sep
+    ──► strip quote_char ──► strip_whitespace
+    ──► [unless keep_original_headers]: gsub spaces/dashes→_ ──► downcase_header
+    ──► disambiguate_headers ──► symbolize ──► key_mapping
+```
+| Step | Option | Default | Description |
+|------|--------|---------|-------------|
+| 1 | `comment_regexp` | `nil` | Strips a comment prefix from the raw header line (e.g. `# ` at start) |
+| 2 | `strip_chars_from_headers` | `nil` | Removes characters matching a regexp from the raw header line (e.g. `/[\-"]/`) |
+| 3 | *(split)* | `col_sep` | Splits the header line into individual column tokens |
+| 4 | `quote_char` | `"` | Strips surrounding quote characters from each token |
+| 5 | `strip_whitespace` | `true` | Strips leading/trailing whitespace from each header |
+| 6 | *(normalize)* | — | Replaces spaces and dashes with `_` (`keep_original_headers` skips this and steps 7–9) |
+| 7 | `downcase_header` | `true` | Downcases each header string |
+| 8 | `duplicate_header_suffix` | `''` | Renames empty headers to `column_N`; appends suffix+number to duplicates |
+| 9 | `strings_as_keys` | `false` | Converts headers to symbols (skipped if `true` or `keep_original_headers`) |
+| 10 | `key_mapping` | `nil` | Renames or drops headers; use post-transformation key names as input |
+> `user_provided_headers` bypasses all file header reading and transformation entirely — your array is used as-is. Versions >1.13 automatically set `headers_in_file: false` when `user_provided_headers` is given; if the file has a header row you want to skip, set `headers_in_file: true` explicitly.
+See [Configuration Options](./options.md) for full option reference.
+---
 ## Header Normalization
 When processing the headers, it transforms them into Ruby symbols, stripping extra spaces, lower-casing them and replacing spaces with underscores. e.g. " \t Annual Sales  " becomes `:annual_sales`. (see Notes below)
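
The default normalization steps (split, strip quotes, strip whitespace, replace spaces and dashes, downcase, symbolize) can be approximated in a few lines of plain Ruby. This is a simplified sketch and not the gem's implementation: `normalize_headers` is a hypothetical helper, the regular expression is an assumption, and comment stripping, duplicate-header handling, quoted separators and `key_mapping` are deliberately left out.

```ruby
# Simplified, hypothetical approximation of the default header normalization.
# It does NOT handle separators inside quoted headers, duplicates, or key_mapping.
def normalize_headers(line, col_sep: ',', quote_char: '"')
  line.chomp.split(col_sep).map do |token|
    token.gsub(quote_char, '')   # strip quote characters
         .strip                  # strip leading/trailing whitespace
         .gsub(/(\s|-)+/, '_')   # spaces and dashes become underscores (assumed pattern)
         .downcase               # downcase the header
         .to_sym                 # symbolize
  end
end

header_line = "\"First Name\", \t Annual Sales  ,Zip-Code\n"
p normalize_headers(header_line)
# => [:first_name, :annual_sales, :zip_code]
```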
@@ -81,16 +123,57 @@ end
 ## Key Mapping
-The above example already illustrates how intermediate keys can be mapped into something different.
-This transfoms some of the keys in the input, but other keys are still present.
+`key_mapping:` renames CSV headers to the symbols your application expects. Any header not
+listed in the mapping is kept as-is by default.
-There is an additional option `remove_unmapped_keys` which can be enabled to only produce the mapped keys in the resulting hashes, and drops any other columns.
+```ruby
+# CSV headers: first_name, last_name, internal_id, created_at
+data = SmarterCSV.process('contacts.csv',
+  key_mapping: { first_name: :given_name, last_name: :family_name },
+)
+# => [{given_name: "Alice", family_name: "Smith", internal_id: 42, created_at: "2026-01-01"}, ...]
+#       ^^^ renamed                                ^^^ unmapped keys kept as-is
+```
-### NOTES on Key Mapping:
- * keys in the header line of the file can be re-mapped to a chosen set of symbols, so the resulting Hashes can be better used internally in your application (e.g. when directly creating MongoDB entries with them)
- * if you want to completely delete a key, then map it to nil or to '', they will be automatically deleted from any result Hash
- * if you have input files with a large number of columns, and you want to ignore all columns which are not specifically mapped with :key_mapping, then use option :remove_unmapped_keys => true
+To delete a specific column, map it to `nil` — it will be removed from every row hash:
+```ruby
+key_mapping: { internal_id: nil, created_at: nil }   # drop these two columns
+```
+### `remove_unmapped_keys:` — drop everything not in the map
+When you have files with many columns and only care about a few, listing every unwanted
+column as `nil` is tedious. Use `remove_unmapped_keys: true` to implicitly drop any header
+that has no entry in `key_mapping:`:
+```ruby
+# CSV has 50 columns; you only want two of them, renamed
+data = SmarterCSV.process('contacts.csv',
+  key_mapping:          { first_name: :given_name, last_name: :family_name },
+  remove_unmapped_keys: true,
+)
+# => [{given_name: "Alice", family_name: "Smith"}, ...]   # only the two mapped columns
+```
+### `remove_unmapped_keys:` vs `headers: { only: }`
+Both achieve column selection, but they serve different purposes:
+| | `remove_unmapped_keys: true` | `headers: { only: [...] }` |
+|---|---|---|
+| Use when | Already using `key_mapping:` and want to implicitly drop the rest | Pure column selection, no renaming needed |
+| Performance | Post-parse filter — all fields parsed, unmapped keys deleted | **C-path early exit** — unneeded fields never parsed |
+| Renaming | Yes — combines selection and rename in one step | No renaming (use `key_mapping:` alongside if needed) |
+For wide files where performance matters, prefer `headers: { only: }` — it skips unneeded
+fields entirely inside the C parser and can be **10–14× faster** on very wide files.
+Use `remove_unmapped_keys: true` when you are already remapping headers and the convenience
+of a single option outweighs the (usually small) performance difference.
+See [Column Selection](./column_selection.md) for full details on `headers: { only: }`.
+> **Note:** Key mapping is particularly useful when importing CSV data directly into a database or document store. By remapping headers to the exact symbol names your application uses internally (e.g. ActiveRecord attributes, DynamoDB document keys, Sidekiq job parameters), you can pass the resulting hashes directly without any further transformation.
 ## CSV Files without Headers
@@ -124,5 +207,4 @@ For CSV files with headers, you can either:
  * some CSV files use un-escaped quotation characters inside fields. This can cause the import to break. To get around this, set the `quote_char` to something different, e.g. `quote_char: "%"`, or try setting `:strip_chars_from_headers => /[\-"]/`
 ---------------
-PREVIOUS: [Row and Column Separators](./row_col_sep.md) | NEXT: [Header Validations](./header_validations.md)
+PREVIOUS: [Row and Column Separators](./row_col_sep.md) | NEXT: [Header Validations](./header_validations.md) | UP: [README](../README.md)
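
The three `key_mapping` behaviours described in this file (rename, drop via `nil`, and `remove_unmapped_keys`) can be illustrated on a single row hash. The helper below is a hypothetical sketch: `apply_key_mapping` is not part of SmarterCSV, and the gem applies the mapping to headers rather than to each row, so this only models the resulting hash shape.

```ruby
# Hypothetical row-level model of key_mapping semantics; not the gem's code.
def apply_key_mapping(row, key_mapping, remove_unmapped_keys: false)
  row.each_with_object({}) do |(key, value), out|
    if key_mapping.key?(key)
      new_key = key_mapping[key]
      out[new_key] = value unless new_key.nil?   # a nil target drops the column
    elsif !remove_unmapped_keys
      out[key] = value                           # unmapped keys are kept by default
    end
  end
end

row = { first_name: "Alice", last_name: "Smith", internal_id: 42, created_at: "2026-01-01" }
mapping = { first_name: :given_name, last_name: :family_name }

p apply_key_mapping(row, mapping)
# => {given_name: "Alice", family_name: "Smith", internal_id: 42, created_at: "2026-01-01"}

p apply_key_mapping(row, mapping.merge(internal_id: nil))
# => {given_name: "Alice", family_name: "Smith", created_at: "2026-01-01"}

p apply_key_mapping(row, mapping, remove_unmapped_keys: true)
# => {given_name: "Alice", family_name: "Smith"}
```

(The printed hash syntax depends on the Ruby version; older versions show `:given_name=>"Alice"`.)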

data/docs/header_validations.md CHANGED

@@ -2,6 +2,8 @@
 ### Contents
   * [Introduction](./_introduction.md)
+  * [Migrating from Ruby CSV](./migrating_from_csv.md)
+  * [Ruby CSV Pitfalls](./ruby_csv_pitfalls.md)
   * [Parsing Strategy](./parsing_strategy.md)
   * [The Basic Read API](./basic_read_api.md)
   * [The Basic Write API](./basic_write_api.md)
@@ -10,43 +12,80 @@
   * [Row and Column Separators](./row_col_sep.md)
   * [Header Transformations](./header_transformations.md)
   * [**Header Validations**](./header_validations.md)
+  * [Column Selection](./column_selection.md)
   * [Data Transformations](./data_transformations.md)
   * [Value Converters](./value_converters.md)
---------------
+  * [Bad Row Quarantine](./bad_row_quarantine.md)
+  * [Instrumentation Hooks](./instrumentation.md)
+  * [Examples](./examples.md)
+  * [Real-World CSV Files](./real_world_csv.md)
+  * [SmarterCSV over the Years](./history.md)
+  * [Release Notes](./releases/1.16.0/changes.md)
+--------------
 # Header Validations
-When you are importing data, it can be important to verify that all required data is present, to ensure consistent quality when importing data.
+When importing data it is important to verify that all required columns are present — catching a missing column upfront is far better than a cryptic error later when your code tries to access a key that was never populated.
-You can use the `required_keys` option to specify an array of hash keys that you require to be present at a minimum for every data row (after header transformation).
+## `required_keys`
-If these keys are not present, `SmarterCSV::MissingKeys` will be raised to inform you of the data inconsistency.
+Use `required_keys` to specify an array of hash keys that must be present after header transformation. Validation runs once, after the header row is parsed and all header transformations (downcase, symbolize, `key_mapping`) have been applied — so use the **transformed** key names, not the raw CSV header strings.
-## Example
+If any required key is absent, `SmarterCSV::MissingKeys` is raised before any data rows are processed.
 ```ruby
-  options = {
-    required_keys: [:source_account, :destination_account, :amount]
-  }
-  data = SmarterCSV.process("/tmp/transactions.csv", options)
-  => this will raise SmarterCSV::MissingKeys if any row does not contain these three keys
+options = {
+  required_keys: [:source_account, :destination_account, :amount]
+}
+data = SmarterCSV.process('/tmp/transactions.csv', options)
+# => raises SmarterCSV::MissingKeys if any of the three columns is missing
 ```
-## Handling Missing Keys Programmatically
+### Accessing the missing keys
-When `SmarterCSV::MissingKeys` is raised, you can access the missing keys directly via the `keys` accessor, without parsing the error message:
+`SmarterCSV::MissingKeys` exposes the missing keys via the `keys` accessor:
 ```ruby
 begin
-  options = { required_keys: [:source_account, :destination_account, :amount] }
-  data = SmarterCSV.process("/tmp/transactions.csv", options)
+  data = SmarterCSV.process('/tmp/transactions.csv',
+    required_keys: [:source_account, :destination_account, :amount])
 rescue SmarterCSV::MissingKeys => e
   puts "Missing columns: #{e.keys.join(', ')}"
-  # => e.keys returns [:amount] (array of missing key symbols)
+  # => "Missing columns: amount"
 end
 ```
+### Interaction with `key_mapping`
+`required_keys` uses the **post-mapping** key names. If you remap CSV headers, reference the mapped names:
+```ruby
+options = {
+  key_mapping:   { acct_from: :source_account, acct_to: :destination_account },
+  required_keys: [:source_account, :destination_account, :amount],
+}
+```
+---
+## `silence_missing_keys`
+When using `key_mapping`, SmarterCSV raises `SmarterCSV::KeyMappingError` if a mapped key is not found in the CSV header. Use `silence_missing_keys` to make some or all mapped keys optional:
+```ruby
+# All mapped keys are optional — no error if any are absent
+options = {
+  key_mapping:          { optional_field: :my_field, required_field: :other_field },
+  silence_missing_keys: true,
+}
+# Only specific mapped keys are optional
+options = {
+  key_mapping:          { optional_field: :my_field, required_field: :other_field },
+  silence_missing_keys: [:optional_field],
+}
+```
 ----------------
-PREVIOUS: [Header Transformations](./header_transformations.md) | NEXT: [Data Transformations](./data_transformations.md)
+PREVIOUS: [Header Transformations](./header_transformations.md) | NEXT: [Column Selection](./column_selection.md) | UP: [README](../README.md)
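
Because `required_keys` is checked against post-mapping names, it can help to see the comparison spelled out. The sketch below is an assumption-laden model in plain Ruby: `missing_required_keys` is a hypothetical helper, the header list is invented sample data, and the real validation lives inside the gem and raises `SmarterCSV::MissingKeys` instead of returning an array.

```ruby
# Hypothetical model of the required_keys check; not the gem's implementation.
# csv_headers are assumed to be already normalized (downcased, symbolized).
def missing_required_keys(csv_headers, key_mapping: {}, required_keys: [])
  mapped = csv_headers.map { |h| key_mapping.fetch(h, h) }  # apply renames first
  required_keys - mapped                                    # whatever is left is missing
end

headers = [:acct_from, :acct_to, :memo]   # invented sample headers
mapping = { acct_from: :source_account, acct_to: :destination_account }

p missing_required_keys(headers,
                        key_mapping:   mapping,
                        required_keys: [:source_account, :destination_account, :amount])
# => [:amount]

# Referencing the raw (pre-mapping) name reports it as missing:
p missing_required_keys(headers, key_mapping: mapping, required_keys: [:acct_from])
# => [:acct_from]
```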