philiprehberger-csv_kit 0.7.0 → 0.9.0 (RubyGems)

Files changed (7)

checksums.yaml +4 -4
data/CHANGELOG.md +14 -1
data/README.md +35 -7
data/lib/philiprehberger/csv_kit/processor.rb +25 -0
data/lib/philiprehberger/csv_kit/version.rb +1 -1
data/lib/philiprehberger/csv_kit.rb +74 -44
metadata +2 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: ea3cafa68ee9e49b8c1b305af9e7f796a9c417676a6018ee416dbb978e91eb38
-  data.tar.gz: e34151bbdf97d2e78fc348620fd958fa01a3e6d446744593d7a8474b82885de6
+  metadata.gz: 85d915adb35a580821d1055d966fe76b802598d6422571ad1e727694dd604a4f
+  data.tar.gz: 66cc520a86bae535668b77aa56ae7d154ddc1f5df22cca7b4578d4b12a7958e8
 SHA512:
-  metadata.gz: a98d2e7baa28c03c04c44322482cc5a9ec4bef939692f5b3812679efe1d4ab1edc90748d65003ea4ddb287a73d27cb6ba22fd4e254ccf734d9c61cd3fbb28cab
-  data.tar.gz: 04a064667d1cbbde06ab473cad6dd18e4edc9558fa5f6efcff5ea2a0505a3c55846f016ebac118735c3b6c4b035bf48edf4c3cf03750c2ba855bf4ec4fc76f4e
+  metadata.gz: 0b719464ccea551cb56975fe78985ffa2f49480f2439e9a528f091afa53b5b24dd55219dbab3e65a8f30ec1f326959749ad4162e52d36471b1ee3bfd1272afcd
+  data.tar.gz: 7f98968c2c5063109053f2a4a8f4508f6544e5238f783440969f237f593b66f4ac2a2ace4e6b495862621944fe891c44107a247366d3c1cedd59be6208a88df9

data/CHANGELOG.md CHANGED

@@ -7,6 +7,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
+## [0.9.0] - 2026-04-19
+### Added
+- `Processor#default(key, value)` — fill nil or empty cells at `key` with a default value during transform; chains naturally with `type:` coercion
+## [0.8.0] - 2026-04-17
+### Added
+- `CsvKit.to_csv(rows, headers:, dialect:)` — serialize an array of hashes to a CSV string; inverse of `to_hashes`
+- `to_hashes`, `pluck`, `headers`, `count`, `each_hash`, `find`, and `filter` now accept an IO object in addition to a file path
 ## [0.7.0] - 2026-04-16
 ### Added
@@ -101,7 +112,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Type coercion and row validation
 - Quick load and filtering convenience methods
-[Unreleased]: https://github.com/philiprehberger/rb-csv-kit/compare/v0.7.0...HEAD
+[Unreleased]: https://github.com/philiprehberger/rb-csv-kit/compare/v0.9.0...HEAD
+[0.9.0]: https://github.com/philiprehberger/rb-csv-kit/compare/v0.8.0...v0.9.0
+[0.8.0]: https://github.com/philiprehberger/rb-csv-kit/compare/v0.7.0...v0.8.0
 [0.7.0]: https://github.com/philiprehberger/rb-csv-kit/compare/v0.6.0...v0.7.0
 [0.6.0]: https://github.com/philiprehberger/rb-csv-kit/compare/v0.5.0...v0.6.0
 [0.5.0]: https://github.com/philiprehberger/rb-csv-kit/releases/tag/v0.5.0

data/README.md CHANGED

@@ -106,6 +106,17 @@ rows = Philiprehberger::CsvKit.process("data.csv") do |p|
 end
 ```
+### Default Values for Missing Cells
+Fill nil or empty-string cells with a default value before any `type` coercion runs:
+```ruby
+Philiprehberger::CsvKit.process("users.csv") do |p|
+  p.default(:country, "US")
+  p.type(:age, :integer)
+end
+```
 ### Date/Time Type Coercions
 ```ruby
@@ -124,6 +135,21 @@ rows = Philiprehberger::CsvKit.process("data.csv", dialect: { delimiter: ";", qu
 end
 ```
+### Write CSV String
+Inverse of `to_hashes`. Serialize an array of hashes to a CSV string. Headers default to the keys of the first row:
+```ruby
+csv = Philiprehberger::CsvKit.to_csv([
+  { name: "Alice", age: 30 },
+  { name: "Bob",   age: 25 }
+])
+# => "name,age\nAlice,30\nBob,25\n"
+# Control column order / subset with explicit headers
+Philiprehberger::CsvKit.to_csv(rows, headers: [:name])
+```
 ### Writing CSV
 ```ruby
@@ -183,18 +209,20 @@ delimiter = Philiprehberger::CsvKit::Detector.detect("data.tsv")
 | Method / Class | Description |
 |----------------|-------------|
-| `CsvKit.to_hashes(path, dialect:)` | Load CSV into array of symbolized hashes |
+| `CsvKit.to_hashes(path_or_io, dialect:)` | Load CSV into array of symbolized hashes |
+| `CsvKit.to_csv(rows, headers:, dialect:)` | Serialize an array of hashes to a CSV string |
 | `CsvKit.sample(path_or_io, n, dialect:)` | Return n randomly sampled rows using reservoir sampling (Algorithm R) |
-| `CsvKit.pluck(path, *keys, dialect:)` | Extract specific columns |
-| `CsvKit.filter(path, dialect:, &block)` | Filter rows, return CSV string |
-| `CsvKit.find(path, dialect:, &block)` | Return the first row matching the predicate, or nil |
-| `CsvKit.headers(path, dialect:)` | Return header row as array of symbols |
-| `CsvKit.count(path, dialect:)` | Count data rows without loading into memory |
-| `CsvKit.each_hash(path, dialect:, &block)` | Stream rows as symbolized hashes; returns Enumerator if no block |
+| `CsvKit.pluck(path_or_io, *keys, dialect:)` | Extract specific columns |
+| `CsvKit.filter(path_or_io, dialect:, &block)` | Filter rows, return CSV string |
+| `CsvKit.find(path_or_io, dialect:, &block)` | Return the first row matching the predicate, or nil |
+| `CsvKit.headers(path_or_io, dialect:)` | Return header row as array of symbols |
+| `CsvKit.count(path_or_io, dialect:)` | Count data rows without loading into memory |
+| `CsvKit.each_hash(path_or_io, dialect:, &block)` | Stream rows as symbolized hashes; returns Enumerator if no block |
 | `CsvKit.process(path_or_io, dialect:, &block)` | Streaming DSL with transforms and validations |
 | `Processor#headers(*names)` | Override header names |
 | `Processor#transform(key, &block)` | Register column transform |
 | `Processor#type(key, type, **opts)` | Register built-in type coercion (:integer, :float, :string, :date, :datetime) |
+| `Processor#default(key, value)` | Fill nil or empty cells at `key` with `value` (runs before `type` coercion) |
 | `Processor#validate(key, &block)` | Register column validation (skip invalid) |
 | `Processor#skip(n)` | Skip the first N data rows |
 | `Processor#limit(n)` | Stop after processing N rows |
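
Note on `to_csv`: the behavior documented in the README hunk above can be approximated with Ruby's standard `csv` library. The following is only a minimal sketch under that assumption, not the gem's implementation (which delegates to a `Writer.stream` helper that this diff does not show). The name `rows_to_csv` is hypothetical and exists only for illustration.

```ruby
require 'csv'

# Hypothetical stdlib-only approximation of CsvKit.to_csv.
# Headers default to the keys of the first row; explicit headers
# select and order the columns.
def rows_to_csv(rows, headers: nil)
  return '' if rows.empty? && headers.nil?

  cols = (headers || rows.first.keys).map(&:to_sym)
  CSV.generate do |csv|
    csv << cols.map(&:to_s)
    rows.each do |row|
      symbolized = row.transform_keys(&:to_sym)
      csv << cols.map { |key| symbolized[key] }
    end
  end
end

rows = [{ name: 'Alice', age: 30 }, { name: 'Bob', age: 25 }]
puts rows_to_csv(rows)                   # all columns, first-row key order
puts rows_to_csv(rows, headers: [:name]) # column subset
```

With explicit headers and an empty `rows` array, this sketch emits only the header line, which mirrors the early-return condition in the gem's `to_csv` (`rows.empty? && headers.nil?`).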

data/lib/philiprehberger/csv_kit/processor.rb CHANGED

@@ -31,6 +31,7 @@ module Philiprehberger
         @path_or_io = path_or_io
         @dialect = dialect ? Dialect.new(dialect) : nil
         @transforms = {}
+        @defaults = {}
         @validations = {}
         @reject_block = nil
         @each_block = nil
@@ -63,6 +64,22 @@ module Philiprehberger
         @transforms[key] = ->(v) { coercion.call(v, opts) }
       end
+      # Register a default value for a column.
+      #
+      # Cells where the value is `nil` or an empty string are replaced with
+      # the provided default during transform. Defaults run BEFORE `type`
+      # coercions and `transform` blocks, so callers can default a missing
+      # cell to a string and then coerce it (e.g. default to "0" then cast
+      # to :integer).
+      #
+      # @param key [Symbol] column name
+      # @param value [Object] value to use when the cell is nil or empty
+      # @return [self]
+      def default(key, value)
+        @defaults[key] = value
+        self
+      end
       # Register a validation for a specific column.
       def validate(key, &block)
         @validations[key] = block
@@ -122,6 +139,7 @@ module Philiprehberger
         return unless valid?(row)
         return if rejected?(row)
+        apply_defaults!(row)
         apply_transforms!(row)
         apply_renames!(row)
         @each_block&.call(row)
@@ -165,6 +183,13 @@ module Philiprehberger
         @reject_block&.call(row) || false
       end
+      def apply_defaults!(row)
+        @defaults.each do |key, value|
+          current = row[key]
+          row[key] = value if current.nil? || current.to_s.empty?
+        end
+      end
       def apply_transforms!(row)
         @transforms.each { |key, blk| row[key] = blk.call(row[key]) }
       end
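
Note on ordering: in the hunks above, `apply_defaults!` runs before `apply_transforms!` but after the `valid?` and `rejected?` checks, so validations appear to see the raw cell rather than the defaulted value. The standalone sketch below reproduces only the default-then-coerce step. It is not the gem's `Processor` class, and the helper names `fill_defaults!` and `run_transforms!` are hypothetical.

```ruby
# Standalone sketch of the default-then-transform ordering.
# A cell counts as missing when it is nil or its string form is empty.
def fill_defaults!(row, defaults)
  defaults.each do |key, value|
    current = row[key]
    row[key] = value if current.nil? || current.to_s.empty?
  end
  row
end

# Apply each registered transform to its column, in registration order.
def run_transforms!(row, transforms)
  transforms.each { |key, blk| row[key] = blk.call(row[key]) }
  row
end

defaults   = { country: 'US', age: '0' }
transforms = { age: ->(v) { Integer(v) } }

row = { name: 'Alice', country: '', age: nil }
fill_defaults!(row, defaults)       # country becomes 'US', age becomes '0'
run_transforms!(row, transforms)    # age is coerced from '0' to 0
p row
```

Because the emptiness test is `to_s.empty?`, a whitespace-only cell such as `' '` is not replaced by the default.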

data/lib/philiprehberger/csv_kit/version.rb CHANGED

@@ -2,6 +2,6 @@
 module Philiprehberger
   module CsvKit
-    VERSION = '0.7.0'
+    VERSION = '0.9.0'
   end
 end

data/lib/philiprehberger/csv_kit.rb CHANGED

@@ -3,6 +3,7 @@
 require 'csv'
 require 'date'
 require 'time'
+require 'stringio'
 require_relative 'csv_kit/version'
 require_relative 'csv_kit/dialect'
 require_relative 'csv_kit/detector'
@@ -30,69 +31,85 @@ module Philiprehberger
     # Load an entire CSV into an array of symbolized hashes.
     #
-    # @param path [String] file path
+    # @param path_or_io [String, IO] file path or IO object
     # @param dialect [Symbol, Hash, nil] CSV dialect preset or custom options
     # @return [Array<Hash{Symbol => String}>]
-    def self.to_hashes(path, dialect: nil)
-      csv_opts = { headers: true }
-      csv_opts = Dialect.new(dialect).merge_into(csv_opts) if dialect
-      CSV.foreach(path, **csv_opts).map do |row|
-        row.to_h.transform_keys(&:to_sym)
+    def self.to_hashes(path_or_io, dialect: nil)
+      rows = []
+      foreach_row(path_or_io, headers: true, dialect: dialect) do |row|
+        rows << row.to_h.transform_keys(&:to_sym)
+      end
+      rows
+    end
+    # Serialize an array of hashes to a CSV string.
+    #
+    # If headers is omitted, the keys of the first hash are used. Empty input
+    # returns an empty string. Dialect options are passed through to the writer.
+    #
+    # @param rows [Array<Hash>] data rows
+    # @param headers [Array<Symbol, String>, nil] explicit column order (optional)
+    # @param dialect [Symbol, Hash, nil] CSV dialect preset or custom options
+    # @return [String] CSV string with header row
+    def self.to_csv(rows, headers: nil, dialect: nil)
+      return '' if rows.empty? && headers.nil?
+      resolved_headers = (headers || rows.first.keys).map(&:to_sym)
+      io = StringIO.new
+      Writer.stream(io, headers: resolved_headers, dialect: dialect) do |w|
+        rows.each { |row| w << (row.is_a?(Hash) ? row.transform_keys(&:to_sym) : row) }
       end
+      io.string
     end
     # Extract specific columns from a CSV.
     #
-    # @param path [String] file path
+    # @param path_or_io [String, IO] file path or IO object
     # @param keys [Array<Symbol>] column names to extract
     # @param dialect [Symbol, Hash, nil] CSV dialect preset or custom options
     # @return [Array<Hash{Symbol => String}>]
-    def self.pluck(path, *keys, dialect: nil)
-      to_hashes(path, dialect: dialect).map { |h| h.slice(*keys) }
+    def self.pluck(path_or_io, *keys, dialect: nil)
+      to_hashes(path_or_io, dialect: dialect).map { |h| h.slice(*keys) }
     end
     # Return the header row as an array of symbols.
     #
-    # @param path [String] file path
+    # @param path_or_io [String, IO] file path or IO object
     # @param dialect [Symbol, Hash, nil] CSV dialect preset or custom options
     # @return [Array<Symbol>]
-    def self.headers(path, dialect: nil)
+    def self.headers(path_or_io, dialect: nil)
       csv_opts = {}
       csv_opts = Dialect.new(dialect).merge_into(csv_opts) if dialect
-      CSV.open(path, **csv_opts) do |csv|
+      row = nil
+      with_csv(path_or_io, csv_opts) do |csv|
         row = csv.shift
-        return [] unless row
-        row.map(&:to_sym)
       end
+      return [] unless row
+      row.map(&:to_sym)
     end
     # Count data rows without loading them all into memory.
     #
-    # @param path [String] file path
+    # @param path_or_io [String, IO] file path or IO object
     # @param dialect [Symbol, Hash, nil] CSV dialect preset or custom options
     # @return [Integer]
-    def self.count(path, dialect: nil)
-      csv_opts = { headers: true }
-      csv_opts = Dialect.new(dialect).merge_into(csv_opts) if dialect
+    def self.count(path_or_io, dialect: nil)
       n = 0
-      CSV.foreach(path, **csv_opts) { |_| n += 1 }
+      foreach_row(path_or_io, headers: true, dialect: dialect) { |_| n += 1 }
       n
     end
     # Stream rows one at a time as symbolized hashes with constant memory.
     # Returns an Enumerator if no block is given.
     #
-    # @param path [String] file path
+    # @param path_or_io [String, IO] file path or IO object
     # @param dialect [Symbol, Hash, nil] CSV dialect preset or custom options
     # @yield [Hash{Symbol => String}] each row
     # @return [Enumerator, nil]
-    def self.each_hash(path, dialect: nil, &block)
-      csv_opts = { headers: true }
-      csv_opts = Dialect.new(dialect).merge_into(csv_opts) if dialect
+    def self.each_hash(path_or_io, dialect: nil, &block)
       enum = Enumerator.new do |yielder|
-        CSV.foreach(path, **csv_opts) do |row|
+        foreach_row(path_or_io, headers: true, dialect: dialect) do |row|
           yielder.yield(row.to_h.transform_keys(&:to_sym))
         end
       end
@@ -109,13 +126,10 @@ module Philiprehberger
     # @param dialect [Symbol, Hash, nil] CSV dialect preset or custom options
     # @return [Array<Hash{Symbol => String}>]
     def self.sample(path_or_io, n, dialect: nil)
-      csv_opts = { headers: true }
-      csv_opts = Dialect.new(dialect).merge_into(csv_opts) if dialect
       reservoir = []
       index = 0
-      iterate = lambda do |row|
+      foreach_row(path_or_io, headers: true, dialect: dialect) do |row|
         hash = row.to_h.transform_keys(&:to_sym)
         if index < n
           reservoir << hash
@@ -126,25 +140,17 @@ module Philiprehberger
         index += 1
       end
-      if path_or_io.is_a?(String)
-        CSV.foreach(path_or_io, **csv_opts, &iterate)
-      else
-        CSV.new(path_or_io, **csv_opts).each(&iterate)
-      end
       reservoir
     end
     # Find the first row matching a predicate, streaming (stops as soon as a match is found).
     #
-    # @param path [String] file path
+    # @param path_or_io [String, IO] file path or IO object
     # @param dialect [Symbol, Hash, nil] CSV dialect preset or custom options
     # @yield [Hash{Symbol => String}] each row as a symbolized hash
     # @return [Hash{Symbol => String}, nil] the first matching row or nil
-    def self.find(path, dialect: nil, &block)
-      csv_opts = { headers: true }
-      csv_opts = Dialect.new(dialect).merge_into(csv_opts) if dialect
-      CSV.foreach(path, **csv_opts) do |row|
+    def self.find(path_or_io, dialect: nil, &block)
+      foreach_row(path_or_io, headers: true, dialect: dialect) do |row|
         hash = row.to_h.transform_keys(&:to_sym)
         return hash if block.call(hash)
       end
@@ -153,12 +159,12 @@ module Philiprehberger
     # Filter rows and return matching rows as a CSV string.
     #
-    # @param path [String] file path
+    # @param path_or_io [String, IO] file path or IO object
     # @param dialect [Symbol, Hash, nil] CSV dialect preset or custom options
     # @yield [Hash{Symbol => String}] each row as a symbolized hash
     # @return [String] CSV string with headers
-    def self.filter(path, dialect: nil, &)
-      rows = to_hashes(path, dialect: dialect).select(&)
+    def self.filter(path_or_io, dialect: nil, &)
+      rows = to_hashes(path_or_io, dialect: dialect).select(&)
       return '' if rows.empty?
       headers = rows.first.keys
@@ -167,5 +173,29 @@ module Philiprehberger
         rows.each { |row| csv << headers.map { |k| row[k] } }
       end
     end
+    # @api private
+    # Iterate CSV rows from either a file path or an IO object.
+    def self.foreach_row(path_or_io, headers: false, dialect: nil, &block)
+      csv_opts = headers ? { headers: true } : {}
+      csv_opts = Dialect.new(dialect).merge_into(csv_opts) if dialect
+      if path_or_io.is_a?(String)
+        CSV.foreach(path_or_io, **csv_opts, &block)
+      else
+        CSV.new(path_or_io, **csv_opts).each(&block)
+      end
+    end
+    # @api private
+    # Open a CSV reader over either a file path or an IO object.
+    def self.with_csv(path_or_io, csv_opts, &block)
+      if path_or_io.is_a?(String)
+        CSV.open(path_or_io, **csv_opts, &block)
+      else
+        block.call(CSV.new(path_or_io, **csv_opts))
+      end
+    end
+    private_class_method :foreach_row, :with_csv
   end
 end
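
Note on path-or-IO dispatch: the new private `foreach_row` helper branches on `path_or_io.is_a?(String)`, so any String argument is treated as a file path and raw CSV text would need to be wrapped in an IO such as `StringIO`. The sketch below shows the same dispatch using only the standard library. The names `each_csv_row` and `load_hashes` are hypothetical stand-ins, and dialect handling is omitted.

```ruby
require 'csv'
require 'stringio'
require 'tempfile'

# Hypothetical stand-in for the private foreach_row helper:
# Strings are opened as file paths, anything else is read as an IO.
def each_csv_row(path_or_io, **csv_opts, &block)
  if path_or_io.is_a?(String)
    CSV.foreach(path_or_io, **csv_opts, &block)
  else
    CSV.new(path_or_io, **csv_opts).each(&block)
  end
end

# Collect rows as symbol-keyed hashes, like to_hashes in the diff.
def load_hashes(path_or_io)
  rows = []
  each_csv_row(path_or_io, headers: true) do |row|
    rows << row.to_h.transform_keys(&:to_sym)
  end
  rows
end

data = "name,age\nAlice,30\nBob,25\n"

from_io = load_hashes(StringIO.new(data))

from_path = Tempfile.create(['sketch', '.csv']) do |file|
  file.write(data)
  file.flush
  load_hashes(file.path)
end

p from_io == from_path
```

One design consequence visible in the diff: in the IO branch the caller's IO is read but not closed by the helper, whereas the path branch lets `CSV.foreach` and `CSV.open` manage the file handle.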

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: philiprehberger-csv_kit
 version: !ruby/object:Gem::Version
-  version: 0.7.0
+  version: 0.9.0
 platform: ruby
 authors:
 - Philip Rehberger
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2026-04-17 00:00:00.000000000 Z
+date: 2026-04-20 00:00:00.000000000 Z
 dependencies: []
 description: Streaming CSV processor with row-by-row transforms, validations, column
   plucking, streaming each_hash iteration, filtering, writing, error recovery, and