csvreader 0.2.0 → 0.3.0 (RubyGems)

Files changed (5)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: af0fcea1b598e6123786a05532a6f5b2e10a4095
-  data.tar.gz: ba2dc18a6076e425847b440c05819e898f0a66b2
+  metadata.gz: a9bc6971bd638abc67e8e82e241dbb370602b0d5
+  data.tar.gz: 062f2727188a6f3705c21a5cc825194f84bea41c
 SHA512:
-  metadata.gz: 28f60b98574e5331b53280f27017fae776c787ee1b7a56815c8a8f9c21a0926e6f561ca8a75f1464f1743849f989a52f122fbe4f20086de8159cf2df53b71bbe
-  data.tar.gz: 6d5b80b11e4774bc227bffe62bc829ab19b70f22fd69ca35c54526b85f261fa5c4bf0a7d87c9ba715738f50a6710bdd843f3b6cc1581f0d88744332fdf062796
+  metadata.gz: 595f1c779e0457377fe5c09602cba1ce7754803b35b9280e06dfc752759da441c708db830cb159076ce3f445b3b6aaf1fef459ff9eaace4ae6a436988e52455f
+  data.tar.gz: f4dc02242912ba15bef498838093f85de22c3670b5bcb2b92139adbd9343cbddec753f755729f85b2962df26d724ff069cc73b8f5cccd3edb896dc0d3ac26969

data/README.md CHANGED

@@ -164,7 +164,7 @@ see [`TabReader` »](https://github.com/datatxt/tabreader).
 Two major design bugs and many many minor.
-1) The CSV class uses `line.split(`,`)` with some kludges (†) with the claim its faster.
+(1) The CSV class uses `line.split(',')` with some kludges (†) with the claim its faster.
 What?! The right way: CSV needs its own purpose-built parser. There's no other
 way you can handle all the (edge) cases with double quotes and escaped doubled up
 double quotes. Period.
@@ -175,7 +175,7 @@ Or handling double quotes inside values and so on and on.
 (†): kludge - a workaround or quick-and-dirty solution that is clumsy, inelegant, inefficient, difficult to extend and hard to maintain
-2) The CSV class returns `nil` for `,,` but an empty string (`""`)
+(2) The CSV class returns `nil` for `,,` but an empty string (`""`)
 for `"","",""`. The right way: All values are always strings. Period.
 If you want to use `nil` you MUST configure a string (or strings)
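
The README's complaint about `line.split(',')` can be reproduced with nothing but Ruby's built-in `String#split`. The sketch below is illustrative only; the sample lines are made up and none of it is code from the gem.

```ruby
# Naive comma splitting, as criticized in the README hunk above.
quoted = 'a,"b,c",d'
naive = quoted.split(',')
# The quoted field "b,c" is cut in two and keeps its quote characters,
# so the line yields four pieces instead of three fields.
p naive

empty = ',,'
# String#split drops trailing empty strings, so three empty fields vanish.
p empty.split(',')
# A negative limit keeps the empty fields, but split still knows nothing
# about double quotes or doubled-up quote escapes.
p empty.split(',', -1)
```

Each workaround fixes one symptom and leaves the quoting rules unhandled, which is the argument the README makes for a purpose-built parser.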

data/lib/csvreader/reader.rb CHANGED

@@ -6,6 +6,34 @@ module Csv    ## check: rename to CsvSettings / CsvPref / CsvGlobals or similar
   ## STD_CSV_ENGINE = CSV   ## to avoid name confusion use longer name - why? why not? find a better name?
   ## use __CSV__ or similar? or just ::CSV ??
+class Dialect   ## todo: use a module - it's just a namespace/module now - why? why not?
+  ###
+  # (auto-)add these flavors/dialects:
+  #     :tab                   -> uses TabReader(!)
+  #     :strict|:rfc4180
+  #     :unix                   -> uses unix-style escapes e.g. \n \" etc.
+  #     :windows|:excel
+  #     :guess|:auto     -> guess (auto-detect) separator - why? why not?
+  ##  e.g. use Dialect.registry[:unix] = { ... } etc.
+  ##   note use @@ - there is only one registry
+  def self.registry() @@registry ||={} end
+  ## add built-in dialects:
+  ##    trim - use strip? why? why not? use alias?
+  registry[:tab]     = {}   ##{ class: TabReader }
+  registry[:strict]  = { strict: true, trim: false }   ## add no comments, blank lines, etc. ???
+  registry[:rfc4180] = :strict    ## alternative name
+  registry[:windows] = {}
+  registry[:excel]   = :windows
+  registry[:unix]    = {}
+  ## todo: add some more
+end  # class Dialect
   class Configuration
     puts "CSV::VERSION:"
@@ -23,6 +51,9 @@ module Csv    ## check: rename to CsvSettings / CsvPref / CsvGlobals or similar
     attr_accessor :sep   ## col_sep (column separator)
+    attr_accessor :na    ## not available (string or array of strings or nil) - rename to nas/nils/nulls - why? why not?
+    attr_accessor :trim        ### allow ltrim/rtrim/trim - why? why not?
+    attr_accessor :dialect
     def initialize
       @sep = ','
@@ -32,6 +63,8 @@ module Csv    ## check: rename to CsvSettings / CsvPref / CsvGlobals or similar
       self  ## return self for chaining
     end
+    def trim?() @trim; end   ## strip leading and trailing spaces
     def blank?( line )
       ## note:  blank line does NOT include "blank" with spaces only!!
       ##          use BLANK_REGEX in skip_lines to clean-up/skip/remove/ignore
@@ -96,46 +129,53 @@ end   # module Csvv
 class CsvReader
-  def self.foreach( path, sep: Csv.config.sep, headers: false )
+  def self.parse_line( txt, sep:        Csv.config.sep,
+                            trim:       Csv.config.trim?,
+                            na:         Csv.config.na,
+                            dialect:    Csv.config.dialect,
+                            converters: nil)
+    ## note: do NOT include headers option (otherwise single row gets skipped as first header row :-)
     csv_options = Csv.config.default_options.merge(
-                     headers: headers,
-                     col_sep: sep,
-                     external_encoding: 'utf-8'  ## note:  always (auto-)add utf-8 external encoding for now!!!
+                    headers: false,  ## note: always turn off headers!!!!!!
+                    col_sep: sep
     )
+    ## pp csv_options
+    CSV.parse_line( txt, csv_options )
+  end
-    CSV.foreach( path, csv_options ) do |row|
-      yield( row )    ## check/todo: use block.call( row ) ## why? why not?
-    end
+  def self.parse( txt, sep: Csv.config.sep, headers: false )
+    csv_options = Csv.config.default_options.merge(
+                     headers: headers,
+                     col_sep: sep
+    )
+    ## pp csv_options
+    CSV.parse( txt, csv_options )
   end
   def self.read( path, sep: Csv.config.sep, headers: false )
     ## note: use our own file.open
     ##   always use utf-8 for now
     ##    check/todo: add skip option bom too - why? why not?
-    txt = File.open( path, 'r:utf-8' )
+    txt = File.open( path, 'r:bom|utf-8' )
     parse( txt, sep: sep, headers: headers )
   end
-  def self.parse( txt, sep: Csv.config.sep, headers: false )
+  def self.foreach( path, sep: Csv.config.sep, headers: false )
     csv_options = Csv.config.default_options.merge(
                      headers: headers,
-                     col_sep: sep
+                     col_sep: sep,
+                     external_encoding: 'utf-8'  ## note:  always (auto-)add utf-8 external encoding for now!!!
     )
-    ## pp csv_options
-    CSV.parse( txt, csv_options )
-  end
+    ##  todo/check/fix:
+    ##  can use bom e.g. 'bom|utf-8' - how?
+    ##   raises ArgumentError: unknown encoding name - bom|utf-8
-  def self.parse_line( txt, sep: Csv.config.sep )
-    ## note: do NOT include headers option (otherwise single row gets skipped as first header row :-)
-    csv_options = Csv.config.default_options.merge(
-                    headers: false,  ## note: always turn off headers!!!!!!
-                    col_sep: sep
-    )
-    ## pp csv_options
-    CSV.parse_line( txt, csv_options )
-  end
+    CSV.foreach( path, csv_options ) do |row|
+      yield( row )    ## check/todo: use block.call( row ) ## why? why not?
+    end
+  end
   def self.header( path, sep: Csv.config.sep )   ## use header or headers - or use both (with alias)?
       # read first lines (only)
@@ -148,7 +188,7 @@ class CsvReader
       ##  - NOT a blank line
       lines = ''
-      File.open( path, 'r:utf-8' ) do |f|
+      File.open( path, 'r:bom|utf-8' ) do |f|
         ## todo/fix: how to handle empty files or files without headers?!
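
The switch from `'r:utf-8'` to `'r:bom|utf-8'` in this file matters when a CSV file begins with a byte order mark. A minimal sketch of the difference, assuming a throwaway temp file (the file name and contents are invented for illustration):

```ruby
require 'tempfile'

# Write a small UTF-8 file that starts with a byte order mark (EF BB BF).
file = Tempfile.new(['bom_sample', '.csv'])
file.binmode
file.write("\xEF\xBB\xBFid,name\n".b)
file.close

# Plain utf-8 mode keeps the BOM as the first character of the first line.
plain = File.open(file.path, 'r:utf-8') { |f| f.gets }

# The BOM|UTF-8 mode consumes the BOM before any data is returned.
# (The diff spells the mode in lowercase; the uppercase form is used here.)
stripped = File.open(file.path, 'r:BOM|UTF-8') { |f| f.gets }

p plain.start_with?("\uFEFF")
p stripped

file.unlink
```

Without the BOM-aware mode, the mark ends up glued to the first header name, so a lookup by that header would silently miss.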
@@ -171,31 +211,20 @@ class CsvReader
       parse_line( lines, sep: sep )
     end  # method self.header
-    ####################
-    # helper methods
-    def self.unwrap( row_or_array )   ## unwrap row - find a better name? why? why not?
-      ## return row values as array of strings
-      if row_or_array.is_a?( CSV::Row )
-        row = row_or_array
-        row.fields   ## gets array of string of field values
-      else  ## assume "classic" array of strings
-        array = row_or_array
-      end
-    end
 end # class CsvReader
 class CsvHashReader
-def self.read( path, sep: Csv.config.sep, headers: true )
-  CsvReader.read( path, sep: sep, headers: headers )
-end
 def self.parse( txt, sep: Csv.config.sep, headers: true )
   CsvReader.parse( txt, sep: sep, headers: headers )
 end
+def self.read( path, sep: Csv.config.sep, headers: true )
+  CsvReader.read( path, sep: sep, headers: headers )
+end
 def self.foreach( path, sep: Csv.config.sep, headers: true, &block )
   CsvReader.foreach( path, sep: sep, headers: headers, &block )
 end
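
The new `Dialect` registry stores either an options hash or a symbol that points at another entry (for example `:rfc4180` points at `:strict`). The diff does not show how such aliases are looked up, so the following is only a sketch of one possible approach; `DialectSketch` and its `resolve` helper are hypothetical names and are not part of csvreader 0.3.0.

```ruby
# Hypothetical stand-in for the Dialect registry added in this release.
# The resolve helper is an assumption, NOT part of csvreader 0.3.0.
module DialectSketch
  def self.registry
    @registry ||= {}
  end

  # Follow symbol aliases (e.g. :rfc4180 -> :strict) until an options
  # hash is found; raise if the aliases form a loop.
  def self.resolve(name)
    seen = []
    entry = registry.fetch(name)
    while entry.is_a?(Symbol)
      raise ArgumentError, "alias loop at #{entry}" if seen.include?(entry)
      seen << entry
      entry = registry.fetch(entry)
    end
    entry
  end
end

# Same entries as in the diff (only the ones needed for the example).
DialectSketch.registry[:strict]  = { strict: true, trim: false }
DialectSketch.registry[:rfc4180] = :strict
DialectSketch.registry[:windows] = {}
DialectSketch.registry[:excel]   = :windows

p DialectSketch.resolve(:rfc4180)
p DialectSketch.resolve(:excel)
```

Using `fetch` rather than `[]` makes an unknown dialect name fail loudly with a `KeyError` instead of returning `nil`.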

data/lib/csvreader/version.rb CHANGED

@@ -4,7 +4,7 @@
 class CsvReader   ## note: uses a class for now - change to module - why? why not?
   MAJOR = 0    ## todo: namespace inside version or something - why? why not??
-  MINOR = 2
+  MINOR = 3
   PATCH = 0
   VERSION = [MAJOR,MINOR,PATCH].join('.')

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: csvreader
 version: !ruby/object:Gem::Version
-  version: 0.2.0
+  version: 0.3.0
 platform: ruby
 authors:
 - Gerald Bauer
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-08-19 00:00:00.000000000 Z
+date: 2018-08-20 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rdoc