RubyGems - csvreader - Versions diffs - 0.4.0 → 0.5.0 - Mend

csvreader 0.4.0 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: ed373a97a0bdb4c45d2980894a32014cdcb8ca7c
-  data.tar.gz: 784adcade81e39ad9accd1a9b2d0c76fd666b6f9
+  metadata.gz: ea1d667219773e3a355c81f815d91e92340d61a1
+  data.tar.gz: ba7a43ccb5e110fc1f6eca76ca2a74a62f1131fb
 SHA512:
-  metadata.gz: 5523a8697990c691f55aa7c3b23867104b1c4c5b8e9e25b0424a3191e73cbb32cee369541b712f60fc366ba76a8207a77d6b12b68ea209896b6c26e11c5712de
-  data.tar.gz: 7c33c812c2a53303911b6686d03554d6e388b3f936a3b6b8d995ed237651bd171d3bdb8ab8f38f7f327e1a9d1be26d1fa955918012f37cf6a9e1c2cc6ab08373
+  metadata.gz: 0543a4338d2d12e36da16acdad9abff28633e519baa1d92044d1ca8f5e3472d835d00a10d8b19c24561b06e0d724f87414495600f4c83eef7c9e033474b4c09e
+  data.tar.gz: 8df669bc86f2066b2650a67bda5698fae7b6d58766b9c318f47958b0499671d0a4d39e862b8d3af842105a2777a3cf7ad05168380c338a3053fd3d363697abfb

data/Manifest.txt CHANGED

@@ -13,5 +13,7 @@ test/data/beer11.csv
 test/data/shakespeare.csv
 test/helper.rb
 test/test_parser.rb
+test/test_parser_formats.rb
+test/test_parser_rfc4180.rb
 test/test_reader.rb
 test/test_reader_hash.rb

data/README.md CHANGED

@@ -164,17 +164,15 @@ see [`TabReader` »](https://github.com/datatxt/tabreader).
 Two major design bugs and many many minor.
-(1) The CSV class uses `line.split(',')` with some kludges (†) with the claim its faster.
+(1) The CSV class uses [`line.split(',')`](https://github.com/ruby/csv/blob/master/lib/csv.rb#L1248) with some kludges (†) with the claim it's faster.
 What?! The right way: CSV needs its own purpose-built parser. There's no other
 way you can handle all the (edge) cases with double quotes and escaped doubled up
 double quotes. Period.
-For example, the CSV class cannot handle leading or trailing spaces
+For example, the CSV class cannot handle leading or trailing spaces
 for double quoted values `1,•"2","3"•`.
 Or handling double quotes inside values and so on and on.
-(†): kludge - a workaround or quick-and-dirty solution that is clumsy, inelegant, inefficient, difficult to extend and hard to maintain
 (2) The CSV class returns `nil` for `,,` but an empty string (`""`)
 for `"","",""`. The right way: All values are always strings. Period.
@@ -182,6 +180,36 @@ If you want to use `nil` you MUST configure a string (or strings)
 such as `NA`, `n/a`, `\N`, or similar that map to `nil`.
+(†): kludge - a workaround or quick-and-dirty solution that is clumsy, inelegant, inefficient, difficult to extend and hard to maintain
+Appendix: Simple examples the standard csv library cannot read:
+Quoted values with leading or trailing spaces e.g.
+```
+1, "2","3" , "4" ,5
+```
+=>
+``` ruby
+["1", "2", "3", "4" ,"5"]
+```
+"Auto-fix" unambiguous quotes in "unquoted" values e.g.
+```
+value with "quotes", another value
+```
+=>
+``` ruby
+["value with \"quotes\"", "another value"]
+```
+and some more.
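
Note on point (2) of the README hunk above: it states that all values are always strings and that `nil` only appears for explicitly configured markers such as `NA` or `\N`. A minimal sketch of that rule follows, assuming a hypothetical helper named `apply_na` (this name and code are illustrative and are not taken from the gem):

```ruby
# Hypothetical helper (not part of csvreader): map configured
# "not available" markers to nil and leave every other value,
# including the empty string, untouched.
def apply_na(values, na: ['\N', 'NA'])
  na_list = Array(na)   # accepts a single string, an array, or nil (no markers)
  values.map { |value| na_list.include?(value) ? nil : value }
end

row = ['1', 'NA', '', '\N']
p apply_na(row)            # only the NA markers become nil
p apply_na(row, na: nil)   # with no markers configured, all values stay strings
```

The point of the design is that an empty field and a missing value stay distinguishable: `""` remains a string unless the caller lists it as a marker.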

data/lib/csvreader.rb CHANGED

@@ -3,6 +3,7 @@
 require 'csv'
 require 'json'
 require 'pp'
+require 'logger'
 ###

data/lib/csvreader/buffer.rb CHANGED

@@ -18,22 +18,10 @@ class BufferIO   ## todo: find a better name - why? why not? is really just for
     end
   end # method getc
-  def ungetc( c )
-    ## add upfront as first char in buffer
-    ##   last in/first out queue!!!!
-    @buf.unshift( c )
-    ## puts "ungetc - >#{c} (#{c.ord})< => >#{@buf}<"
-  end
   def peek
-     ## todo/fix:
-     ## use Hexadecimal code: 1A, U+001A for eof char - why? why not?
     if @buf.size == 0 && @io.eof?
       puts "peek - hitting eof!!!"
-      ## return eof char(s) - exits? is \0 ?? double check
-      return "\0"
+      return  "\0"   ## return NUL char (0) for now
     end
     if @buf.size == 0
@@ -44,5 +32,6 @@ class BufferIO   ## todo: find a better name - why? why not? is really just for
     @buf.first
   end # method peek
 end # class BufferIO
 end # class CsvReader

data/lib/csvreader/parser.rb CHANGED

@@ -1,74 +1,92 @@
 # encoding: utf-8
 class CsvReader
-class Parser
-## char constants
-DOUBLE_QUOTE = "\""
-COMMENT      = "#"    ## use COMMENT_HASH or HASH or ??
-SPACE        = " "
-TAB          = "\t"
-LF	         = "\n"    ## 0A (hex)  10 (dec)
-CR	         = "\r"    ## 0D (hex)  13 (dec)
-def self.parse( data )
-  puts "parse:"
-  pp data
-  parser = new
-  parser.parse( data )
-end
-def self.parse_line( data )
-  puts "parse_line:"
+class Parser
-  parser = new
-  records = parser.parse( data, limit: 1 )
-  ## unwrap record if empty return nil - why? why not?
-  ##  return empty record e.g. [] - why? why not?
-  records.size == 0 ? nil : records.first
+## char constants
+DOUBLE_QUOTE = "\""
+BACKSLASH    = "\\"    ## use BACKSLASH_ESCAPE ??
+COMMENT      = "#"      ## use COMMENT_HASH or HASH or ??
+SPACE        = " "      ##   \s == ASCII 32 (dec)            =    (Space)
+TAB          = "\t"     ##   \t == ASCII 0x09 (hex)          = HT (Tab/horizontal tab)
+LF	         = "\n"     ##   \n == ASCII 0x0A (hex) 10 (dec) = LF (Newline/line feed)
+CR	         = "\r"     ##   \r == ASCII 0x0D (hex) 13 (dec) = CR (Carriage return)
+###################################
+## add simple logger with debug flag/switch
+#
+#  use Parser.debug = true   # to turn on
+#
+#  todo/fix: use logutils instead of std logger - why? why not?
+def self.logger() @@logger ||= Logger.new( STDOUT ); end
+def logger()  self.class.logger; end
+attr_reader :config   ## todo/fix: change config to proper dialect class/struct - why? why not?
+def initialize( sep:         ',',
+                quote:       DOUBLE_QUOTE, ## note: set to nil for no quote
+                doublequote: true,
+                escape:      BACKSLASH,   ## note: set to nil for no escapes
+                trim:        true,   ## note: will toggle between human/default and strict mode parser!!!
+                na:          ['\N', 'NA'],  ## note: set to nil for no null vales / not availabe (na)
+                quoted_empty:   '',   ## note: only available in strict mode (e.g. trim=false)
+                unquoted_empty: ''    ## note: only available in strict mode (e.g. trim=false)
+               )
+  @config = {}   ## todo/fix: change config to proper dialect class/struct - why? why not?
+  @config[:sep]          = sep
+  @config[:quote]        = quote
+  @config[:doublequote]  = doublequote
+  @config[:escape]  = escape
+  @config[:trim]         = trim
+  @config[:na]     = na
+  @config[:quoted_empty] = quoted_empty
+  @config[:unquoted_empty] = unquoted_empty
 end
-def self.read( path )
-  parser = new
-  File.open( path, 'r:bom|utf-8' ) do |file|
-    parser.parse( file )
-  end
-end
+def strict?
+  ## note:  use trim for separating two different parsers / code paths:
+  ##   - human with trim leading and trailing whitespace and
+  ##   - strict with no leading and trailing whitespaces allowed
-def self.foreach( path, &block )
-  parser = new
-  File.open( path, 'r:bom|utf-8' ) do |file|
-    parser.foreach( file, &block )
-  end
+  ## for now use - trim == false for strict version flag alias
+  ##   todo/fix: add strict flag - why? why not?
+  @config[:trim] ? false : true
 end
-def self.parse_lines( data, &block )
-  parser = new
-  parser.parse_lines( data, &block )
-end
+DEFAULT = new( sep: ',', trim: true )
+RFC4180 = new( sep: ',', trim: false )
+EXCEL   = new( sep: ',', trim: false )
+def self.default()  DEFAULT; end    ## alternative alias for DEFAULT
+def self.rfc4180()  RFC4180; end    ## alternative alias for RFC4180
+def self.excel()    EXCEL; end      ## alternative alias for EXCEL
-def parse_field( io, trim: true )
+def parse_field( io, sep: )
+  logger.debug "parse field - sep: >#{sep}< (#{sep.ord})"  if logger.debug?
   value = ""
-  value << parse_spaces( io ) ## add leading spaces
+  skip_spaces( io )   ## strip leading spaces
   if (c=io.peek; c=="," || c==LF || c==CR || io.eof?) ## empty field
-    value = value.strip    if trim ## strip all spaces
      ## return value; do nothing
   elsif io.peek == DOUBLE_QUOTE
-    puts "start double_quote field - value >#{value}<"
-    value = value.strip   ## note always strip/trim leading spaces in quoted value
-    puts "start double_quote field - peek >#{io.peek}< (#{io.peek.ord})"
+    logger.debug "start double_quote field - peek >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
     io.getc  ## eat-up double_quote
     loop do
@@ -89,18 +107,18 @@ def parse_field( io, trim: true )
     ## note: always eat-up all trailing spaces (" ") and tabs (\t)
     skip_spaces( io )
-    puts "end double_quote field - peek >#{io.peek}< (#{io.peek.ord})"
+    logger.debug "end double_quote field - peek >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
   else
-    puts "start reg field - peek >#{io.peek}< (#{io.peek.ord})"
+    logger.debug "start reg field - peek >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
     ## consume simple value
     ##   until we hit "," or "\n" or "\r"
     ##    note: will eat-up quotes too!!!
     while (c=io.peek; !(c=="," || c==LF || c==CR || io.eof?))
-      puts "  add char >#{io.peek}< (#{io.peek.ord})"
+      logger.debug "  add char >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
       value << io.getc   ## eat-up all spaces (" ") and tabs (\t)
     end
-    value = value.strip    if trim ## strip all spaces
-    puts "end reg field - peek >#{io.peek}< (#{io.peek.ord})"
+    value = value.strip   ## strip all trailing spaces
+    logger.debug "end reg field - peek >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
   end
   value
@@ -108,12 +126,60 @@ end
-def parse_record( io, trim: true )
+def parse_field_strict( io, sep: )
+  logger.debug "parse field (strict) - sep: >#{sep}< (#{sep.ord})"  if logger.debug?
+  value = ""
+  if (c=io.peek; c==sep || c==LF || c==CR || io.eof?) ## empty unquoted field
+     value = config[:unquoted_empty]   ## defaults to "" (might be set to nil if needed)
+     ## return value; do nothing
+  elsif config[:quote] && io.peek == config[:quote]
+    logger.debug "start quote field (strict) - peek >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
+    io.getc  ## eat-up double_quote
+    loop do
+      while (c=io.peek; !(c==config[:quote] || io.eof?))
+        value << io.getc   ## eat-up everything unit quote (")
+      end
+      break if io.eof?
+      io.getc ## eat-up double_quote
+      if config[:doublequote] && io.peek == config[:quote]  ## doubled up quote?
+        value << io.getc   ## add doube quote and continue!!!!
+      else
+        break
+      end
+    end
+    value = config[:quoted_empty]  if value == ""   ## defaults to "" (might be set to nil if needed)
+    logger.debug "end double_quote field (strict) - peek >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
+  else
+    logger.debug "start reg field (strict) - peek >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
+    ## consume simple value
+    ##   until we hit "," or "\n" or "\r" or stroy "\"" double quote
+    while (c=io.peek; !(c==sep || c==LF || c==CR || c==config[:quote] || io.eof?))
+      logger.debug "  add char >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
+      value << io.getc
+    end
+    logger.debug "end reg field (strict) - peek >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
+  end
+  value
+end
+def parse_record( io, sep: )
   values = []
   loop do
-     value = parse_field( io, trim: trim )
-     puts "value: »#{value}«"
+     value = parse_field( io, sep: sep )
+     logger.debug "value: »#{value}«"  if logger.debug?
      values << value
      if io.eof?
@@ -133,6 +199,33 @@ def parse_record( io, trim: true )
 end
+def parse_record_strict( io, sep: )
+  values = []
+  loop do
+     value = parse_field_strict( io, sep: sep )
+     logger.debug "value: »#{value}«"  if logger.debug?
+     values << value
+     if io.eof?
+        break
+     elsif (c=io.peek; c==LF || c==CR)
+       skip_newline( io )   ## note: singular / single newline only (NOT plural)
+       break
+     elsif io.peek == sep
+       io.getc   ## eat-up FS (,)
+     else
+       puts "*** csv parse error (strict): found >#{io.peek} (#{io.peek.ord})< - FS (,) or RS (\\n) expected!!!!"
+       exit(1)
+     end
+  end
+  values
+end
 def skip_newlines( io )
   return if io.eof?
@@ -142,6 +235,22 @@ def skip_newlines( io )
 end
+def skip_newline( io )    ## note: singular (strict) version
+  return if io.eof?
+  ## only skip CR LF or LF or CR
+  if io.peek == CR
+    io.getc ## eat-up
+    io.getc  if io.peek == LF
+  elsif io.peek == LF
+    io.getc ## eat-up
+  else
+    # do nothing
+  end
+end
 def skip_until_eol( io )
   return if io.eof?
@@ -161,91 +270,95 @@ end
-def parse_spaces( io )  ## helper method
-  spaces = ""
-  ## add leading spaces
-  while (c=io.peek; c==SPACE || c==TAB)
-    spaces << io.getc   ## eat-up all spaces (" ") and tabs (\t)
-  end
-  spaces
-end
-def parse_lines( io_maybe, trim: true,
-                           comments: true,
-                           blanks: true,   &block )
-  ## find a better name for io_maybe
-  ##   make sure io is a wrapped into BufferIO!!!!!!
-  if io_maybe.is_a?( BufferIO )    ### allow (re)use of BufferIO if managed from "outside"
-    io = io_maybe
-  else
-    io = BufferIO.new( io_maybe )
-  end
+def parse_lines_human( io, sep:, &block )
   loop do
     break if io.eof?
-    ## hack: use own space buffer for peek( x ) lookahead (more than one char)
-    ## check for comments or blank lines
-    if comments || blanks
-      spaces = parse_spaces( io )
-    end
+    skip_spaces( io )
-    if comments && io.peek == COMMENT        ## comment line
-      puts "skipping comment - peek >#{io.peek}< (#{io.peek.ord})"
+    if io.peek == COMMENT        ## comment line
+      logger.debug "skipping comment - peek >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
       skip_until_eol( io )
       skip_newlines( io )
-    elsif blanks && (c=io.peek; c==LF || c==CR || io.eof?)
-      puts "skipping blank - peek >#{io.peek}< (#{io.peek.ord})"
+    elsif (c=io.peek; c==LF || c==CR || io.eof?)
+      logger.debug "skipping blank - peek >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
       skip_newlines( io )
-    else  # undo (ungetc spaces)
-      puts "start record - peek >#{io.peek}< (#{io.peek.ord})"
-      if comments || blanks
-        ## note: MUST ungetc in "reverse" order
-        ##   ##   buffer is last in/first out queue!!!!
-        spaces.reverse.each_char { |space| io.ungetc( space ) }
-      end
-      record = parse_record( io, trim: trim )
+    else
+      logger.debug "start record - peek >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
+      record = parse_record( io, sep: sep )
       ## note: requires block - enforce? how? why? why not?
       block.call( record )   ## yield( record )
     end
   end  # loop
-end # method parse_lines
+end # method parse_lines_human
+def parse_lines_strict( io, sep:, &block )
+  ## no leading and trailing whitespaces trimmed/stripped
+  ## no comments skipped
+  ## no blanks skipped
+  ## - follows strict rules of
+  ##  note: this csv format is NOT recommended;
+  ##    please, use a format with comments, leading and trailing whitespaces, etc.
+  ##    only added for checking compatibility
+  loop do
+    break if io.eof?
+    logger.debug "start record (strict) - peek >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
+    record = parse_record_strict( io, sep: sep )
+    ## note: requires block - enforce? how? why? why not?
+    block.call( record )   ## yield( record )
+  end  # loop
+end # method parse_lines_strict
+def parse_lines( io_maybe, sep: config[:sep], &block )
+  ## find a better name for io_maybe
+  ##   make sure io is a wrapped into BufferIO!!!!!!
+  if io_maybe.is_a?( BufferIO )    ### allow (re)use of BufferIO if managed from "outside"
+    io = io_maybe
+  else
+    io = BufferIO.new( io_maybe )
+  end
+  if strict?
+    parse_lines_strict( io, sep: sep, &block )
+  else
+    parse_lines_human( io, sep: sep, &block )
+  end
+end  ## parse_lines
+##   fix: add optional block  - lets you use it like foreach!!!
+##    make foreach an alias of parse with block - why? why not?
+##
+##   unifiy with (make one) parse and parse_lines!!!! - why? why not?
-def parse( io_maybe, trim: true,
-               comments: true,
-               blanks: true,
-               limit: nil )
+def parse( io_maybe, sep: config[:sep], limit: nil )
   records = []
-  parse_lines( io_maybe, trim: trim, comments: comments, blanks: blanks ) do |record|
+  parse_lines( io_maybe, sep: sep  ) do |record|
     records << record
     ## set limit to 1 for processing "single" line (that is, get one record)
-    return records   if limit && limit >= records.size
+    break  if limit && limit >= records.size
   end
   records
 end ## method parse
-def foreach( io_maybe, trim: true,
-                 comments: true,
-                 blanks: true,    &block )
-  parse_lines( io_maybe, trim: trim, comments: comments, blanks: blanks, &block )
-end
 end # class Parser
 end # class CsvReader
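
Review note on the `limit` check in `parse` above: the condition `limit && limit >= records.size` is already true after the first record for any limit of 1 or more, so it appears to give the intended result only for `limit: 1` (the `parse_line` case). The sketch below isolates the loop with the comparison reversed. `take_records` is a hypothetical name and the array of records stands in for what `parse_lines` would yield; this is an assumption about the intended behavior, not the gem's code.

```ruby
# Hypothetical helper: collect records until the limit is reached.
def take_records(source, limit: nil)
  records = []
  source.each do |record|
    records << record
    # stop once we have collected `limit` records
    break if limit && records.size >= limit
  end
  records
end

rows = [%w[a b c], %w[1 2 3], %w[4 5 6]]
p take_records(rows, limit: 2)
p take_records(rows)
```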

data/lib/csvreader/reader.rb CHANGED

@@ -1,150 +1,98 @@
 # encoding: utf-8
-module Csv    ## check: rename to CsvSettings / CsvPref / CsvGlobals or similar - why? why not???
+class CsvReader
-class Dialect   ## todo: use a module - it's just a namespace/module now - why? why not?
-  ###
-  # (auto-)add these flavors/dialects:
-  #     :tab                   -> uses TabReader(!)
-  #     :strict|:rfc4180
-  #     :unix                   -> uses unix-style escapes e.g. \n \" etc.
-  #     :windows|:excel
-  #     :guess|:auto     -> guess (auto-detect) separator - why? why not?
-  ##  e.g. use Dialect.registry[:unix] = { ... } etc.
-  ##   note use @@ - there is only one registry
-  def self.registry() @@registry ||={} end
-  ## add built-in dialects:
-  ##    trim - use strip? why? why not? use alias?
-  registry[:tab]     = {}   ##{ class: TabReader }
-  registry[:strict]  = { strict: true, trim: false }   ## add no comments, blank lines, etc. ???
-  registry[:rfc4180] = :strict    ## alternative name
-  registry[:windows] = {}
-  registry[:excel]   = :windows
-  registry[:unix]    = {}
-  ## todo: add some more
-end  # class Dialect
-  class Configuration
+  def initialize( parser )
+    @parser = parser
+  end
-    attr_accessor :sep   ## col_sep (column separator)
-    attr_accessor :na    ## not available (string or array of strings or nil) - rename to nas/nils/nulls - why? why not?
-    attr_accessor :trim        ### allow ltrim/rtrim/trim - why? why not?
-    attr_accessor :blanks
-    attr_accessor :comments
-    attr_accessor :dialect
+  DEFAULT = new( Parser::DEFAULT )
+  RFC4180 = new( Parser::RFC4180 )
+  EXCEL   = new( Parser::EXCEL )
-    def initialize
-      @sep      = ','
-      @blanks   = true
-      @comments = true
-      @trim     = true
-      ## note: do NOT add headers as global - should ALWAYS be explicit
-      ##   headers (true/false) - changes resultset and requires different processing!!!
+  def self.default()  DEFAULT; end    ## alternative alias for DEFAULT
+  def self.rfc4180()  RFC4180; end    ## alternative alias for RFC4180
+  def self.excel()    EXCEL; end      ## alternative alias for EXCEL
-      self  ## return self for chaining
-    end
-    ## strip leading and trailing spaces
-    def trim?() @trim; end
-    ## skip blank lines (with only 1+ spaces)
-    ## note: for now blank lines with no spaces will always get skipped
-    def blanks?() @blanks; end
-    def comments?() @comments; end
-    ## built-in (default) options
-    ##  todo: find a better name?
-    def default_options
-      ## note:
-      ##   do NOT include sep character and
-      ##   do NOT include headers true/false here
-      ##
-      ##  make default sep its own "global" default config
-      ##   e.g. Csv.config.sep =
-      ## common options
-      ##   skip comments starting with #
-      ##   skip blank lines
-      ##   strip leading and trailing spaces
-      ##    NOTE/WARN:  leading and trailing spaces NOT allowed/working with double quoted values!!!!
-      defaults = {
-        blanks:   @blanks,    ## note: skips lines with no whitespaces only!! (e.g. line with space is NOT blank!!)
-        comments: @comments,
-        trim:     @trim
-        ## :converters => :strip
-      }
-      defaults
-    end
-  end # class Configuration
+  #####################
+  ## convenience helpers defaulting to default csv dialect/format reader
+  ##
+  ##   CsvReader.parse_line is the same as
+  ##     CsvReader::DEFAULT.parse_line or CsvReader.default.parse_line
+  ##
+  def self.parse_line( data, sep: nil,
+                             converters: nil )
+     DEFAULT.parse_line( data, sep: sep, converters: converters )
+  end
-  ## lets you use
-  ##   Csv.configure do |config|
-  ##      config.sep = ','   ## or "/t"
-  ##   end
+  def self.parse( data, sep: nil,
+                        converters: nil )
+     DEFAULT.parse( data, sep: sep, converters: converters )
+  end
-  def self.configure
-    yield( config )
+  #### fix!!! remove - replace with parse with (optional) block!!!!!
+  def self.parse_lines( data, sep: nil,
+                              converters: nil, &block )
+     DEFAULT.parse_lines( data, sep: sep, converters: nil, &block )
   end
-  def self.config
-    @config ||= Configuration.new
+  def self.read( path, sep: nil,
+                       converters: nil )
+     DEFAULT.read( path, sep: sep, converters: converters )
   end
-end   # module Csvv
+  def self.header( path, sep: nil )
+     DEFAULT.header( path, sep: sep )
+  end
+  def self.foreach( path, sep: nil,
+                          converters: nil, &block )
+     DEFAULT.foreach( path, sep: sep, converters: converters, &block )
+  end
-####
-## use our own wrapper
-class CsvReader
-  def self.parse_line( txt, sep:        Csv.config.sep,
-                            trim:       Csv.config.trim?,
-                            na:         Csv.config.na,
-                            dialect:    Csv.config.dialect,
-                            converters: nil)
-    ## note: do NOT include headers option (otherwise single row gets skipped as first header row :-)
-    csv_options = Csv.config.default_options.merge(
-                    col_sep: sep
-    )
-    ## pp csv_options
-    Parser.parse_line( txt )  ##, csv_options )
-  end
+  #############################
+  ## all "high-level" reader methods
+  ##
+  ## note: allow "overriding" of separator
+  ##    if sep is not nil otherwise use default dialect/format separator
   ##
   ##  todo/fix: "unify" parse and parse_lines  !!!
   ##    check for block_given? - why? why not?
-  def self.parse( txt, sep: Csv.config.sep )
-    csv_options = Csv.config.default_options.merge(
-                     col_sep: sep
-    )
-    ## pp csv_options
-    Parser.parse( txt )  ###, csv_options )
+  def parse( data, sep: nil, limit: nil,
+                   converters: nil )
+    sep = @parser.config[:sep]  if sep.nil?
+    @parser.parse( data, sep: sep, limit: limit )
+  end
+  #### fix!!! remove - replace with parse with (optional) block!!!!!
+  def parse_lines( data, sep: nil,
+                         converters: nil, &block )
+    sep = @parser.config[:sep]  if sep.nil?
+    @parser.parse_lines( data, sep: sep, &block )
   end
-  def self.parse_lines( txt, sep: Csv.config.sep, &block )
-    csv_options = Csv.config.default_options.merge(
-                     col_sep: sep
-    )
-    ## pp csv_options
-    Parser.parse_lines( txt, &block )  ###, csv_options )
+  def parse_line( data, sep: nil,
+                        converters: nil )
+    records = parse( data, sep: sep, limit: 1 )
+    ## unwrap record if empty return nil - why? why not?
+    ##  return empty record e.g. [] - why? why not?
+    records.size == 0 ? nil : records.first
   end
-  def self.read( path, sep: Csv.config.sep )
+  def read( path, sep: nil,
+                  converters: nil )
     ## note: use our own file.open
     ##   always use utf-8 for now
     ##    check/todo: add skip option bom too - why? why not?
@@ -152,33 +100,26 @@ class CsvReader
     parse( txt, sep: sep )
   end
-  def self.foreach( path, sep: Csv.config.sep, &block )
-    csv_options = Csv.config.default_options.merge(
-                     col_sep: sep
-    )
-    Parser.foreach( path, &block ) ###, csv_options )
+  def foreach( path, sep: nil,
+                     converters: nil, &block )
+    File.open( path, 'r:bom|utf-8' ) do |file|
+      parse_lines( file, sep: sep, &block )
+    end
   end
-  def self.header( path, sep: Csv.config.sep )   ## use header or headers - or use both (with alias)?
-      # read first lines (only)
-      #  and parse with csv to get header from csv library itself
-      #
-      #  check - if there's an easier or built-in way for the csv library
-      ## readlines until
-      ##  - NOT a comments line or
-      ##  - NOT a blank line
+  def header( path, sep: nil )   ## use header or headers - or use both (with alias)?
+     # read first lines (only)
+     #  and parse with csv to get header from csv library itself
      record = nil
      File.open( path, 'r:bom|utf-8' ) do |file|
-        record = Parser.parse_line( file )
+        record = parse_line( file, sep: sep )
      end
-     record  ## todo/fix: return nil for empty - why? why not?
-    end  # method self.header
+     record  ## todo/fix: returns nil for empty - why? why not?
+  end  # method self.header
 end # class CsvReader
@@ -188,13 +129,13 @@ end # class CsvReader
 class CsvHashReader
-def self.parse( txt, sep: Csv.config.sep, headers: nil )
+def self.parse( data, sep: nil, headers: nil )
   ## pass in headers as array e.g. ['A', 'B', 'C']
   names = headers ? headers : nil
   records = []
-  CsvReader.parse_lines( txt ) do |values|     # sep: sep
+  CsvReader.parse_lines( data ) do |values|     # sep: sep
     if names.nil?
       names = values   ## store header row / a.k.a. field/column names
     else
@@ -206,13 +147,13 @@ def self.parse( txt, sep: Csv.config.sep, headers: nil )
 end
-def self.read( path, sep: Csv.config.sep, headers: nil )
+def self.read( path, sep: nil, headers: nil )
   txt = File.open( path, 'r:bom|utf-8' ).read
   parse( txt, sep: sep, headers: headers )
 end
-def self.foreach( path, sep: Csv.config.sep, headers: nil, &block )
+def self.foreach( path, sep: nil, headers: nil, &block )
   ## pass in headers as array e.g. ['A', 'B', 'C']
   names = headers ? headers : nil
@@ -228,7 +169,7 @@ def self.foreach( path, sep: Csv.config.sep, headers: nil, &block )
 end
-def self.header( path, sep: Csv.config.sep )   ## add header too? why? why not?
+def self.header( path, sep: nil )   ## add header too? why? why not?
   ## same as "classic" header method - delegate/reuse :-)
   CsvReader.header( path, sep: sep )
 end
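
Note on the reader changes above: the global `Csv.config` is replaced by reader instances that wrap a parser, and `sep: nil` means "use the wrapped parser's separator". A minimal sketch of that delegation pattern follows. `SketchParser` and `SketchReader` are hypothetical names, and `String#split` is only a placeholder for real parsing, so quoted values are not handled here.

```ruby
# Hypothetical stand-in parser: keeps its default separator in a config hash.
class SketchParser
  attr_reader :config

  def initialize(sep: ',')
    @config = { sep: sep }
  end

  # NOTE: split is a placeholder only, not a real CSV parser.
  def parse(data, sep: config[:sep])
    data.each_line.map { |line| line.chomp.split(sep, -1) }
  end
end

# Hypothetical reader: nil separator falls back to the parser's default.
class SketchReader
  def initialize(parser)
    @parser = parser
  end

  def parse(data, sep: nil)
    sep = @parser.config[:sep] if sep.nil?
    @parser.parse(data, sep: sep)
  end
end

reader = SketchReader.new(SketchParser.new(sep: ';'))
p reader.parse("a;b\n1;2")             # falls back to the parser's ';'
p reader.parse("a|b\n1|2", sep: '|')   # per-call override
```

Keeping the default on the parser instance means each predefined format can carry its own separator without any shared global state.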

data/lib/csvreader/version.rb CHANGED

@@ -4,7 +4,7 @@
 class CsvReader   ## note: uses a class for now - change to module - why? why not?
   MAJOR = 0    ## todo: namespace inside version or something - why? why not??
-  MINOR = 4
+  MINOR = 5
   PATCH = 0
   VERSION = [MAJOR,MINOR,PATCH].join('.')

data/test/test_parser.rb CHANGED

@@ -9,24 +9,38 @@ require 'helper'
 class TestParser < MiniTest::Test
+def setup
+  CsvReader::Parser.logger.level = :debug   ## turn on "global" logging - move to helper - why? why not?
+end
+def parser
+  parser = CsvReader::Parser::DEFAULT
+end
-def test_parse1
-   records = [["a", "b", "c"],
-              ["1", "2", "3"],
-              ["4", "5", "6"]]
-   ## don't care about newlines (\r\n)
-   assert_equal records, CsvReader::Parser.parse( "a,b,c\n1,2,3\n4,5,6" )
-   assert_equal records, CsvReader::Parser.parse( "a,b,c\n1,2,3\n4,5,6\n" )
-   assert_equal records, CsvReader::Parser.parse( "a,b,c\r1,2,3\r4,5,6" )
-   assert_equal records, CsvReader::Parser.parse( "a,b,c\r\n1,2,3\r\n4,5,6\r\n" )
-   ## or leading and trailing spaces
-   assert_equal records, CsvReader::Parser.parse( "    \n a , b , c \n 1,2  ,3 \n 4,5,6   " )
-   assert_equal records, CsvReader::Parser.parse( "\n\na,  b,c   \n  1, 2, 3\n 4, 5, 6" )
-   assert_equal records, CsvReader::Parser.parse( "   \"a\"  , b ,  \"c\"   \n1,  2,\"3\"   \n4,5,  \"6\"" )
-   assert_equal records, CsvReader::Parser.parse( "a, b, c\n1,  2,3\n\n\n4,5,6\n\n\n" )
-   assert_equal records, CsvReader::Parser.parse( " a, b ,c  \n 1 , 2 , 3 \n4,5,6  " )
+def test_parser_default
+  pp CsvReader::Parser::DEFAULT
+  pp CsvReader::Parser.default
+  assert true
+end
+def test_parse
+  records = [["a", "b", "c"],
+             ["1", "2", "3"],
+             ["4", "5", "6"]]
+  ## don't care about newlines (\r\n)
+  assert_equal records, parser.parse( "a,b,c\n1,2,3\n4,5,6" )
+  assert_equal records, parser.parse( "a,b,c\n1,2,3\n4,5,6\n" )
+  assert_equal records, parser.parse( "a,b,c\r1,2,3\r4,5,6" )
+  assert_equal records, parser.parse( "a,b,c\r\n1,2,3\r\n4,5,6\r\n" )
+  ## or leading and trailing spaces
+  assert_equal records, parser.parse( "    \n a , b , c \n 1,2  ,3 \n 4,5,6   " )
+  assert_equal records, parser.parse( "\n\na,  b,c   \n  1, 2, 3\n 4, 5, 6" )
+  assert_equal records, parser.parse( "   \"a\"  , b ,  \"c\"   \n1,  2,\"3\"   \n4,5,  \"6\"" )
+  assert_equal records, parser.parse( "a, b, c\n1,  2,3\n\n\n4,5,6\n\n\n" )
+  assert_equal records, parser.parse( " a, b ,c  \n 1 , 2 , 3 \n4,5,6  " )
 end
@@ -34,19 +48,19 @@ def test_parse_quotes
   records = [["a", "b", "c"],
              ["11 \n 11", "\"2\"", "3"]]
-  assert_equal records, CsvReader::Parser.parse( " a, b ,c  \n\"11 \n 11\", \"\"\"2\"\"\" , 3 \n" )
-  assert_equal records, CsvReader::Parser.parse( "\n\n \"a\", \"b\" ,\"c\"  \n  \"11 \n 11\"  ,  \"\"\"2\"\"\" , 3 \n" )
+  assert_equal records, parser.parse( " a, b ,c  \n\"11 \n 11\", \"\"\"2\"\"\" , 3 \n" )
+  assert_equal records, parser.parse( "\n\n \"a\", \"b\" ,\"c\"  \n  \"11 \n 11\"  ,  \"\"\"2\"\"\" , 3 \n" )
 end
 def test_parse_empties
   records = [["", "", ""]]
-  assert_equal records, CsvReader::Parser.parse( ",," )
-  assert_equal records, CsvReader::Parser.parse( <<TXT )
+  assert_equal records, parser.parse( ",," )
+  assert_equal records, parser.parse( <<TXT )
   "","",""
 TXT
-  assert_equal [], CsvReader::Parser.parse( "" )
+  assert_equal [], parser.parse( "" )
 end
@@ -54,7 +68,7 @@ def test_parse_comments
   records = [["a", "b", "c"],
              ["1", "2", "3"]]
-  assert_equal records, CsvReader::Parser.parse( <<TXT )
+  assert_equal records, parser.parse( <<TXT )
 # comment
 # comment
 ## comment
@@ -64,7 +78,7 @@ a, b, c
 TXT
-  assert_equal records, CsvReader::Parser.parse( <<TXT )
+  assert_equal records, parser.parse( <<TXT )
    a,   b,   c
    1,   2,   3

data/test/test_parser_formats.rb ADDED

@@ -0,0 +1,69 @@
+# encoding: utf-8
+###
+#  to run use
+#     ruby -I ./lib -I ./test test/test_parser_formats.rb
+require 'helper'
+class TestParserFormats < MiniTest::Test
+def setup
+  CsvReader::Parser.logger.level = :debug   ## turn on "global" logging - move to helper - why? why not?
+end
+def parser
+  CsvReader::Parser
+end
+def test_parse_whitespace
+   records = [["a", "b", "c"],
+              ["1", "2", "3"]]
+   ## don't care about newlines (\r\n) ??? - fix? why? why not?
+   assert_equal records, parser.default.parse( "a,b,c\n1,2,3" )
+   assert_equal records, parser.default.parse( "a,b,c\n1,2,3\n" )
+   assert_equal records, parser.default.parse( " a, b ,c \n\n1,2,3\n" )
+   assert_equal records, parser.default.parse( " a, b ,c \n \n1,2,3\n" )
+   assert_equal [["a", "b", "c"],
+                 [""],
+                 ["1", "2", "3"]], parser.default.parse( %Q{a,b,c\n""\n1,2,3\n} )
+   assert_equal [["", ""],
+                 [""],
+                 ["", "", ""]], parser.default.parse( %Q{,\n""\n"","",""\n} )
+   ## strict rfc4180 - no trim leading or trailing spaces or blank lines
+   assert_equal records,   parser.rfc4180.parse( "a,b,c\n1,2,3" )
+   assert_equal [["a", "b", "c"],
+                 [""],
+                 ["1", "2", "3"]], parser.rfc4180.parse( "a,b,c\n\n1,2,3" )
+   assert_equal [[" a", " b ", "c "],
+                 [""],
+                 ["1", "2", "3"]], parser.rfc4180.parse( " a, b ,c \n\n1,2,3" )
+    assert_equal [[" a", " b ", "c "],
+                  [" "],
+                  ["",""],
+                  ["1", "2", "3"]], parser.rfc4180.parse( " a, b ,c \n \n,\n1,2,3" )
+end
+def test_parse_empties
+    assert_equal [], parser.default.parse( "\n \n \n" )
+    ## strict rfc4180 - no trim leading or trailing spaces or blank lines
+    assert_equal [[""],
+                  [" "],
+                  [" "]], parser.rfc4180.parse( "\n \n \n" )
+    assert_equal [[""],
+                  [" "],
+                  [" "]], parser.rfc4180.parse( "\n \n " )
+    assert_equal [[""]], parser.rfc4180.parse( "\n" )
+    assert_equal [],     parser.rfc4180.parse( "" )
+end
+end # class TestParserFormats
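
The assertions in this file contrast the two formats: `parser.default` trims surrounding whitespace and skips blank lines, while `parser.rfc4180` keeps both. A minimal sketch of that difference, assuming unquoted fields, a comma separator, and `\n` line endings only. `split_simple` is a hypothetical helper written for illustration and is not csvreader's implementation:

```ruby
# Hypothetical illustration only; NOT how CsvReader::Parser is implemented.
# Handles unquoted, comma-separated fields with "\n" line endings.
def split_simple(text, strict:)
  lines = text.split("\n", -1)
  lines.pop if lines.last == ""            # a trailing newline does not open a new record

  # default-like mode: skip blank (whitespace-only) lines
  lines = lines.reject { |line| line.strip.empty? } unless strict

  lines.map do |line|
    # "".split(",", -1) returns [], so an empty line is special-cased to one empty value
    fields = line.empty? ? [""] : line.split(",", -1)
    strict ? fields : fields.map(&:strip)  # default-like mode: trim each value
  end
end

p split_simple(" a, b ,c \n \n1,2,3\n", strict: false)
p split_simple(" a, b ,c \n \n,\n1,2,3", strict: true)
```

Under these assumptions the two calls should mirror the records expected by the matching `parser.default` and `parser.rfc4180` assertions in this file. Quoted values such as `""` are out of scope for the sketch.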

data/test/test_parser_rfc4180.rb ADDED

@@ -0,0 +1,95 @@
+# encoding: utf-8
+###
+#  to run use
+#     ruby -I ./lib -I ./test test/test_parser_rfc4180.rb
+require 'helper'
+class TestParserRfc4180 < MiniTest::Test
+def setup
+  CsvReader::Parser.logger.level = :debug   ## turn on "global" logging - move to helper - why? why not?
+end
+def parser
+  CsvReader::Parser::RFC4180
+end
+def test_parser_rfc4180
+  pp CsvReader::Parser::RFC4180
+  pp CsvReader::Parser.rfc4180
+  assert true
+end
+def test_parse
+   records = [["a", "b", "c"],
+              ["1", "2", "3"],
+              ["4", "5", "6"]]
+   ## don't care about newlines (\r\n) ??? - fix? why? why not?
+   assert_equal records, parser.parse( "a,b,c\n1,2,3\n4,5,6" )
+   assert_equal records, parser.parse( "a,b,c\n1,2,3\n4,5,6\n" )
+   assert_equal records, parser.parse( "a,b,c\r1,2,3\r4,5,6" )
+   assert_equal records, parser.parse( "a,b,c\r\n1,2,3\r\n4,5,6\r\n" )
+end
+def test_parse_semicolon
+   records = [["a", "b", "c"],
+              ["1", "2", "3"],
+              ["4", "5", "6"]]
+   ## don't care about newlines (\r\n) ??? - fix? why? why not?
+   assert_equal records, parser.parse( "a;b;c\n1;2;3\n4;5;6",         sep: ';' )
+   assert_equal records, parser.parse( "a;b;c\n1;2;3\n4;5;6\n",       sep: ';' )
+   assert_equal records, parser.parse( "a;b;c\r1;2;3\r4;5;6",         sep: ';' )
+   assert_equal records, parser.parse( "a;b;c\r\n1;2;3\r\n4;5;6\r\n", sep: ';' )
+end
+def test_parse_tab
+   records = [["a", "b", "c"],
+              ["1", "2", "3"],
+              ["4", "5", "6"]]
+   ## don't care about newlines (\r\n) ??? - fix? why? why not?
+   assert_equal records, parser.parse( "a\tb\tc\n1\t2\t3\n4\t5\t6",         sep: "\t" )
+   assert_equal records, parser.parse( "a\tb\tc\n1\t2\t3\n4\t5\t6\n",       sep: "\t" )
+   assert_equal records, parser.parse( "a\tb\tc\r1\t2\t3\r4\t5\t6",         sep: "\t" )
+   assert_equal records, parser.parse( "a\tb\tc\r\n1\t2\t3\r\n4\t5\t6\r\n", sep: "\t" )
+end
+def test_parse_empties
+  assert_equal [["","",""],["","",""]], parser.parse( %Q{"","",""\n,,} )
+  parser.config[:quoted_empty] = nil
+  assert_nil       parser.config[:quoted_empty]
+  assert_equal "", parser.config[:unquoted_empty]
+  assert_equal [[nil,nil,nil," "],["","",""," "]], parser.parse( %Q{"","",""," "\n,,, } )
+  parser.config[:unquoted_empty] = nil
+  assert_nil parser.config[:quoted_empty]
+  assert_nil parser.config[:unquoted_empty]
+  assert_equal [[nil,nil,nil," "],[nil,nil,nil," "]], parser.parse( %Q{"","",""," "\n,,, } )
+  ## reset to defaults
+  parser.config[:quoted_empty]   = ""
+  parser.config[:unquoted_empty] = ""
+  assert_equal "", parser.config[:quoted_empty]
+  assert_equal "", parser.config[:unquoted_empty]
+  assert_equal [["","",""],["","",""]], parser.parse( %Q{"","",""\n,,} )
+end
+end # class TestParserRfc4180
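
`test_parse_empties` above exercises two settings, `:quoted_empty` and `:unquoted_empty`, that decide what an empty value becomes depending on whether it was written as `""` or left blank. A minimal sketch of such a mapping, assuming the value has already been parsed and its quoted state is known. The key names come from the test; `map_empty` is a hypothetical helper and not part of csvreader:

```ruby
# Hypothetical illustration only; the config keys are taken from the test above,
# the helper itself is an assumption about how such a mapping could work.
def map_empty(value, quoted:, config:)
  return value unless value.empty?         # non-empty values (including " ") pass through
  quoted ? config[:quoted_empty] : config[:unquoted_empty]
end

config = { quoted_empty: nil, unquoted_empty: "" }

quoted_row   = ["", "", "", " "].map { |v| map_empty(v, quoted: true,  config: config) }
unquoted_row = ["", "", "", " "].map { |v| map_empty(v, quoted: false, config: config) }

p quoted_row
p unquoted_row
```

With `quoted_empty` set to `nil` and `unquoted_empty` left as `""`, the sketch treats the two kinds of empty value differently, which is the distinction the second assertion in the test checks.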

data/test/test_reader.rb CHANGED

@@ -9,6 +9,10 @@ require 'helper'
 class TestReader < MiniTest::Test
+def setup
+  CsvReader::Parser.logger.level = :debug   ## turn on "global" logging - move to helper - why? why not?
+end
 def test_read
   puts "== read: beer.csv:"

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: csvreader
 version: !ruby/object:Gem::Version
-  version: 0.4.0
+  version: 0.5.0
 platform: ruby
 authors:
 - Gerald Bauer
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-08-21 00:00:00.000000000 Z
+date: 2018-09-25 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rdoc
@@ -64,6 +64,8 @@ files:
 - test/data/shakespeare.csv
 - test/helper.rb
 - test/test_parser.rb
+- test/test_parser_formats.rb
+- test/test_parser_rfc4180.rb
 - test/test_reader.rb
 - test/test_reader_hash.rb
 homepage: https://github.com/csv11/csvreader