RubyGems - csvreader - Versions diffs - 0.4.0 → 0.5.0 - Mend

csvreader 0.4.0 → 0.5.0

Files changed (13)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: ed373a97a0bdb4c45d2980894a32014cdcb8ca7c
-  data.tar.gz: 784adcade81e39ad9accd1a9b2d0c76fd666b6f9
+  metadata.gz: ea1d667219773e3a355c81f815d91e92340d61a1
+  data.tar.gz: ba7a43ccb5e110fc1f6eca76ca2a74a62f1131fb
 SHA512:
-  metadata.gz: 5523a8697990c691f55aa7c3b23867104b1c4c5b8e9e25b0424a3191e73cbb32cee369541b712f60fc366ba76a8207a77d6b12b68ea209896b6c26e11c5712de
-  data.tar.gz: 7c33c812c2a53303911b6686d03554d6e388b3f936a3b6b8d995ed237651bd171d3bdb8ab8f38f7f327e1a9d1be26d1fa955918012f37cf6a9e1c2cc6ab08373
+  metadata.gz: 0543a4338d2d12e36da16acdad9abff28633e519baa1d92044d1ca8f5e3472d835d00a10d8b19c24561b06e0d724f87414495600f4c83eef7c9e033474b4c09e
+  data.tar.gz: 8df669bc86f2066b2650a67bda5698fae7b6d58766b9c318f47958b0499671d0a4d39e862b8d3af842105a2777a3cf7ad05168380c338a3053fd3d363697abfb

data/Manifest.txt CHANGED

@@ -13,5 +13,7 @@ test/data/beer11.csv
 test/data/shakespeare.csv
 test/helper.rb
 test/test_parser.rb
+test/test_parser_formats.rb
+test/test_parser_rfc4180.rb
 test/test_reader.rb
 test/test_reader_hash.rb

data/README.md CHANGED

@@ -164,17 +164,15 @@ see [`TabReader` »](https://github.com/datatxt/tabreader).
 Two major design bugs and many many minor.
-(1) The CSV class uses `line.split(',')` with some kludges (†) with the claim its faster.
+(1) The CSV class uses [`line.split(',')`](https://github.com/ruby/csv/blob/master/lib/csv.rb#L1248) with some kludges (†) with the claim it's faster.
 What?! The right way: CSV needs its own purpose-built parser. There's no other
 way you can handle all the (edge) cases with double quotes and escaped doubled up
 double quotes. Period.
-For example, the CSV class cannot handle leading or trailing spaces
+For example, the CSV class cannot handle leading or trailing spaces
 for double quoted values `1,•"2","3"•`.
 Or handling double quotes inside values and so on and on.
-(†): kludge - a workaround or quick-and-dirty solution that is clumsy, inelegant, inefficient, difficult to extend and hard to maintain
 (2) The CSV class returns `nil` for `,,` but an empty string (`""`)
 for `"","",""`. The right way: All values are always strings. Period.
@@ -182,6 +180,36 @@ If you want to use `nil` you MUST configure a string (or strings)
 such as `NA`, `n/a`, `\N`, or similar that map to `nil`.
+(†): kludge - a workaround or quick-and-dirty solution that is clumsy, inelegant, inefficient, difficult to extend and hard to maintain
+Appendix: Simple examples the standard csv library cannot read:
+Quoted values with leading or trailing spaces e.g.
+```
+1, "2","3" , "4" ,5
+```
+=>
+``` ruby
+["1", "2", "3", "4" ,"5"]
+```
+"Auto-fix" unambiguous quotes in "unquoted" values e.g.
+```
+value with "quotes", another value
+```
+=>
+``` ruby
+["value with \"quotes\"", "another value"]
+```
+and some more.
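
The README's second complaint (`nil` for `,,` but `""` for `"","",""`) can be checked directly against Ruby's standard csv library. The snippet below is a minimal sketch that uses only `CSV.parse_line`; the variable names are illustrative, and the outcome of the last call is captured rather than assumed, since it may depend on the installed csv version.

```ruby
require 'csv'

# Unquoted empty fields: the standard library yields nil values.
unquoted = CSV.parse_line( ",," )

# Quoted empty fields: the standard library yields empty strings.
quoted = CSV.parse_line( '"","",""' )

pp unquoted
pp quoted

# A quoted value preceded by a space (first README appendix example).
# Record whether the standard library accepts or rejects it.
leading_space_result =
  begin
    CSV.parse_line( '1, "2","3"' )
  rescue CSV::MalformedCSVError => e
    e.class
  end
pp leading_space_result
```

With csvreader's default (human) parser the same three inputs are expected to produce all-string records, as the tests later in this diff assert for `",,"` and `"","",""`.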

data/lib/csvreader.rb CHANGED

@@ -3,6 +3,7 @@
 require 'csv'
 require 'json'
 require 'pp'
+require 'logger'
 ###

data/lib/csvreader/buffer.rb CHANGED

@@ -18,22 +18,10 @@ class BufferIO   ## todo: find a better name - why? why not? is really just for
     end
   end # method getc
-  def ungetc( c )
-    ## add upfront as first char in buffer
-    ##   last in/first out queue!!!!
-    @buf.unshift( c )
-    ## puts "ungetc - >#{c} (#{c.ord})< => >#{@buf}<"
-  end
   def peek
-     ## todo/fix:
-     ## use Hexadecimal code: 1A, U+001A for eof char - why? why not?
     if @buf.size == 0 && @io.eof?
       puts "peek - hitting eof!!!"
-      ## return eof char(s) - exits? is \0 ?? double check
-      return "\0"
+      return  "\0"   ## return NUL char (0) for now
     end
     if @buf.size == 0
@@ -44,5 +32,6 @@ class BufferIO   ## todo: find a better name - why? why not? is really just for
     @buf.first
   end # method peek
 end # class BufferIO
 end # class CsvReader
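
With `ungetc` removed, the buffer only needs one character of lookahead: `peek` looks at the next character without consuming it, and returns a NUL character at end of input. The following is a rough sketch of that idea and not the gem's code. `PeekBuffer` is a hypothetical stand-in for `BufferIO`, and its `eof?` and `getc` are assumptions, since the diff shows only part of the class.

```ruby
require 'stringio'

# Hypothetical stand-in for BufferIO (illustration only).
class PeekBuffer
  def initialize( io )
    @io  = io
    @buf = []     # chars read ahead from @io but not yet consumed
  end

  def eof?
    @buf.empty? && @io.eof?
  end

  # Consume and return the next char (buffered char first, if any).
  def getc
    @buf.empty? ? @io.getc : @buf.shift
  end

  # Return the next char without consuming it.
  def peek
    return "\0"  if eof?              # NUL char as end-of-input sentinel
    @buf.push( @io.getc )  if @buf.empty?
    @buf.first
  end
end

buf = PeekBuffer.new( StringIO.new( "a,b" ) )
first_peek = buf.peek     # look at the first char, do not consume
first_getc = buf.getc     # consume the same char
rest = +""
rest << buf.getc  until buf.eof?
at_end = buf.peek         # sentinel once the input is exhausted
```

Because the parser never pushes characters back, the buffer holds at most one char here; the human parser can simply skip leading spaces instead of collecting and re-inserting them.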

data/lib/csvreader/parser.rb CHANGED

@@ -1,74 +1,92 @@
 # encoding: utf-8
 class CsvReader
-class Parser
-## char constants
-DOUBLE_QUOTE = "\""
-COMMENT      = "#"    ## use COMMENT_HASH or HASH or ??
-SPACE        = " "
-TAB          = "\t"
-LF	         = "\n"    ## 0A (hex)  10 (dec)
-CR	         = "\r"    ## 0D (hex)  13 (dec)
-def self.parse( data )
-  puts "parse:"
-  pp data
-  parser = new
-  parser.parse( data )
-end
-def self.parse_line( data )
-  puts "parse_line:"
+class Parser
-  parser = new
-  records = parser.parse( data, limit: 1 )
-  ## unwrap record if empty return nil - why? why not?
-  ##  return empty record e.g. [] - why? why not?
-  records.size == 0 ? nil : records.first
+## char constants
+DOUBLE_QUOTE = "\""
+BACKSLASH    = "\\"    ## use BACKSLASH_ESCAPE ??
+COMMENT      = "#"      ## use COMMENT_HASH or HASH or ??
+SPACE        = " "      ##   \s == ASCII 32 (dec)            =    (Space)
+TAB          = "\t"     ##   \t == ASCII 0x09 (hex)          = HT (Tab/horizontal tab)
+LF	         = "\n"     ##   \n == ASCII 0x0A (hex) 10 (dec) = LF (Newline/line feed)
+CR	         = "\r"     ##   \r == ASCII 0x0D (hex) 13 (dec) = CR (Carriage return)
+###################################
+## add simple logger with debug flag/switch
+#
+#  use Parser.debug = true   # to turn on
+#
+#  todo/fix: use logutils instead of std logger - why? why not?
+def self.logger() @@logger ||= Logger.new( STDOUT ); end
+def logger()  self.class.logger; end
+attr_reader :config   ## todo/fix: change config to proper dialect class/struct - why? why not?
+def initialize( sep:         ',',
+                quote:       DOUBLE_QUOTE, ## note: set to nil for no quote
+                doublequote: true,
+                escape:      BACKSLASH,   ## note: set to nil for no escapes
+                trim:        true,   ## note: will toggle between human/default and strict mode parser!!!
+                na:          ['\N', 'NA'],  ## note: set to nil for no null values / not available (na)
+                quoted_empty:   '',   ## note: only available in strict mode (e.g. trim=false)
+                unquoted_empty: ''    ## note: only available in strict mode (e.g. trim=false)
+               )
+  @config = {}   ## todo/fix: change config to proper dialect class/struct - why? why not?
+  @config[:sep]          = sep
+  @config[:quote]        = quote
+  @config[:doublequote]  = doublequote
+  @config[:escape]  = escape
+  @config[:trim]         = trim
+  @config[:na]     = na
+  @config[:quoted_empty] = quoted_empty
+  @config[:unquoted_empty] = unquoted_empty
 end
-def self.read( path )
-  parser = new
-  File.open( path, 'r:bom|utf-8' ) do |file|
-    parser.parse( file )
-  end
-end
+def strict?
+  ## note:  use trim for separating two different parsers / code paths:
+  ##   - human with trim leading and trailing whitespace and
+  ##   - strict with no leading and trailing whitespaces allowed
-def self.foreach( path, &block )
-  parser = new
-  File.open( path, 'r:bom|utf-8' ) do |file|
-    parser.foreach( file, &block )
-  end
+  ## for now use - trim == false for strict version flag alias
+  ##   todo/fix: add strict flag - why? why not?
+  @config[:trim] ? false : true
 end
-def self.parse_lines( data, &block )
-  parser = new
-  parser.parse_lines( data, &block )
-end
+DEFAULT = new( sep: ',', trim: true )
+RFC4180 = new( sep: ',', trim: false )
+EXCEL   = new( sep: ',', trim: false )
+def self.default()  DEFAULT; end    ## alternative alias for DEFAULT
+def self.rfc4180()  RFC4180; end    ## alternative alias for RFC4180
+def self.excel()    EXCEL; end      ## alternative alias for EXCEL
-def parse_field( io, trim: true )
+def parse_field( io, sep: )
+  logger.debug "parse field - sep: >#{sep}< (#{sep.ord})"  if logger.debug?
   value = ""
-  value << parse_spaces( io ) ## add leading spaces
+  skip_spaces( io )   ## strip leading spaces
   if (c=io.peek; c=="," || c==LF || c==CR || io.eof?) ## empty field
-    value = value.strip    if trim ## strip all spaces
      ## return value; do nothing
   elsif io.peek == DOUBLE_QUOTE
-    puts "start double_quote field - value >#{value}<"
-    value = value.strip   ## note always strip/trim leading spaces in quoted value
-    puts "start double_quote field - peek >#{io.peek}< (#{io.peek.ord})"
+    logger.debug "start double_quote field - peek >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
     io.getc  ## eat-up double_quote
     loop do
@@ -89,18 +107,18 @@ def parse_field( io, trim: true )
     ## note: always eat-up all trailing spaces (" ") and tabs (\t)
     skip_spaces( io )
-    puts "end double_quote field - peek >#{io.peek}< (#{io.peek.ord})"
+    logger.debug "end double_quote field - peek >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
   else
-    puts "start reg field - peek >#{io.peek}< (#{io.peek.ord})"
+    logger.debug "start reg field - peek >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
     ## consume simple value
     ##   until we hit "," or "\n" or "\r"
     ##    note: will eat-up quotes too!!!
     while (c=io.peek; !(c=="," || c==LF || c==CR || io.eof?))
-      puts "  add char >#{io.peek}< (#{io.peek.ord})"
+      logger.debug "  add char >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
       value << io.getc   ## eat-up all spaces (" ") and tabs (\t)
     end
-    value = value.strip    if trim ## strip all spaces
-    puts "end reg field - peek >#{io.peek}< (#{io.peek.ord})"
+    value = value.strip   ## strip all trailing spaces
+    logger.debug "end reg field - peek >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
   end
   value
@@ -108,12 +126,60 @@ end
-def parse_record( io, trim: true )
+def parse_field_strict( io, sep: )
+  logger.debug "parse field (strict) - sep: >#{sep}< (#{sep.ord})"  if logger.debug?
+  value = ""
+  if (c=io.peek; c==sep || c==LF || c==CR || io.eof?) ## empty unquoted field
+     value = config[:unquoted_empty]   ## defaults to "" (might be set to nil if needed)
+     ## return value; do nothing
+  elsif config[:quote] && io.peek == config[:quote]
+    logger.debug "start quote field (strict) - peek >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
+    io.getc  ## eat-up double_quote
+    loop do
+      while (c=io.peek; !(c==config[:quote] || io.eof?))
+        value << io.getc   ## eat-up everything until quote (")
+      end
+      break if io.eof?
+      io.getc ## eat-up double_quote
+      if config[:doublequote] && io.peek == config[:quote]  ## doubled up quote?
+        value << io.getc   ## add double quote and continue!!!!
+      else
+        break
+      end
+    end
+    value = config[:quoted_empty]  if value == ""   ## defaults to "" (might be set to nil if needed)
+    logger.debug "end double_quote field (strict) - peek >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
+  else
+    logger.debug "start reg field (strict) - peek >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
+    ## consume simple value
+    ##   until we hit "," or "\n" or "\r" or stray "\"" double quote
+    while (c=io.peek; !(c==sep || c==LF || c==CR || c==config[:quote] || io.eof?))
+      logger.debug "  add char >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
+      value << io.getc
+    end
+    logger.debug "end reg field (strict) - peek >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
+  end
+  value
+end
+def parse_record( io, sep: )
   values = []
   loop do
-     value = parse_field( io, trim: trim )
-     puts "value: »#{value}«"
+     value = parse_field( io, sep: sep )
+     logger.debug "value: »#{value}«"  if logger.debug?
      values << value
      if io.eof?
@@ -133,6 +199,33 @@ def parse_record( io, trim: true )
 end
+def parse_record_strict( io, sep: )
+  values = []
+  loop do
+     value = parse_field_strict( io, sep: sep )
+     logger.debug "value: »#{value}«"  if logger.debug?
+     values << value
+     if io.eof?
+        break
+     elsif (c=io.peek; c==LF || c==CR)
+       skip_newline( io )   ## note: singular / single newline only (NOT plural)
+       break
+     elsif io.peek == sep
+       io.getc   ## eat-up FS (,)
+     else
+       puts "*** csv parse error (strict): found >#{io.peek} (#{io.peek.ord})< - FS (,) or RS (\\n) expected!!!!"
+       exit(1)
+     end
+  end
+  values
+end
 def skip_newlines( io )
   return if io.eof?
@@ -142,6 +235,22 @@ def skip_newlines( io )
 end
+def skip_newline( io )    ## note: singular (strict) version
+  return if io.eof?
+  ## only skip CR LF or LF or CR
+  if io.peek == CR
+    io.getc ## eat-up
+    io.getc  if io.peek == LF
+  elsif io.peek == LF
+    io.getc ## eat-up
+  else
+    # do nothing
+  end
+end
 def skip_until_eol( io )
   return if io.eof?
@@ -161,91 +270,95 @@ end
-def parse_spaces( io )  ## helper method
-  spaces = ""
-  ## add leading spaces
-  while (c=io.peek; c==SPACE || c==TAB)
-    spaces << io.getc   ## eat-up all spaces (" ") and tabs (\t)
-  end
-  spaces
-end
-def parse_lines( io_maybe, trim: true,
-                           comments: true,
-                           blanks: true,   &block )
-  ## find a better name for io_maybe
-  ##   make sure io is a wrapped into BufferIO!!!!!!
-  if io_maybe.is_a?( BufferIO )    ### allow (re)use of BufferIO if managed from "outside"
-    io = io_maybe
-  else
-    io = BufferIO.new( io_maybe )
-  end
+def parse_lines_human( io, sep:, &block )
   loop do
     break if io.eof?
-    ## hack: use own space buffer for peek( x ) lookahead (more than one char)
-    ## check for comments or blank lines
-    if comments || blanks
-      spaces = parse_spaces( io )
-    end
+    skip_spaces( io )
-    if comments && io.peek == COMMENT        ## comment line
-      puts "skipping comment - peek >#{io.peek}< (#{io.peek.ord})"
+    if io.peek == COMMENT        ## comment line
+      logger.debug "skipping comment - peek >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
       skip_until_eol( io )
       skip_newlines( io )
-    elsif blanks && (c=io.peek; c==LF || c==CR || io.eof?)
-      puts "skipping blank - peek >#{io.peek}< (#{io.peek.ord})"
+    elsif (c=io.peek; c==LF || c==CR || io.eof?)
+      logger.debug "skipping blank - peek >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
       skip_newlines( io )
-    else  # undo (ungetc spaces)
-      puts "start record - peek >#{io.peek}< (#{io.peek.ord})"
-      if comments || blanks
-        ## note: MUST ungetc in "reverse" order
-        ##   ##   buffer is last in/first out queue!!!!
-        spaces.reverse.each_char { |space| io.ungetc( space ) }
-      end
-      record = parse_record( io, trim: trim )
+    else
+      logger.debug "start record - peek >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
+      record = parse_record( io, sep: sep )
       ## note: requires block - enforce? how? why? why not?
       block.call( record )   ## yield( record )
     end
   end  # loop
-end # method parse_lines
+end # method parse_lines_human
+def parse_lines_strict( io, sep:, &block )
+  ## no leading and trailing whitespaces trimmed/stripped
+  ## no comments skipped
+  ## no blanks skipped
+  ## - follows strict rules of
+  ##  note: this csv format is NOT recommended;
+  ##    please, use a format with comments, leading and trailing whitespaces, etc.
+  ##    only added for checking compatibility
+  loop do
+    break if io.eof?
+    logger.debug "start record (strict) - peek >#{io.peek}< (#{io.peek.ord})"  if logger.debug?
+    record = parse_record_strict( io, sep: sep )
+    ## note: requires block - enforce? how? why? why not?
+    block.call( record )   ## yield( record )
+  end  # loop
+end # method parse_lines_strict
+def parse_lines( io_maybe, sep: config[:sep], &block )
+  ## find a better name for io_maybe
+  ##   make sure io is a wrapped into BufferIO!!!!!!
+  if io_maybe.is_a?( BufferIO )    ### allow (re)use of BufferIO if managed from "outside"
+    io = io_maybe
+  else
+    io = BufferIO.new( io_maybe )
+  end
+  if strict?
+    parse_lines_strict( io, sep: sep, &block )
+  else
+    parse_lines_human( io, sep: sep, &block )
+  end
+end  ## parse_lines
+##   fix: add optional block  - lets you use it like foreach!!!
+##    make foreach an alias of parse with block - why? why not?
+##
+##   unify with (make one) parse and parse_lines!!!! - why? why not?
-def parse( io_maybe, trim: true,
-               comments: true,
-               blanks: true,
-               limit: nil )
+def parse( io_maybe, sep: config[:sep], limit: nil )
   records = []
-  parse_lines( io_maybe, trim: trim, comments: comments, blanks: blanks ) do |record|
+  parse_lines( io_maybe, sep: sep  ) do |record|
     records << record
     ## set limit to 1 for processing "single" line (that is, get one record)
-    return records   if limit && limit >= records.size
+    break  if limit && limit >= records.size
   end
   records
 end ## method parse
-def foreach( io_maybe, trim: true,
-                 comments: true,
-                 blanks: true,    &block )
-  parse_lines( io_maybe, trim: trim, comments: comments, blanks: blanks, &block )
-end
 end # class Parser
 end # class CsvReader
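
The strict code path added above keeps whitespace as-is and treats a doubled quote inside a quoted field as one literal quote. The sketch below restates that logic for a single line held in a string. It is a simplified illustration, not the gem's implementation: `parse_record_strict_sketch` is a hypothetical helper, it works on string indexes instead of a `BufferIO`, and it raises instead of calling `exit(1)`.

```ruby
# Hypothetical helper (illustration only): strict parsing of one record.
def parse_record_strict_sketch( line, sep: ',', quote: '"' )
  values = []
  pos    = 0
  loop do
    value = +""
    if line[pos] == quote
      pos += 1                          # eat opening quote
      loop do
        while pos < line.size && line[pos] != quote
          value << line[pos]            # take everything up to the next quote
          pos += 1
        end
        break if pos >= line.size       # unterminated quote: stop at end of input
        pos += 1                        # eat closing quote (or first of a doubled pair)
        if line[pos] == quote           # doubled quote => one literal quote
          value << quote
          pos += 1
        else
          break
        end
      end
    else
      while pos < line.size && line[pos] != sep
        value << line[pos]              # no trimming in strict mode
        pos += 1
      end
    end
    values << value
    break if pos >= line.size
    raise ArgumentError, "separator expected at position #{pos}"  unless line[pos] == sep
    pos += 1                            # eat separator
  end
  values
end

spaces_kept   = parse_record_strict_sketch( ' a, b ,c ' )
quotes_folded = parse_record_strict_sketch( '"a ""b""",c' )
pp spaces_kept
pp quotes_folded
```

The first call mirrors the `rfc4180` expectations in `test/test_parser_formats.rb`, where `" a, b ,c "` keeps its leading and trailing spaces.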

data/lib/csvreader/reader.rb CHANGED

@@ -1,150 +1,98 @@
 # encoding: utf-8
-module Csv    ## check: rename to CsvSettings / CsvPref / CsvGlobals or similar - why? why not???
+class CsvReader
-class Dialect   ## todo: use a module - it's just a namespace/module now - why? why not?
-  ###
-  # (auto-)add these flavors/dialects:
-  #     :tab                   -> uses TabReader(!)
-  #     :strict|:rfc4180
-  #     :unix                   -> uses unix-style escapes e.g. \n \" etc.
-  #     :windows|:excel
-  #     :guess|:auto     -> guess (auto-detect) separator - why? why not?
-  ##  e.g. use Dialect.registry[:unix] = { ... } etc.
-  ##   note use @@ - there is only one registry
-  def self.registry() @@registry ||={} end
-  ## add built-in dialects:
-  ##    trim - use strip? why? why not? use alias?
-  registry[:tab]     = {}   ##{ class: TabReader }
-  registry[:strict]  = { strict: true, trim: false }   ## add no comments, blank lines, etc. ???
-  registry[:rfc4180] = :strict    ## alternative name
-  registry[:windows] = {}
-  registry[:excel]   = :windows
-  registry[:unix]    = {}
-  ## todo: add some more
-end  # class Dialect
-  class Configuration
+  def initialize( parser )
+    @parser = parser
+  end
-    attr_accessor :sep   ## col_sep (column separator)
-    attr_accessor :na    ## not available (string or array of strings or nil) - rename to nas/nils/nulls - why? why not?
-    attr_accessor :trim        ### allow ltrim/rtrim/trim - why? why not?
-    attr_accessor :blanks
-    attr_accessor :comments
-    attr_accessor :dialect
+  DEFAULT = new( Parser::DEFAULT )
+  RFC4180 = new( Parser::RFC4180 )
+  EXCEL   = new( Parser::EXCEL )
-    def initialize
-      @sep      = ','
-      @blanks   = true
-      @comments = true
-      @trim     = true
-      ## note: do NOT add headers as global - should ALWAYS be explicit
-      ##   headers (true/false) - changes resultset and requires different processing!!!
+  def self.default()  DEFAULT; end    ## alternative alias for DEFAULT
+  def self.rfc4180()  RFC4180; end    ## alternative alias for RFC4180
+  def self.excel()    EXCEL; end      ## alternative alias for EXCEL
-      self  ## return self for chaining
-    end
-    ## strip leading and trailing spaces
-    def trim?() @trim; end
-    ## skip blank lines (with only 1+ spaces)
-    ## note: for now blank lines with no spaces will always get skipped
-    def blanks?() @blanks; end
-    def comments?() @comments; end
-    ## built-in (default) options
-    ##  todo: find a better name?
-    def default_options
-      ## note:
-      ##   do NOT include sep character and
-      ##   do NOT include headers true/false here
-      ##
-      ##  make default sep its own "global" default config
-      ##   e.g. Csv.config.sep =
-      ## common options
-      ##   skip comments starting with #
-      ##   skip blank lines
-      ##   strip leading and trailing spaces
-      ##    NOTE/WARN:  leading and trailing spaces NOT allowed/working with double quoted values!!!!
-      defaults = {
-        blanks:   @blanks,    ## note: skips lines with no whitespaces only!! (e.g. line with space is NOT blank!!)
-        comments: @comments,
-        trim:     @trim
-        ## :converters => :strip
-      }
-      defaults
-    end
-  end # class Configuration
+  #####################
+  ## convenience helpers defaulting to default csv dialect/format reader
+  ##
+  ##   CsvReader.parse_line is the same as
+  ##     CsvReader::DEFAULT.parse_line or CsvReader.default.parse_line
+  ##
+  def self.parse_line( data, sep: nil,
+                             converters: nil )
+     DEFAULT.parse_line( data, sep: sep, converters: converters )
+  end
-  ## lets you use
-  ##   Csv.configure do |config|
-  ##      config.sep = ','   ## or "/t"
-  ##   end
+  def self.parse( data, sep: nil,
+                        converters: nil )
+     DEFAULT.parse( data, sep: sep, converters: converters )
+  end
-  def self.configure
-    yield( config )
+  #### fix!!! remove - replace with parse with (optional) block!!!!!
+  def self.parse_lines( data, sep: nil,
+                              converters: nil, &block )
+     DEFAULT.parse_lines( data, sep: sep, converters: nil, &block )
   end
-  def self.config
-    @config ||= Configuration.new
+  def self.read( path, sep: nil,
+                       converters: nil )
+     DEFAULT.read( path, sep: sep, converters: converters )
   end
-end   # module Csvv
+  def self.header( path, sep: nil )
+     DEFAULT.header( path, sep: sep )
+  end
+  def self.foreach( path, sep: nil,
+                          converters: nil, &block )
+     DEFAULT.foreach( path, sep: sep, converters: converters, &block )
+  end
-####
-## use our own wrapper
-class CsvReader
-  def self.parse_line( txt, sep:        Csv.config.sep,
-                            trim:       Csv.config.trim?,
-                            na:         Csv.config.na,
-                            dialect:    Csv.config.dialect,
-                            converters: nil)
-    ## note: do NOT include headers option (otherwise single row gets skipped as first header row :-)
-    csv_options = Csv.config.default_options.merge(
-                    col_sep: sep
-    )
-    ## pp csv_options
-    Parser.parse_line( txt )  ##, csv_options )
-  end
+  #############################
+  ## all "high-level" reader methods
+  ##
+  ## note: allow "overriding" of separator
+  ##    if sep is not nil otherwise use default dialect/format separator
   ##
   ##  todo/fix: "unify" parse and parse_lines  !!!
   ##    check for block_given? - why? why not?
-  def self.parse( txt, sep: Csv.config.sep )
-    csv_options = Csv.config.default_options.merge(
-                     col_sep: sep
-    )
-    ## pp csv_options
-    Parser.parse( txt )  ###, csv_options )
+  def parse( data, sep: nil, limit: nil,
+                   converters: nil )
+    sep = @parser.config[:sep]  if sep.nil?
+    @parser.parse( data, sep: sep, limit: limit )
+  end
+  #### fix!!! remove - replace with parse with (optional) block!!!!!
+  def parse_lines( data, sep: nil,
+                         converters: nil, &block )
+    sep = @parser.config[:sep]  if sep.nil?
+    @parser.parse_lines( data, sep: sep, &block )
   end
-  def self.parse_lines( txt, sep: Csv.config.sep, &block )
-    csv_options = Csv.config.default_options.merge(
-                     col_sep: sep
-    )
-    ## pp csv_options
-    Parser.parse_lines( txt, &block )  ###, csv_options )
+  def parse_line( data, sep: nil,
+                        converters: nil )
+    records = parse( data, sep: sep, limit: 1 )
+    ## unwrap record if empty return nil - why? why not?
+    ##  return empty record e.g. [] - why? why not?
+    records.size == 0 ? nil : records.first
   end
-  def self.read( path, sep: Csv.config.sep )
+  def read( path, sep: nil,
+                  converters: nil )
     ## note: use our own file.open
     ##   always use utf-8 for now
     ##    check/todo: add skip option bom too - why? why not?
@@ -152,33 +100,26 @@ class CsvReader
     parse( txt, sep: sep )
   end
-  def self.foreach( path, sep: Csv.config.sep, &block )
-    csv_options = Csv.config.default_options.merge(
-                     col_sep: sep
-    )
-    Parser.foreach( path, &block ) ###, csv_options )
+  def foreach( path, sep: nil,
+                     converters: nil, &block )
+    File.open( path, 'r:bom|utf-8' ) do |file|
+      parse_lines( file, sep: sep, &block )
+    end
   end
-  def self.header( path, sep: Csv.config.sep )   ## use header or headers - or use both (with alias)?
-      # read first lines (only)
-      #  and parse with csv to get header from csv library itself
-      #
-      #  check - if there's an easier or built-in way for the csv library
-      ## readlines until
-      ##  - NOT a comments line or
-      ##  - NOT a blank line
+  def header( path, sep: nil )   ## use header or headers - or use both (with alias)?
+     # read first lines (only)
+     #  and parse with csv to get header from csv library itself
      record = nil
      File.open( path, 'r:bom|utf-8' ) do |file|
-        record = Parser.parse_line( file )
+        record = parse_line( file, sep: sep )
      end
-     record  ## todo/fix: return nil for empty - why? why not?
-    end  # method self.header
+     record  ## todo/fix: returns nil for empty - why? why not?
+  end  # method self.header
 end # class CsvReader
@@ -188,13 +129,13 @@ end # class CsvReader
 class CsvHashReader
-def self.parse( txt, sep: Csv.config.sep, headers: nil )
+def self.parse( data, sep: nil, headers: nil )
   ## pass in headers as array e.g. ['A', 'B', 'C']
   names = headers ? headers : nil
   records = []
-  CsvReader.parse_lines( txt ) do |values|     # sep: sep
+  CsvReader.parse_lines( data ) do |values|     # sep: sep
     if names.nil?
       names = values   ## store header row / a.k.a. field/column names
     else
@@ -206,13 +147,13 @@ def self.parse( txt, sep: Csv.config.sep, headers: nil )
 end
-def self.read( path, sep: Csv.config.sep, headers: nil )
+def self.read( path, sep: nil, headers: nil )
   txt = File.open( path, 'r:bom|utf-8' ).read
   parse( txt, sep: sep, headers: headers )
 end
-def self.foreach( path, sep: Csv.config.sep, headers: nil, &block )
+def self.foreach( path, sep: nil, headers: nil, &block )
   ## pass in headers as array e.g. ['A', 'B', 'C']
   names = headers ? headers : nil
@@ -228,7 +169,7 @@ def self.foreach( path, sep: Csv.config.sep, headers: nil, &block )
 end
-def self.header( path, sep: Csv.config.sep )   ## add header too? why? why not?
+def self.header( path, sep: nil )   ## add header too? why? why not?
   ## same as "classic" header method - delegate/reuse :-)
   CsvReader.header( path, sep: sep )
 end
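
The reader rewrite replaces the global `Csv.config` object with preconfigured reader instances (`DEFAULT`, `RFC4180`, `EXCEL`) and class-level helpers that delegate to `DEFAULT`, with `sep: nil` meaning "use the dialect's own separator". The pattern can be sketched in isolation as follows. `MiniParser` and `MiniReader` are hypothetical names, and the `split`-based parsing is only a placeholder for a real parser.

```ruby
# Hypothetical mini classes (illustration of the delegation pattern only).
class MiniParser
  attr_reader :config

  def initialize( sep: ',' )
    @config = { sep: sep }
  end

  # Placeholder parsing: no quotes, no escapes.
  def parse( data, sep: config[:sep] )
    data.split( "\n" ).map { |line| line.split( sep, -1 ) }
  end
end

class MiniReader
  def initialize( parser )
    @parser = parser
  end

  # Preconfigured instances; created after initialize is defined.
  DEFAULT = new( MiniParser.new( sep: ',' ) )
  TAB     = new( MiniParser.new( sep: "\t" ) )

  # Class-level convenience helper: same as DEFAULT.parse.
  def self.parse( data, sep: nil )
    DEFAULT.parse( data, sep: sep )
  end

  # sep: nil falls back to the separator configured in the parser.
  def parse( data, sep: nil )
    sep = @parser.config[:sep]  if sep.nil?
    @parser.parse( data, sep: sep )
  end
end

by_default  = MiniReader.parse( "a,b\n1,2" )
by_override = MiniReader.parse( "a;b", sep: ';' )
by_dialect  = MiniReader::TAB.parse( "a\tb" )
```

Keeping `sep: nil` in the public signatures lets every wrapper forward the argument unchanged, so the fallback to the dialect's separator lives in one place.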

data/lib/csvreader/version.rb CHANGED

@@ -4,7 +4,7 @@
 class CsvReader   ## note: uses a class for now - change to module - why? why not?
   MAJOR = 0    ## todo: namespace inside version or something - why? why not??
-  MINOR = 4
+  MINOR = 5
   PATCH = 0
   VERSION = [MAJOR,MINOR,PATCH].join('.')

data/test/test_parser.rb CHANGED

@@ -9,24 +9,38 @@ require 'helper'
 class TestParser < MiniTest::Test
+def setup
+  CsvReader::Parser.logger.level = :debug   ## turn on "global" logging - move to helper - why? why not?
+end
+def parser
+  parser = CsvReader::Parser::DEFAULT
+end
-def test_parse1
-   records = [["a", "b", "c"],
-              ["1", "2", "3"],
-              ["4", "5", "6"]]
-   ## don't care about newlines (\r\n)
-   assert_equal records, CsvReader::Parser.parse( "a,b,c\n1,2,3\n4,5,6" )
-   assert_equal records, CsvReader::Parser.parse( "a,b,c\n1,2,3\n4,5,6\n" )
-   assert_equal records, CsvReader::Parser.parse( "a,b,c\r1,2,3\r4,5,6" )
-   assert_equal records, CsvReader::Parser.parse( "a,b,c\r\n1,2,3\r\n4,5,6\r\n" )
-   ## or leading and trailing spaces
-   assert_equal records, CsvReader::Parser.parse( "    \n a , b , c \n 1,2  ,3 \n 4,5,6   " )
-   assert_equal records, CsvReader::Parser.parse( "\n\na,  b,c   \n  1, 2, 3\n 4, 5, 6" )
-   assert_equal records, CsvReader::Parser.parse( "   \"a\"  , b ,  \"c\"   \n1,  2,\"3\"   \n4,5,  \"6\"" )
-   assert_equal records, CsvReader::Parser.parse( "a, b, c\n1,  2,3\n\n\n4,5,6\n\n\n" )
-   assert_equal records, CsvReader::Parser.parse( " a, b ,c  \n 1 , 2 , 3 \n4,5,6  " )
+def test_parser_default
+  pp CsvReader::Parser::DEFAULT
+  pp CsvReader::Parser.default
+  assert true
+end
+def test_parse
+  records = [["a", "b", "c"],
+             ["1", "2", "3"],
+             ["4", "5", "6"]]
+  ## don't care about newlines (\r\n)
+  assert_equal records, parser.parse( "a,b,c\n1,2,3\n4,5,6" )
+  assert_equal records, parser.parse( "a,b,c\n1,2,3\n4,5,6\n" )
+  assert_equal records, parser.parse( "a,b,c\r1,2,3\r4,5,6" )
+  assert_equal records, parser.parse( "a,b,c\r\n1,2,3\r\n4,5,6\r\n" )
+  ## or leading and trailing spaces
+  assert_equal records, parser.parse( "    \n a , b , c \n 1,2  ,3 \n 4,5,6   " )
+  assert_equal records, parser.parse( "\n\na,  b,c   \n  1, 2, 3\n 4, 5, 6" )
+  assert_equal records, parser.parse( "   \"a\"  , b ,  \"c\"   \n1,  2,\"3\"   \n4,5,  \"6\"" )
+  assert_equal records, parser.parse( "a, b, c\n1,  2,3\n\n\n4,5,6\n\n\n" )
+  assert_equal records, parser.parse( " a, b ,c  \n 1 , 2 , 3 \n4,5,6  " )
 end
@@ -34,19 +48,19 @@ def test_parse_quotes
   records = [["a", "b", "c"],
              ["11 \n 11", "\"2\"", "3"]]
-  assert_equal records, CsvReader::Parser.parse( " a, b ,c  \n\"11 \n 11\", \"\"\"2\"\"\" , 3 \n" )
-  assert_equal records, CsvReader::Parser.parse( "\n\n \"a\", \"b\" ,\"c\"  \n  \"11 \n 11\"  ,  \"\"\"2\"\"\" , 3 \n" )
+  assert_equal records, parser.parse( " a, b ,c  \n\"11 \n 11\", \"\"\"2\"\"\" , 3 \n" )
+  assert_equal records, parser.parse( "\n\n \"a\", \"b\" ,\"c\"  \n  \"11 \n 11\"  ,  \"\"\"2\"\"\" , 3 \n" )
 end
 def test_parse_empties
   records = [["", "", ""]]
-  assert_equal records, CsvReader::Parser.parse( ",," )
-  assert_equal records, CsvReader::Parser.parse( <<TXT )
+  assert_equal records, parser.parse( ",," )
+  assert_equal records, parser.parse( <<TXT )
   "","",""
 TXT
-  assert_equal [], CsvReader::Parser.parse( "" )
+  assert_equal [], parser.parse( "" )
 end
@@ -54,7 +68,7 @@ def test_parse_comments
   records = [["a", "b", "c"],
              ["1", "2", "3"]]
-  assert_equal records, CsvReader::Parser.parse( <<TXT )
+  assert_equal records, parser.parse( <<TXT )
 # comment
 # comment
 ## comment
@@ -64,7 +78,7 @@ a, b, c
 TXT
-  assert_equal records, CsvReader::Parser.parse( <<TXT )
+  assert_equal records, parser.parse( <<TXT )
    a,   b,   c
    1,   2,   3

data/test/test_parser_formats.rb ADDED

@@ -0,0 +1,69 @@
+# encoding: utf-8
+###
+#  to run use
+#     ruby -I ./lib -I ./test test/test_parser_formats.rb
+require 'helper'
+class TestParserFormats < MiniTest::Test
+def setup
+  CsvReader::Parser.logger.level = :debug   ## turn on "global" logging - move to helper - why? why not?
+end
+def parser
+  CsvReader::Parser
+end
+def test_parse_whitespace
+   records = [["a", "b", "c"],
+              ["1", "2", "3"]]
+   ## don't care about newlines (\r\n) ??? - fix? why? why not?
+   assert_equal records, parser.default.parse( "a,b,c\n1,2,3" )
+   assert_equal records, parser.default.parse( "a,b,c\n1,2,3\n" )
+   assert_equal records, parser.default.parse( " a, b ,c \n\n1,2,3\n" )
+   assert_equal records, parser.default.parse( " a, b ,c \n \n1,2,3\n" )
+   assert_equal [["a", "b", "c"],
+                 [""],
+                 ["1", "2", "3"]], parser.default.parse( %Q{a,b,c\n""\n1,2,3\n} )
+   assert_equal [["", ""],
+                 [""],
+                 ["", "", ""]], parser.default.parse( %Q{,\n""\n"","",""\n} )
+   ## strict rfc4180 - no trim leading or trailing spaces or blank lines
+   assert_equal records,   parser.rfc4180.parse( "a,b,c\n1,2,3" )
+   assert_equal [["a", "b", "c"],
+                 [""],
+                 ["1", "2", "3"]], parser.rfc4180.parse( "a,b,c\n\n1,2,3" )
+   assert_equal [[" a", " b ", "c "],
+                 [""],
+                 ["1", "2", "3"]], parser.rfc4180.parse( " a, b ,c \n\n1,2,3" )
+    assert_equal [[" a", " b ", "c "],
+                  [" "],
+                  ["",""],
+                  ["1", "2", "3"]], parser.rfc4180.parse( " a, b ,c \n \n,\n1,2,3" )
+end
+def test_parse_empties
+    assert_equal [], parser.default.parse( "\n \n \n" )
+    ## strict rfc4180 - no trim leading or trailing spaces or blank lines
+    assert_equal [[""],
+                  [" "],
+                  [" "]], parser.rfc4180.parse( "\n \n \n" )
+    assert_equal [[""],
+                  [" "],
+                  [" "]], parser.rfc4180.parse( "\n \n " )
+    assert_equal [[""]], parser.rfc4180.parse( "\n" )
+    assert_equal [],     parser.rfc4180.parse( "" )
+end
+end # class TestParserFormats
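
The tests above contrast two behaviours: the default format trims leading and trailing spaces and skips blank lines, while the strict rfc4180 format keeps both. A minimal sketch of that difference follows. `parse_simple` and its `strict:` flag are hypothetical names for illustration only; this is not csvreader's parser, and it handles unquoted values only.

```ruby
# Hypothetical helper (NOT part of csvreader): unquoted values only.
#   strict: false  trims each value and skips blank lines (like the default tests)
#   strict: true   keeps spaces and blank lines as-is (like the rfc4180 tests)
def parse_simple(text, strict: false, sep: ",")
  records = []
  text.each_line do |line|
    line = line.chomp
    # lenient mode: a line holding only whitespace is not a record
    next if !strict && line.strip.empty?

    values = line.split(sep, -1)     # limit -1 keeps trailing empty values
    values = [""] if values.empty?   # "".split(sep, -1) returns [], so restore one empty value
    values = values.map(&:strip) unless strict
    records << values
  end
  records
end

p parse_simple(" a, b ,c \n \n1,2,3\n")
# => [["a", "b", "c"], ["1", "2", "3"]]
p parse_simple(" a, b ,c \n \n1,2,3\n", strict: true)
# => [[" a", " b ", "c "], [" "], ["1", "2", "3"]]
```

Splitting on the separator is enough here only because the sample input has no double quotes; input with quoted values needs a real parser.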

data/test/test_parser_rfc4180.rb ADDED

@@ -0,0 +1,95 @@
+# encoding: utf-8
+###
+#  to run use
+#     ruby -I ./lib -I ./test test/test_parser_rfc4180.rb
+require 'helper'
+class TestParserRfc4180 < MiniTest::Test
+def setup
+  CsvReader::Parser.logger.level = :debug   ## turn on "global" logging - move to helper - why? why not?
+end
+def parser
+  CsvReader::Parser::RFC4180
+end
+def test_parser_rfc4180
+  pp CsvReader::Parser::RFC4180
+  pp CsvReader::Parser.rfc4180
+  assert true
+end
+def test_parse
+   records = [["a", "b", "c"],
+              ["1", "2", "3"],
+              ["4", "5", "6"]]
+   ## don't care about newlines (\r\n) ??? - fix? why? why not?
+   assert_equal records, parser.parse( "a,b,c\n1,2,3\n4,5,6" )
+   assert_equal records, parser.parse( "a,b,c\n1,2,3\n4,5,6\n" )
+   assert_equal records, parser.parse( "a,b,c\r1,2,3\r4,5,6" )
+   assert_equal records, parser.parse( "a,b,c\r\n1,2,3\r\n4,5,6\r\n" )
+end
+def test_parse_semicolon
+   records = [["a", "b", "c"],
+              ["1", "2", "3"],
+              ["4", "5", "6"]]
+   ## don't care about newlines (\r\n) ??? - fix? why? why not?
+   assert_equal records, parser.parse( "a;b;c\n1;2;3\n4;5;6",         sep: ';' )
+   assert_equal records, parser.parse( "a;b;c\n1;2;3\n4;5;6\n",       sep: ';' )
+   assert_equal records, parser.parse( "a;b;c\r1;2;3\r4;5;6",         sep: ';' )
+   assert_equal records, parser.parse( "a;b;c\r\n1;2;3\r\n4;5;6\r\n", sep: ';' )
+end
+def test_parse_tab
+   records = [["a", "b", "c"],
+              ["1", "2", "3"],
+              ["4", "5", "6"]]
+   ## don't care about newlines (\r\n) ??? - fix? why? why not?
+   assert_equal records, parser.parse( "a\tb\tc\n1\t2\t3\n4\t5\t6",         sep: "\t" )
+   assert_equal records, parser.parse( "a\tb\tc\n1\t2\t3\n4\t5\t6\n",       sep: "\t" )
+   assert_equal records, parser.parse( "a\tb\tc\r1\t2\t3\r4\t5\t6",         sep: "\t" )
+   assert_equal records, parser.parse( "a\tb\tc\r\n1\t2\t3\r\n4\t5\t6\r\n", sep: "\t" )
+end
+def test_parse_empties
+  assert_equal [["","",""],["","",""]], parser.parse( %Q{"","",""\n,,} )
+  parser.config[:quoted_empty] = nil
+  assert_nil       parser.config[:quoted_empty]
+  assert_equal "", parser.config[:unquoted_empty]
+  assert_equal [[nil,nil,nil," "],["","",""," "]], parser.parse( %Q{"","",""," "\n,,, } )
+  parser.config[:unquoted_empty] = nil
+  assert_nil parser.config[:quoted_empty]
+  assert_nil parser.config[:unquoted_empty]
+  assert_equal [[nil,nil,nil," "],[nil,nil,nil," "]], parser.parse( %Q{"","",""," "\n,,, } )
+  ## reset to defaults
+  parser.config[:quoted_empty]   = ""
+  parser.config[:unquoted_empty] = ""
+  assert_equal "", parser.config[:quoted_empty]
+  assert_equal "", parser.config[:unquoted_empty]
+  assert_equal [["","",""],["","",""]], parser.parse( %Q{"","",""\n,,} )
+end
+end # class TestParserRfc4180
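
`test_parse_empties` above relies on the parser telling a quoted empty value (`""`) apart from an unquoted one (nothing between two separators), with a configurable replacement for each. The sketch below shows that idea in isolation. `parse_empties` and its keyword arguments are hypothetical stand-ins for the `:quoted_empty` and `:unquoted_empty` config keys; it is not the gem's implementation and assumes values contain no commas, newlines, or escaped quotes.

```ruby
# Hypothetical helper (NOT part of csvreader): maps empty values per setting.
# Assumes no commas, newlines, or doubled quotes inside quoted values.
def parse_empties(text, quoted_empty: "", unquoted_empty: "")
  text.each_line.map do |line|
    line.chomp.split(",", -1).map do |raw|
      if raw == '""'
        quoted_empty            # value written as "" in the input
      elsif raw.empty?
        unquoted_empty          # nothing between two separators
      elsif raw.length >= 2 && raw.start_with?('"') && raw.end_with?('"')
        raw[1..-2]              # drop the surrounding double quotes
      else
        raw
      end
    end
  end
end

input = %Q{"","",""," "\n,,, }
p parse_empties(input, quoted_empty: nil)
# => [[nil, nil, nil, " "], ["", "", "", " "]]
```

Passing the settings as arguments avoids the reset-to-defaults step that the test needs after mutating the shared `config` hash.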

data/test/test_reader.rb CHANGED

@@ -9,6 +9,10 @@ require 'helper'
 class TestReader < MiniTest::Test
+def setup
+  CsvReader::Parser.logger.level = :debug   ## turn on "global" logging - move to helper - why? why not?
+end
 def test_read
   puts "== read: beer.csv:"

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: csvreader
 version: !ruby/object:Gem::Version
-  version: 0.4.0
+  version: 0.5.0
 platform: ruby
 authors:
 - Gerald Bauer
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-08-21 00:00:00.000000000 Z
+date: 2018-09-25 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rdoc
@@ -64,6 +64,8 @@ files:
 - test/data/shakespeare.csv
 - test/helper.rb
 - test/test_parser.rb
+- test/test_parser_formats.rb
+- test/test_parser_rfc4180.rb
 - test/test_reader.rb
 - test/test_reader_hash.rb
 homepage: https://github.com/csv11/csvreader