csvreader 1.2.1 → 1.2.2 (RubyGems)

Files changed (7)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: e61cda6f5b0fae762451efa0b0819e53b6da9966
-  data.tar.gz: e6dadbde1d714247046603fbdcb1fbd348cacc4c
+  metadata.gz: a8e2f4f6e06ec63483735c1e0966b61398df85eb
+  data.tar.gz: 6c867acfa43c261473b6d6300e3ecd8d7042f0dc
 SHA512:
-  metadata.gz: 6494cb0052000592cff4766946c3b7db0ec026db220f2e3857563d6070f282089034a8405c3eb4d8807f1a2dbe4cce67bd789b2794f803765f5fe9702d62a856
-  data.tar.gz: b47bd4cc6a342c5cc5e01e5ec1a67c7ae2fc3a7fc6fb4c1bcbcac018ab0291c6af076444aafb7ecac2f83ba814a502366583bbb61fa36c961e7009b217a3819b
+  metadata.gz: 2ddc944ee42de5660c68e057d0cbdbdf81b0f135b603e87231da2258bbd9001fc2483881e1c0ff15d5d7b23d2cb9f1b2c799a77a3551dbaee18094ba9aad5086
+  data.tar.gz: 39743a7df8b49b45a9ad1dd7ce8598616733348ecdbc52b94de4e8fc1fb7a54e33d0cbc36f3750a5940c9d46310d94a4c6d983811fb506efa2349bb0adc94d16

data/Manifest.txt CHANGED

@@ -32,6 +32,7 @@ test/helper.rb
 test/test_buffer.rb
 test/test_converter.rb
 test/test_parser.rb
+test/test_parser_autofix.rb
 test/test_parser_directive.rb
 test/test_parser_fixed.rb
 test/test_parser_formats.rb

data/README.md CHANGED

@@ -12,10 +12,16 @@
 ## What's News?
+**v1.2.2** Added auto-fix/correction/recovery
+for double quoted value with extra trailing value
+to the default parser (`ParserStd`) e.g. `"Freddy" Mercury`
+will get read "as is" and turned
+into an "unquoted" value with "literal" quotes e.g. `"Freddy" Mercury`.
-**v1.2.1** Added support for (optional) hashtag to the
+**v1.2.1** Added support for (optional) hashtag to the
 to the default parser (`ParserStd`) for
-supporting the [Humanitarian eXchange Language (HXL)](http://hxlstandard.org).
+supporting the [Humanitarian eXchange Language (HXL)](https://github.com/csvspecs/csv-hxl).
 Default is turned off (`false`). Use `Csv.human`
 or `Csv.hum` or `Csv.hxl` for pre-defined with hashtag turned on.
@@ -53,7 +59,7 @@ With the "strict" parser you will get a firework of "stray" quote errors / excep
 **v1.1.1**: Added built-in support for (optional) alternative comments (`%`) - used by
-[ARFF (attribute-relation file format)](https://waikato.github.io/weka-wiki/arff/) -
+[ARFF (attribute-relation file format)](https://github.com/csvspecs/csv-meta#attribute-relation-classic) -
 and support for (optional) directives (`@`) in header (that is, before any records)
 to default parser ("The Right Way").
 Now you can use either `#` or `%` for comments, the first one "wins" - you CANNOT use both.
@@ -68,13 +74,12 @@ e.g. `Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] )`.
 **v1.0.3**: Added built-in support for an (optional) front matter (`---`) meta data block
 in header (that is, before any records)
-to default parser ("The Right Way") - used by [CSVY (yaml front matter for csv file format)](http://csvy.org).
+to default parser ("The Right Way") - used by [CSVY (yaml front matter for csv file format)](https://github.com/csvspecs/csv-meta#front-matter-in-yaml).
 Use `Csv.parser.meta` to get the parsed meta data block hash (or `nil`) if none.
 ## Usage

data/lib/csvreader/parser_std.rb CHANGED

@@ -197,6 +197,26 @@ def parse_quote( input, sep:, opening_quote:, closing_quote:)
 end
+def parse_field_until_sep( input, sep: )
+  value = ""
+  logger.debug "start reg field - peek >#{input.peek}< (#{input.peek.ord})"  if logger.debug?
+  ## consume simple value
+  ##   until we hit "," or "\n" or "\r"
+  ##    note: will eat-up quotes too!!!
+  while (c=input.peek; !(c==sep || c==LF || c==CR || input.eof?))
+    if input.peek == BACKSLASH
+      value << parse_escape( input, sep: sep )
+    else
+      logger.debug "  add char >#{input.peek}< (#{input.peek.ord})"  if logger.debug?
+      value << input.getc   ## note: eat-up all spaces (" ") and tabs (\t) too (strip trailing spaces at the end)
+    end
+  end
+  ##  note: only strip **trailing** spaces (space and tab only)
+  ##    do NOT strip newlines etc. might have been added via escape! e.g. \\\n
+  value = value.sub( /[ \t]+$/, '' )
+  value
+end
 def parse_field( input, sep: )
@@ -226,7 +246,23 @@ def parse_field( input, sep: )
                                  closing_quote: DOUBLE_QUOTE )
     ## note: always eat-up all trailing spaces (" ") and tabs (\t)
-    skip_spaces( input )
+    spaces_count = skip_spaces( input )
+    ##  check for auto-fix trailing data after quoted value e.g. ---,"Fredy" Mercury,---
+    ##   todo/fix: add auto-fix for all quote variants!!!!!!!!!!!!!!!!!!!!
+    if (c=input.peek; c==sep || c==LF || c==CR || input.eof?)
+       ## everything ok (that is, regular quoted value)!!!
+    else
+      ## try auto-fix
+      ##   todo: report warning/issue error (if configured)!!!
+      extra_value = parse_field_until_sep( input, sep: sep )
+      ## "reconstruct" non-quoted value
+      spaces = ' ' * spaces_count   ## todo: preserve tab (\t) - why? why not?
+      ## note: minor (theoratical) issue (doubled quoted got "collapsed/escaped" to one from two in quoted value)
+      ##    e.g. "hello """ extra,  (becomes)=>  "hello "" extra (one quote less/"eaten up")
+      value = %Q{"#{value}"#{spaces}#{extra_value}}
+    end
     logger.debug "end double_quote field - peek >#{input.peek}< (#{input.peek.ord})"  if logger.debug?
   elsif input.peek == SINGLE_QUOTE    ## allow single quote too (by default)
     logger.debug "start single_quote field - peek >#{input.peek}< (#{input.peek.ord})"  if logger.debug?
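
A note on the trailing-space strip in `parse_field_until_sep` above: in Ruby, `$` matches at the end of any line, not only at the end of the string, and `sub` replaces only the first match. The standalone sketch below compares the pattern from the diff with a `\z`-anchored variant on a value that contains an embedded newline. The helper names are hypothetical and not part of csvreader. Whether such a value can actually reach this code depends on what `parse_escape` returns, which this diff does not show.

```ruby
# Hypothetical helpers for illustration only; not part of csvreader.

# Same pattern as in the diff: `$` also matches just before an embedded "\n".
def strip_with_line_anchor(value)
  value.sub(/[ \t]+$/, '')
end

# Variant anchored with `\z`, which matches only at the very end of the string.
def strip_with_string_anchor(value)
  value.sub(/[ \t]+\z/, '')
end

plain     = "Mercury  \t"
multiline = "line one  \nline two  "

# For a single-line value both variants agree.
p strip_with_line_anchor(plain)        # => "Mercury"
p strip_with_string_anchor(plain)      # => "Mercury"

# With an embedded newline, `$` strips the spaces before the newline
# and leaves the spaces at the end of the string in place.
p strip_with_line_anchor(multiline)    # => "line one\nline two  "
p strip_with_string_anchor(multiline)  # => "line one  \nline two"
```

If the intent stated in the comment (strip only trailing spaces and tabs, keep anything added via escapes) should hold for multi-line values as well, the `\z` form is probably the closer match.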

data/lib/csvreader/version.rb CHANGED

@@ -6,7 +6,7 @@ class CsvReader   ## note: uses a class for now - change to module - why? why no
   module Version
     MAJOR = 1    ## todo: namespace inside version or something - why? why not??
     MINOR = 2
-    PATCH = 1
+    PATCH = 2
     ## self.to_s  - why? why not?
   end

data/test/test_parser_autofix.rb ADDED

@@ -0,0 +1,28 @@
+# encoding: utf-8
+###
+#  to run use
+#     ruby -I ./lib -I ./test test/test_parser_autofix.rb
+require 'helper'
+class TestParserAutofix < MiniTest::Test
+def parser
+  CsvReader::Parser::DEFAULT
+end
+def test_quote_with_trailing_value
+  recs = [[ "Farrokh", "\"Freddy\" Mercury", "Bulsara" ]]
+  assert_equal recs, parser.parse( %Q{Farrokh,"Freddy" Mercury,Bulsara} )
+  assert_equal recs, parser.parse( %Q{  Farrokh , "Freddy" Mercury  , Bulsara } )
+  assert_equal recs, parser.parse( %Q{Farrokh,  "Freddy" Mercury   ,Bulsara} )
+end
+end # class TestParserAutofix
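
The three inputs in the test above differ only in the whitespace around the fields. The following is a minimal, self-contained sketch of why they all produce the same record. It is not the csvreader implementation: `read_field` and `read_record` are hypothetical names, the separator is fixed to a comma, only double quotes are handled, and backslash escapes are ignored. It mirrors the steps shown in the `parse_field` hunk: read the quoted value, count the spaces after the closing quote, and, if anything other than a separator or line end follows, rebuild the field as an unquoted value with literal quotes.

```ruby
require 'strscan'

# Simplified stand-in for the auto-fix branch (assumption: comma separator,
# double quotes only, "" as the escaped quote, no backslash escapes).
def read_field(scanner)
  scanner.skip(/[ \t]*/)                        # ignore leading spaces and tabs

  unless scanner.peek(1) == '"'
    # Unquoted field: read up to the separator or line end, strip trailing blanks.
    return scanner.scan(/[^,\r\n]*/).sub(/[ \t]+\z/, '')
  end

  scanner.getch                                 # consume the opening quote
  value = +''
  loop do
    value << scanner.scan(/[^"]*/)              # everything up to the next quote
    break if scanner.eos?                       # unterminated quote: keep what we have
    scanner.getch                               # consume the quote
    break unless scanner.peek(1) == '"'         # a single quote closes the value
    value << scanner.getch                      # a doubled quote becomes one literal quote
  end

  spaces = scanner.scan(/[ \t]*/)               # blanks after the closing quote
  return value if scanner.eos? || scanner.check(/[,\r\n]/)   # regular quoted value

  # Auto-fix: trailing data follows the closing quote, so rebuild the field
  # "as is" with literal quotes (blanks are re-emitted as spaces, as in the diff).
  extra = scanner.scan(/[^,\r\n]*/).sub(/[ \t]+\z/, '')
  %Q{"#{value}"#{' ' * spaces.size}#{extra}}
end

def read_record(line)
  scanner = StringScanner.new(line)
  fields = []
  loop do
    fields << read_field(scanner)
    break unless scanner.scan(/,/)              # stop when no separator follows
  end
  fields
end

p read_record(%Q{Farrokh,"Freddy" Mercury,Bulsara})
# => ["Farrokh", "\"Freddy\" Mercury", "Bulsara"]
p read_record(%Q{  Farrokh , "Freddy" Mercury  , Bulsara })
# => ["Farrokh", "\"Freddy\" Mercury", "Bulsara"]
p read_record(%Q{a,"b, c",d})
# => ["a", "b, c", "d"]
```

The last call shows that a regular quoted value, one followed directly by a separator, is returned without quotes and does not trigger the reconstruction.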

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: csvreader
 version: !ruby/object:Gem::Version
-  version: 1.2.1
+  version: 1.2.2
 platform: ruby
 authors:
 - Gerald Bauer
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-11-06 00:00:00.000000000 Z
+date: 2018-11-19 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rdoc
@@ -83,6 +83,7 @@ files:
 - test/test_buffer.rb
 - test/test_converter.rb
 - test/test_parser.rb
+- test/test_parser_autofix.rb
 - test/test_parser_directive.rb
 - test/test_parser_fixed.rb
 - test/test_parser_formats.rb