RubyGems - csvreader - Versions diffs - 1.1.3 → 1.1.4 - Mend

csvreader 1.1.3 → 1.1.4


Files changed (9)

checksums.yaml +4 -4
data/Manifest.txt +2 -0
data/README.md +32 -37
data/lib/csvreader/base.rb +8 -2
data/lib/csvreader/parser_table.rb +89 -0
data/lib/csvreader/version.rb +13 -5
data/test/helper.rb +1 -0
data/test/test_parser_table.rb +35 -0
metadata +4 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: a920108ec183cff7c7cad8c0d967390b4f2bd38f
-  data.tar.gz: 2a32715b6e1eb3e83b3837de1d151169d8b3455f
+  metadata.gz: dd541a30e12622b666a01bc18b5ba80d93cbd1aa
+  data.tar.gz: 773aa117ae2b41bfce6a8cde5df1ed1df4eadeab
 SHA512:
-  metadata.gz: f2264455eda5136261628cc77de24494d9ea11bb116c9ca5e36495f4f4b90101356444c9da75c37b6d5b9419b57ce4a145830bd1d6919ce0cbdb2ef05673bfad
-  data.tar.gz: 9c539db1ccac369ae23113587e9d529a95de0b080f3c12687e237d46bea1bdbb157b57f7b2f61d72f637cd914aecb11fbd8daeec11ecb38f49b65669a004e774
+  metadata.gz: b4720613f00b591b3d419f03c06aef8d5e8bb274e4d35e7b1f12f6d14aadb64d1611c7822e3fdc0d2a63c60b653b79a64339aabfd107fb73b14a47bf646c7153
+  data.tar.gz: 3ce20e6925e81e80c6a61e03fd208d852741b7c8d3dd32d045f56b748844782a47810ce041eaa1a8d30f63f282c1655fc6939f7f0ff10294e25ec0be48ede043

data/Manifest.txt CHANGED

@@ -14,6 +14,7 @@ lib/csvreader/parser_json.rb
 lib/csvreader/parser_std.rb
 lib/csvreader/parser_strict.rb
 lib/csvreader/parser_tab.rb
+lib/csvreader/parser_table.rb
 lib/csvreader/reader.rb
 lib/csvreader/reader_hash.rb
 lib/csvreader/version.rb
@@ -40,6 +41,7 @@ test/test_parser_numeric.rb
 test/test_parser_quotes.rb
 test/test_parser_strict.rb
 test/test_parser_tab.rb
+test/test_parser_table.rb
 test/test_reader.rb
 test/test_reader_converters.rb
 test/test_reader_hash.rb

data/README.md CHANGED

@@ -8,8 +8,13 @@
 * forum :: [wwwmake](http://groups.google.com/group/wwwmake)
 ## What's News?
+**v1.1.4**  Added new "classic" table parser (see `ParserTable`) for supporting fields separated by (one or more) spaces
+e.g. `Csv.table.parse( txt )`.
 **v1.1.3**: Added built-in support for french single and double quotes / guillemets (`‹› «»`) to default parser ("The Right Way").
 Now you can use both, that is, single (`‹...›` or `›...‹`)
 or double (`«...»` or `»...«`).
@@ -38,7 +43,7 @@ for meta data, the first one "wins" - you CANNOT use both.
 **v1.1.0**: Added new fixed width field (fwf) parser (see `ParserFixed`) for supporting fields with fixed width (and no separator)
-e.g.`Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] )`.
+e.g. `Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] )`.
 **v1.0.3**: Added built-in support for an (optional) front matter (`---`) meta data block
@@ -396,6 +401,32 @@ Hacker-Pschorr Bräu,                      München,   Münchner Dunkel,  5.0%
 Staatliches Hofbräuhaus München,          München,   Hofbräu Oktoberfestbier, 6.3%
 ```
+Or use the ARFF (attribute-relation file format)-like alternative style with  `@`-directives
+inside comments (for easier backwards compatibility with old readers)
+for "meta data" in the header (before any records):
+```
+##########################
+# try with some comments
+#   and blank lines even before @-directives in header
+#
+# @RELATION Beer
+#
+# @ATTRIBUTE Brewery
+# @ATTRIBUTE City
+# @ATTRIBUTE Name
+# @ATTRIBUTE Abv
+Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
+Augustiner Bräu München,München,Edelstoff,5.6%
+Bayerische Staatsbrauerei Weihenstephan,  Freising,  Hefe Weissbier,   5.4%
+Brauerei Spezial,                         Bamberg,   Rauchbier Märzen, 5.1%
+Hacker-Pschorr Bräu,                      München,   Münchner Dunkel,  5.0%
+Staatliches Hofbräuhaus München,          München,   Hofbräu Oktoberfestbier, 6.3%
+```
 ### Q: How can I change the default format / dialect?
@@ -535,42 +566,6 @@ Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] )  # or Csv.fix or Csv.f
 ```
-Bonus: If the width is a string (not an array)
-(e.g. `'a8 a8 a32 Z*'` or `'A8 A8 A32 Z*'` and so on)
-than the fixed width field parser
-will use  `String#unpack` and the value of width as its format string spec.
-Example:
-``` ruby
-txt = <<TXT
-12345678123456781234567890123456789012345678901212345678901234
-TXT
-Csv.fixed.parse( txt, width: 'a8 a8 a32 Z*' )  # or Csv.fix or Csv.f
-# => [["12345678","12345678", "12345678901234567890123456789012", "12345678901234"]]
-txt = <<TXT
-John    Smith   john@example.com                1-888-555-6666
-Michele O'Reileymichele@example.com             1-333-321-8765
-TXT
-Csv.fixed.parse( txt, width: 'A8 A8 A32 Z*' )   # or Csv.fix or Csv.f
-# => [["John",    "Smith",    "john@example.com",    "1-888-555-6666"],
-#     ["Michele", "O'Reiley", "michele@example.com", "1-333-321-8765"]]
-# and so on
-```
-| String Directive | Returns | Meaning                 |
-|------------------|---------|-------------------------|
-| `A`              | String  | Arbitrary binary string (remove trailing nulls and ASCII spaces) |
-| `a`              | String  | Arbitrary binary string |
-| `Z`              | String  | Null-terminated string  |
-and many more. See the `String#unpack` documentation
-for the complete format spec and directives.
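
The README passage removed in this hunk describes passing a `String#unpack` format string as the width. As a minimal sketch of what those directives do on their own (plain Ruby, no csvreader involved; the helper names `sample_line`, `unpack_trimmed` and `unpack_raw` are made up for illustration), the difference between `A` and `a` can be seen like this:

```ruby
# Build a 62-character fixed-width line: 8 + 8 + 32 columns, then the rest.
def sample_line
  'John'.ljust(8) + 'Smith'.ljust(8) + 'john@example.com'.ljust(32) + '1-888-555-6666'
end

# 'A' strips trailing spaces and nulls; 'Z*' reads up to the first null or the end.
def unpack_trimmed(line)
  line.unpack('A8 A8 A32 Z*')
end

# 'a' keeps the padding exactly as it appears in the input.
def unpack_raw(line)
  line.unpack('a8 a8 a32 Z*')
end

p unpack_trimmed(sample_line)
# => ["John", "Smith", "john@example.com", "1-888-555-6666"]
p unpack_raw(sample_line).first
# => "John    "
```

Whether 1.1.4 still accepts a string width is not visible in this diff; only the README text was dropped.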

data/lib/csvreader/base.rb CHANGED

@@ -19,6 +19,7 @@ require 'csvreader/parser_strict'   # flexible (strict - no leading/trailing spa
 require 'csvreader/parser_tab'
 require 'csvreader/parser_fixed'
 require 'csvreader/parser_json'
+require 'csvreader/parser_table'
 require 'csvreader/parser'
 require 'csvreader/converter'
 require 'csvreader/reader'
@@ -62,8 +63,8 @@ class Parser
                                                       null: "\\N" )
-  TAB     = ParserTab.new
+  TAB     = ParserTab.new      ## (strict) tab-separated
+  TABLE   = ParserTable.new    ## space-separated e.g /[ \t]+/
   FIXED   = ParserFixed.new
@@ -80,6 +81,7 @@ class Parser
   def self.postgresql_text() POSTGRESQL_TEXT; end
   def self.postgres_text()   postgresql_text; end
   def self.tab()             TAB;             end
+  def self.table()           TABLE;           end
   def self.fixed()           FIXED;           end
   def self.fix()             fixed;           end
   def self.f()               fixed;           end
@@ -103,6 +105,7 @@ class CsvReader
   TAB   = Builder.new( Parser::TAB )
+  TABLE = Builder.new( Parser::TABLE )
   FIXED = Builder.new( Parser::FIXED )
@@ -119,6 +122,7 @@ class CsvReader
   def self.postgresql_text() POSTGRESQL_TEXT; end
   def self.postgres_text()   postgresql_text; end
   def self.tab()             TAB;             end
+  def self.table()           TABLE;           end
   def self.fixed()           FIXED;           end
   def self.fix()             fixed;           end
   def self.f()               fixed;           end
@@ -141,6 +145,7 @@ class CsvHashReader
   TAB   = Builder.new( Parser::TAB )
+  TABLE = Builder.new( Parser::TABLE )
   FIXED = Builder.new( Parser::FIXED )
@@ -157,6 +162,7 @@ class CsvHashReader
   def self.postgresql_text() POSTGRESQL_TEXT; end
   def self.postgres_text()   postgresql_text; end
   def self.tab()             TAB;             end
+  def self.table()           TABLE;           end
   def self.fixed()           FIXED;           end
   def self.fix()             fixed;           end
   def self.f()               fixed;           end

data/lib/csvreader/parser_table.rb ADDED

@@ -0,0 +1,89 @@
+# encoding: utf-8
+class CsvReader
+class ParserTable
+###################################
+## add simple logger with debug flag/switch
+#
+#  use Parser.debug = true   # to turn on
+#
+#  todo/fix: use logutils instead of std logger - why? why not?
+def self.build_logger()
+  l = Logger.new( STDOUT )
+  l.level = :info    ## set to :info on start; note: is 0 (debug) by default
+  l
+end
+def self.logger() @@logger ||= build_logger; end
+def logger()  self.class.logger; end
+def parse( data, **kwargs, &block )
+  ## note: input: required each_line (string or io/file for example)
+  ## note: kwargs NOT used for now (but required for "protocol/interface" by other parsers)
+  input = data   ## assume it's a string or io/file handle
+  if block_given?
+    parse_lines( input, &block )
+  else
+    records = []
+    parse_lines( input ) do |record|
+      records << record
+    end
+    records
+  end
+end ## method parse
+private
+def parse_lines( input, &block )
+  ## note: each_line only works with \n (unix) or \r\n (windows)
+  ##   will NOT work with \r (old mac, any others?) only!!!!
+  input.each_line do |line|
+    logger.debug  "line:"             if logger.debug?
+    logger.debug line.pretty_inspect  if logger.debug?
+    ##  note: chomp('') if is an empty string,
+    ##    it will remove all trailing newlines from the string.
+    ##    use line.sub(/[\n\r]*$/, '') or similar instead - why? why not?
+    line = line.chomp( '' )
+    line = line.strip         ## strip leading and trailing whitespaces (space/tab) too
+    logger.debug line.pretty_inspect    if logger.debug?
+    if line.empty?             ## skip blank lines
+      logger.debug "skip blank line"    if logger.debug?
+      next
+    end
+    if line.start_with?( "#" )  ## skip comment lines
+      logger.debug "skip comment line"   if logger.debug?
+      next
+    end
+    # note: String#split with no argument splits on whitespace (like /\s+/) :-)
+    #          here the separator is just made "explicit" with /[ \t]+/
+    values = line.split( /[ \t]+/ )
+    logger.debug values.pretty_inspect   if logger.debug?
+    ## note: requires block - enforce? how? why? why not?
+    block.call( values )
+  end
+end # method parse_lines
+end # class ParserTable
+end # class CsvReader
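
The core of `parse_lines` above is three steps per line: strip, skip blanks and `#` comments, split on runs of spaces or tabs. A standalone sketch of just those steps (no logger, no block protocol; `split_table_lines` and `TABLE_SAMPLE` are hypothetical names, not part of the gem) might look like this:

```ruby
# Hypothetical helper that mirrors the strip / skip / split steps of parse_lines.
def split_table_lines(text)
  records = []
  text.each_line do |line|
    line = line.strip                    # drop newline plus leading/trailing spaces and tabs
    next if line.empty?                  # skip blank lines
    next if line.start_with?('#')        # skip comment lines
    records << line.split(/[ \t]+/)      # one or more spaces or tabs separate fields
  end
  records
end

TABLE_SAMPLE = <<TXT
# space-separated with comments and blank lines

 aa bbb
cc    dd ee
TXT

p split_table_lines(TABLE_SAMPLE)
# => [["aa", "bbb"], ["cc", "dd", "ee"]]
```

Note that records may have different lengths, and a value can never contain a space or tab, since there is no quoting in this format.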

data/lib/csvreader/version.rb CHANGED

@@ -3,16 +3,24 @@
 class CsvReader   ## note: uses a class for now - change to module - why? why not?
-  MAJOR = 1    ## todo: namespace inside version or something - why? why not??
-  MINOR = 1
-  PATCH = 3
-  VERSION = [MAJOR,MINOR,PATCH].join('.')
+  module Version
+    MAJOR = 1    ## todo: namespace inside version or something - why? why not??
+    MINOR = 1
+    PATCH = 4
+    ## self.to_s  - why? why not?
+  end
+  VERSION = [Version::MAJOR,
+             Version::MINOR,
+             Version::PATCH].join('.')
-  def self.version
+  def self.version   ## keep (as an alternative to VERSION) - why? why not?
     VERSION
   end
   def self.banner
     "csvreader/#{VERSION} on Ruby #{RUBY_VERSION} (#{RUBY_RELEASE_DATE}) [#{RUBY_PLATFORM}]"
   end
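
One visible effect of this hunk is that `MAJOR`, `MINOR` and `PATCH` move from the class body into a nested `Version` module, while `VERSION` stays at the top level. A minimal sketch of that layout, using a hypothetical `VersionSketch` module as a stand-in for `CsvReader`:

```ruby
module VersionSketch   # hypothetical stand-in for CsvReader
  module Version
    MAJOR = 1
    MINOR = 1
    PATCH = 4
  end

  VERSION = [Version::MAJOR, Version::MINOR, Version::PATCH].join('.')

  def self.version
    VERSION
  end
end

puts VersionSketch.version                         # prints 1.1.4
puts VersionSketch.const_defined?(:MAJOR, false)   # prints false
puts VersionSketch::Version::MAJOR                 # prints 1
```

Under this layout, any code that read the numeric parts directly from the outer namespace would need to go through `Version::` instead; the joined `VERSION` string is unaffected.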

data/test/helper.rb CHANGED

@@ -21,3 +21,4 @@ end
 ## CsvReader::ParserStd.logger.level    = :debug   ## turn on "global" logging
 ## CsvReader::ParserStrict.logger.level = :debug   ## turn on "global" logging
 ## CsvReader::ParserFixed.logger.level = :debug   ## turn on "global" logging
+CsvReader::ParserTable.logger.level = :debug   ## turn on "global" logging

data/test/test_parser_table.rb ADDED

@@ -0,0 +1,35 @@
+# encoding: utf-8
+###
+#  to run use
+#     ruby -I ./lib -I ./test test/test_parser_table.rb
+require 'helper'
+class TestParserTable < MiniTest::Test
+def parser() CsvReader::Parser::TABLE;  end
+def test_contacts
+  records = [["aa", "bbb"],
+             ["cc", "dd", "ee"]]
+  assert_equal records, parser.parse( <<TXT )
+# space-separated with comments and blank lines
+ aa bbb
+cc    dd ee
+TXT
+   assert_equal records, parser.parse( <<TXT )
+ aa bbb
+cc    dd ee
+TXT
+end
+end # class TestParserTable

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: csvreader
 version: !ruby/object:Gem::Version
-  version: 1.1.3
+  version: 1.1.4
 platform: ruby
 authors:
 - Gerald Bauer
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-10-24 00:00:00.000000000 Z
+date: 2018-10-27 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rdoc
@@ -65,6 +65,7 @@ files:
 - lib/csvreader/parser_std.rb
 - lib/csvreader/parser_strict.rb
 - lib/csvreader/parser_tab.rb
+- lib/csvreader/parser_table.rb
 - lib/csvreader/reader.rb
 - lib/csvreader/reader_hash.rb
 - lib/csvreader/version.rb
@@ -91,6 +92,7 @@ files:
 - test/test_parser_quotes.rb
 - test/test_parser_strict.rb
 - test/test_parser_tab.rb
+- test/test_parser_table.rb
 - test/test_reader.rb
 - test/test_reader_converters.rb
 - test/test_reader_hash.rb