csvreader 1.1.3 → 1.1.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: a920108ec183cff7c7cad8c0d967390b4f2bd38f
4
- data.tar.gz: 2a32715b6e1eb3e83b3837de1d151169d8b3455f
3
+ metadata.gz: dd541a30e12622b666a01bc18b5ba80d93cbd1aa
4
+ data.tar.gz: 773aa117ae2b41bfce6a8cde5df1ed1df4eadeab
5
5
  SHA512:
6
- metadata.gz: f2264455eda5136261628cc77de24494d9ea11bb116c9ca5e36495f4f4b90101356444c9da75c37b6d5b9419b57ce4a145830bd1d6919ce0cbdb2ef05673bfad
7
- data.tar.gz: 9c539db1ccac369ae23113587e9d529a95de0b080f3c12687e237d46bea1bdbb157b57f7b2f61d72f637cd914aecb11fbd8daeec11ecb38f49b65669a004e774
6
+ metadata.gz: b4720613f00b591b3d419f03c06aef8d5e8bb274e4d35e7b1f12f6d14aadb64d1611c7822e3fdc0d2a63c60b653b79a64339aabfd107fb73b14a47bf646c7153
7
+ data.tar.gz: 3ce20e6925e81e80c6a61e03fd208d852741b7c8d3dd32d045f56b748844782a47810ce041eaa1a8d30f63f282c1655fc6939f7f0ff10294e25ec0be48ede043
@@ -14,6 +14,7 @@ lib/csvreader/parser_json.rb
14
14
  lib/csvreader/parser_std.rb
15
15
  lib/csvreader/parser_strict.rb
16
16
  lib/csvreader/parser_tab.rb
17
+ lib/csvreader/parser_table.rb
17
18
  lib/csvreader/reader.rb
18
19
  lib/csvreader/reader_hash.rb
19
20
  lib/csvreader/version.rb
@@ -40,6 +41,7 @@ test/test_parser_numeric.rb
40
41
  test/test_parser_quotes.rb
41
42
  test/test_parser_strict.rb
42
43
  test/test_parser_tab.rb
44
+ test/test_parser_table.rb
43
45
  test/test_reader.rb
44
46
  test/test_reader_converters.rb
45
47
  test/test_reader_hash.rb
data/README.md CHANGED
@@ -8,8 +8,13 @@
8
8
  * forum :: [wwwmake](http://groups.google.com/group/wwwmake)
9
9
 
10
10
 
11
+
11
12
  ## What's News?
12
13
 
14
+ **v1.1.4**: Added new "classic" table parser (see `ParserTable`) for supporting fields separated by (one or more) spaces
15
+ e.g. `Csv.table.parse( txt )`.
16
+
17
+
13
18
  **v1.1.3**: Added built-in support for french single and double quotes / guillemets (`‹› «»`) to default parser ("The Right Way").
14
19
  Now you can use both, that is, single (`‹...›` or `›...‹`)
15
20
  or double (`«...»` or `»...«`).
@@ -38,7 +43,7 @@ for meta data, the first one "wins" - you CANNOT use both.
38
43
 
39
44
 
40
45
  **v1.1.0**: Added new fixed width field (fwf) parser (see `ParserFixed`) for supporting fields with fixed width (and no separator)
41
- e.g.`Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] )`.
46
+ e.g. `Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] )`.
42
47
 
43
48
 
44
49
  **v1.0.3**: Added built-in support for an (optional) front matter (`---`) meta data block
@@ -396,6 +401,32 @@ Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
396
401
  Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
397
402
  ```
398
403
 
404
+ Or use the ARFF (attribute-relation file format)-like alternative style with `@`-directives
405
+ inside comments (for easier backwards compatibility with old readers)
406
+ for "meta data" in the header (before any records):
407
+
408
+ ```
409
+ ##########################
410
+ # try with some comments
411
+ # and blank lines even before @-directives in header
412
+ #
413
+ # @RELATION Beer
414
+ #
415
+ # @ATTRIBUTE Brewery
416
+ # @ATTRIBUTE City
417
+ # @ATTRIBUTE Name
418
+ # @ATTRIBUTE Abv
419
+
420
+ Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
421
+ Augustiner Bräu München,München,Edelstoff,5.6%
422
+
423
+ Bayerische Staatsbrauerei Weihenstephan, Freising, Hefe Weissbier, 5.4%
424
+ Brauerei Spezial, Bamberg, Rauchbier Märzen, 5.1%
425
+ Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
426
+ Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
427
+ ```
428
+
429
+
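The `@`-directive header style added to the README above can be illustrated with a small standalone sketch. This diff does not show how csvreader itself reads these directives, so the following is only an assumption about one reasonable approach: the method name `read_header_directives` and the returned hash layout are hypothetical and not part of the gem.

```ruby
# Hypothetical helper (not part of csvreader): collects @RELATION and
# @ATTRIBUTE directives from comment lines that appear before the first record.
def read_header_directives(text)
  meta = { relation: nil, attributes: [] }

  text.each_line do |line|
    line = line.strip
    next if line.empty?                  # blank lines are allowed in the header
    break unless line.start_with?('#')   # first non-comment line ends the header

    body = line.sub(/\A#+\s*/, '')       # drop leading hash marks and spaces
    case body
    when /\A@RELATION\s+(.+)\z/i  then meta[:relation] = $1
    when /\A@ATTRIBUTE\s+(.+)\z/i then meta[:attributes] << $1
    end
  end

  meta
end

txt = <<TXT
##########################
# try with some comments
#
# @RELATION Beer
#
# @ATTRIBUTE Brewery
# @ATTRIBUTE City
# @ATTRIBUTE Name
# @ATTRIBUTE Abv

Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
TXT

p read_header_directives(txt)
```

Because the directives sit inside comment lines, a reader that knows nothing about them can still skip the whole header, which is the backwards-compatibility point the README makes.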
399
430
 
400
431
  ### Q: How can I change the default format / dialect?
401
432
 
@@ -535,42 +566,6 @@ Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] ) # or Csv.fix or Csv.f
535
566
  ```
536
567
 
537
568
 
538
- Bonus: If the width is a string (not an array)
539
- (e.g. `'a8 a8 a32 Z*'` or `'A8 A8 A32 Z*'` and so on)
540
- then the fixed width field parser
541
- will use `String#unpack` and the value of width as its format string spec.
542
- Example:
543
-
544
- ``` ruby
545
- txt = <<TXT
546
- 12345678123456781234567890123456789012345678901212345678901234
547
- TXT
548
-
549
- Csv.fixed.parse( txt, width: 'a8 a8 a32 Z*' ) # or Csv.fix or Csv.f
550
- # => [["12345678","12345678", "12345678901234567890123456789012", "12345678901234"]]
551
-
552
- txt = <<TXT
553
- John Smith john@example.com 1-888-555-6666
554
- Michele O'Reileymichele@example.com 1-333-321-8765
555
- TXT
556
-
557
- Csv.fixed.parse( txt, width: 'A8 A8 A32 Z*' ) # or Csv.fix or Csv.f
558
- # => [["John", "Smith", "john@example.com", "1-888-555-6666"],
559
- # ["Michele", "O'Reiley", "michele@example.com", "1-333-321-8765"]]
560
-
561
- # and so on
562
- ```
563
-
564
- | String Directive | Returns | Meaning |
565
- |------------------|---------|-------------------------|
566
- | `A` | String | Arbitrary binary string (remove trailing nulls and ASCII spaces) |
567
- | `a` | String | Arbitrary binary string |
568
- | `Z` | String | Null-terminated string |
569
-
570
-
571
- and many more. See the `String#unpack` documentation
572
- for the complete format spec and directives.
573
-
574
569
 
575
570
 
576
571
 
@@ -19,6 +19,7 @@ require 'csvreader/parser_strict' # flexible (strict - no leading/trailing spa
19
19
  require 'csvreader/parser_tab'
20
20
  require 'csvreader/parser_fixed'
21
21
  require 'csvreader/parser_json'
22
+ require 'csvreader/parser_table'
22
23
  require 'csvreader/parser'
23
24
  require 'csvreader/converter'
24
25
  require 'csvreader/reader'
@@ -62,8 +63,8 @@ class Parser
62
63
  null: "\\N" )
63
64
 
64
65
 
65
- TAB = ParserTab.new
66
-
66
+ TAB = ParserTab.new ## (strict) tab-separated
67
+ TABLE = ParserTable.new ## space-separated e.g /[ \t]+/
67
68
  FIXED = ParserFixed.new
68
69
 
69
70
 
@@ -80,6 +81,7 @@ class Parser
80
81
  def self.postgresql_text() POSTGRESQL_TEXT; end
81
82
  def self.postgres_text() postgresql_text; end
82
83
  def self.tab() TAB; end
84
+ def self.table() TABLE; end
83
85
  def self.fixed() FIXED; end
84
86
  def self.fix() fixed; end
85
87
  def self.f() fixed; end
@@ -103,6 +105,7 @@ class CsvReader
103
105
 
104
106
 
105
107
  TAB = Builder.new( Parser::TAB )
108
+ TABLE = Builder.new( Parser::TABLE )
106
109
  FIXED = Builder.new( Parser::FIXED )
107
110
 
108
111
 
@@ -119,6 +122,7 @@ class CsvReader
119
122
  def self.postgresql_text() POSTGRESQL_TEXT; end
120
123
  def self.postgres_text() postgresql_text; end
121
124
  def self.tab() TAB; end
125
+ def self.table() TABLE; end
122
126
  def self.fixed() FIXED; end
123
127
  def self.fix() fixed; end
124
128
  def self.f() fixed; end
@@ -141,6 +145,7 @@ class CsvHashReader
141
145
 
142
146
 
143
147
  TAB = Builder.new( Parser::TAB )
148
+ TABLE = Builder.new( Parser::TABLE )
144
149
  FIXED = Builder.new( Parser::FIXED )
145
150
 
146
151
 
@@ -157,6 +162,7 @@ class CsvHashReader
157
162
  def self.postgresql_text() POSTGRESQL_TEXT; end
158
163
  def self.postgres_text() postgresql_text; end
159
164
  def self.tab() TAB; end
165
+ def self.table() TABLE; end
160
166
  def self.fixed() FIXED; end
161
167
  def self.fix() fixed; end
162
168
  def self.f() fixed; end
@@ -0,0 +1,89 @@
1
+ # encoding: utf-8
2
+
3
+ class CsvReader
4
+
5
+ class ParserTable
6
+
7
+ ###################################
8
+ ## add simple logger with debug flag/switch
9
+ #
10
+ # use Parser.debug = true # to turn on
11
+ #
12
+ # todo/fix: use logutils instead of std logger - why? why not?
13
+
14
+ def self.build_logger()
15
+ l = Logger.new( STDOUT )
16
+ l.level = :info ## set to :info on start; note: is 0 (debug) by default
17
+ l
18
+ end
19
+ def self.logger() @@logger ||= build_logger; end
20
+ def logger() self.class.logger; end
21
+
22
+
23
+
24
+
25
+ def parse( data, **kwargs, &block )
26
+
27
+ ## note: input: required each_line (string or io/file for example)
28
+ ## note: kwargs NOT used for now (but required for "protocol/interface" by other parsers)
29
+
30
+ input = data ## assume it's a string or io/file handle
31
+
32
+ if block_given?
33
+ parse_lines( input, &block )
34
+ else
35
+ records = []
36
+
37
+ parse_lines( input ) do |record|
38
+ records << record
39
+ end
40
+
41
+ records
42
+ end
43
+ end ## method parse
44
+
45
+
46
+
47
+ private
48
+
49
+ def parse_lines( input, &block )
50
+
51
+ ## note: each_line only works with \n (unix) or \r\n (windows)
52
+ ## will NOT work with \r (old mac, any others?) only!!!!
53
+ input.each_line do |line|
54
+
55
+ logger.debug "line:" if logger.debug?
56
+ logger.debug line.pretty_inspect if logger.debug?
57
+
58
+
59
+ ## note: chomp('') - if the separator is an empty string,
60
+ ## it will remove all trailing newlines from the string.
61
+ ## use line.sub(/[\n\r]*$/, '') or similar instead - why? why not?
62
+ line = line.chomp( '' )
63
+ line = line.strip ## strip leading and trailing whitespaces (space/tab) too
64
+ logger.debug line.pretty_inspect if logger.debug?
65
+
66
+ if line.empty? ## skip blank lines
67
+ logger.debug "skip blank line" if logger.debug?
68
+ next
69
+ end
70
+
71
+ if line.start_with?( "#" ) ## skip comment lines
72
+ logger.debug "skip comment line" if logger.debug?
73
+ next
74
+ end
75
+
76
+ # note: string.split defaults to split by space (e.g. /\s+/) :-)
77
+ # for now just make it "explicit" with /[ \t]+/
78
+
79
+ values = line.split( /[ \t]+/ )
80
+ logger.debug values.pretty_inspect if logger.debug?
81
+
82
+ ## note: requires block - enforce? how? why? why not?
83
+ block.call( values )
84
+ end
85
+ end # method parse_lines
86
+
87
+
88
+ end # class ParserTable
89
+ end # class CsvReader
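The line handling in `ParserTable#parse_lines` above can be tried without installing the gem. The following is a minimal standalone sketch under stated assumptions, not the gem's code: the method name `parse_table` is hypothetical, logging is left out, and the block form here returns `nil` rather than whatever `each_line` returns.

```ruby
# Standalone sketch; `parse_table` is a hypothetical name, not part of csvreader.
def parse_table(text, &block)
  records = []

  text.each_line do |line|
    line = line.strip                   # drop newline plus leading/trailing spaces and tabs
    next if line.empty?                 # skip blank lines
    next if line.start_with?('#')       # skip comment lines

    values = line.split(/[ \t]+/)       # one or more spaces/tabs separate fields

    if block
      block.call(values)                # streaming form: hand each record to the caller
    else
      records << values                 # collecting form: gather all records
    end
  end

  block ? nil : records
end

txt = <<TXT
# space-separated with comments and blank lines

aa   bbb
cc dd ee

TXT

p parse_table(txt)   # => [["aa", "bbb"], ["cc", "dd", "ee"]]
```

Because fields are split on runs of spaces or tabs, a value cannot itself contain a space, and there is no quoting or escaping in this style.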
@@ -3,16 +3,24 @@
3
3
 
4
4
  class CsvReader ## note: uses a class for now - change to module - why? why not?
5
5
 
6
- MAJOR = 1 ## todo: namespace inside version or something - why? why not??
7
- MINOR = 1
8
- PATCH = 3
9
- VERSION = [MAJOR,MINOR,PATCH].join('.')
6
+ module Version
7
+ MAJOR = 1 ## todo: namespace inside version or something - why? why not??
8
+ MINOR = 1
9
+ PATCH = 4
10
10
 
11
+ ## self.to_s - why? why not?
12
+ end
13
+
14
+ VERSION = [Version::MAJOR,
15
+ Version::MINOR,
16
+ Version::PATCH].join('.')
11
17
 
12
- def self.version
18
+ def self.version ## keep (as an alternative to VERSION) - why? why not?
13
19
  VERSION
14
20
  end
15
21
 
22
+
23
+
16
24
  def self.banner
17
25
  "csvreader/#{VERSION} on Ruby #{RUBY_VERSION} (#{RUBY_RELEASE_DATE}) [#{RUBY_PLATFORM}]"
18
26
  end
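The version change above moves the numeric parts into a nested `Version` module while keeping the top-level `VERSION` string. A minimal sketch of that pattern follows; the outer name `Demo` is hypothetical and stands in for the real class so the snippet runs on its own.

```ruby
# `Demo` is a hypothetical stand-in for the real class.
class Demo
  module Version
    MAJOR = 1
    MINOR = 1
    PATCH = 4
  end

  # Joined string form, built from the namespaced parts.
  VERSION = [Version::MAJOR,
             Version::MINOR,
             Version::PATCH].join('.')

  def self.version
    VERSION
  end
end

puts Demo.version          # prints 1.1.4
puts Demo::Version::PATCH  # prints 4
```

Keeping the parts in their own module means `MAJOR`, `MINOR`, and `PATCH` no longer sit directly in the class's constant namespace, while callers that read `VERSION` or `version` are unaffected.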
@@ -21,3 +21,4 @@ end
21
21
  ## CsvReader::ParserStd.logger.level = :debug ## turn on "global" logging
22
22
  ## CsvReader::ParserStrict.logger.level = :debug ## turn on "global" logging
23
23
  ## CsvReader::ParserFixed.logger.level = :debug ## turn on "global" logging
24
+ CsvReader::ParserTable.logger.level = :debug ## turn on "global" logging
@@ -0,0 +1,35 @@
1
+ # encoding: utf-8
2
+
3
+ ###
4
+ # to run use
5
+ # ruby -I ./lib -I ./test test/test_parser_table.rb
6
+
7
+
8
+ require 'helper'
9
+
10
+ class TestParserTable < MiniTest::Test
11
+
12
+
13
+ def parser() CsvReader::Parser::TABLE; end
14
+
15
+
16
+ def test_contacts
17
+ records = [["aa", "bbb"],
18
+ ["cc", "dd", "ee"]]
19
+
20
+ assert_equal records, parser.parse( <<TXT )
21
+ # space-separated with comments and blank lines
22
+
23
+ aa bbb
24
+ cc dd ee
25
+
26
+ TXT
27
+
28
+ assert_equal records, parser.parse( <<TXT )
29
+ aa bbb
30
+ cc dd ee
31
+ TXT
32
+ end
33
+
34
+
35
+ end # class TestParserTable
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: csvreader
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.1.3
4
+ version: 1.1.4
5
5
  platform: ruby
6
6
  authors:
7
7
  - Gerald Bauer
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2018-10-24 00:00:00.000000000 Z
11
+ date: 2018-10-27 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rdoc
@@ -65,6 +65,7 @@ files:
65
65
  - lib/csvreader/parser_std.rb
66
66
  - lib/csvreader/parser_strict.rb
67
67
  - lib/csvreader/parser_tab.rb
68
+ - lib/csvreader/parser_table.rb
68
69
  - lib/csvreader/reader.rb
69
70
  - lib/csvreader/reader_hash.rb
70
71
  - lib/csvreader/version.rb
@@ -91,6 +92,7 @@ files:
91
92
  - test/test_parser_quotes.rb
92
93
  - test/test_parser_strict.rb
93
94
  - test/test_parser_tab.rb
95
+ - test/test_parser_table.rb
94
96
  - test/test_reader.rb
95
97
  - test/test_reader_converters.rb
96
98
  - test/test_reader_hash.rb