csvreader 1.1.3 → 1.1.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: a920108ec183cff7c7cad8c0d967390b4f2bd38f
- data.tar.gz: 2a32715b6e1eb3e83b3837de1d151169d8b3455f
+ metadata.gz: dd541a30e12622b666a01bc18b5ba80d93cbd1aa
+ data.tar.gz: 773aa117ae2b41bfce6a8cde5df1ed1df4eadeab
  SHA512:
- metadata.gz: f2264455eda5136261628cc77de24494d9ea11bb116c9ca5e36495f4f4b90101356444c9da75c37b6d5b9419b57ce4a145830bd1d6919ce0cbdb2ef05673bfad
- data.tar.gz: 9c539db1ccac369ae23113587e9d529a95de0b080f3c12687e237d46bea1bdbb157b57f7b2f61d72f637cd914aecb11fbd8daeec11ecb38f49b65669a004e774
+ metadata.gz: b4720613f00b591b3d419f03c06aef8d5e8bb274e4d35e7b1f12f6d14aadb64d1611c7822e3fdc0d2a63c60b653b79a64339aabfd107fb73b14a47bf646c7153
+ data.tar.gz: 3ce20e6925e81e80c6a61e03fd208d852741b7c8d3dd32d045f56b748844782a47810ce041eaa1a8d30f63f282c1655fc6939f7f0ff10294e25ec0be48ede043
@@ -14,6 +14,7 @@ lib/csvreader/parser_json.rb
  lib/csvreader/parser_std.rb
  lib/csvreader/parser_strict.rb
  lib/csvreader/parser_tab.rb
+ lib/csvreader/parser_table.rb
  lib/csvreader/reader.rb
  lib/csvreader/reader_hash.rb
  lib/csvreader/version.rb
@@ -40,6 +41,7 @@ test/test_parser_numeric.rb
  test/test_parser_quotes.rb
  test/test_parser_strict.rb
  test/test_parser_tab.rb
+ test/test_parser_table.rb
  test/test_reader.rb
  test/test_reader_converters.rb
  test/test_reader_hash.rb
data/README.md CHANGED
@@ -8,8 +8,13 @@
  * forum :: [wwwmake](http://groups.google.com/group/wwwmake)
 
 
+
  ## What's News?
 
+ **v1.1.4** Added new "classic" table parser (see `ParserTable`) for supporting fields separated by (one or more) spaces
+ e.g. `Csv.table.parse( txt )`.
+
+
  **v1.1.3**: Added built-in support for french single and double quotes / guillemets (`‹› «»`) to default parser ("The Right Way").
  Now you can use both, that is, single (`‹...›'` or `›...‹'`)
  or double (`«...»` or `»...«`).
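The v1.1.4 note above describes the new table format in a single line. The following is a minimal standalone sketch of the same per-line rules, modeled on the `ParserTable#parse_lines` code added in this release: skip blank lines and `#` comment lines, then split on runs of spaces or tabs. The method name `parse_table` is hypothetical; this is not the gem's API.

```ruby
# Hypothetical standalone sketch (not the csvreader API): applies the same
# per-line rules as the ParserTable added in this release.
def parse_table(txt)
  records = []
  txt.each_line do |line|
    line = line.chomp('').strip        # drop trailing newlines, then leading/trailing spaces and tabs
    next if line.empty?                # skip blank lines
    next if line.start_with?('#')      # skip comment lines
    records << line.split(/[ \t]+/)    # one or more spaces/tabs separate the fields
  end
  records
end

txt = "# space-separated with comments and blank lines\n" \
      "\n" \
      "aa   bbb\n" \
      "cc dd\tee\n"

p parse_table(txt)   # prints [["aa", "bbb"], ["cc", "dd", "ee"]]
```

Fields are separated by a run of spaces or tabs, and there is no quoting. As a result, this format cannot represent an empty field or a value that itself contains a space.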
@@ -38,7 +43,7 @@ for meta data, the first one "wins" - you CANNOT use both.
 
 
  **v1.1.0**: Added new fixed width field (fwf) parser (see `ParserFixed`) for supporting fields with fixed width (and no separator)
- e.g.`Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] )`.
+ e.g. `Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] )`.
 
 
  **v1.0.3**: Added built-in support for an (optional) front matter (`---`) meta data block
@@ -396,6 +401,32 @@ Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
  Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
  ```
 
+ Or use the ARFF (attribute-relation file format)-like alternative style with `@`-directives
+ inside comments (for easier backwards compatibility with old readers)
+ for "meta data" in the header (before any records):
+
+ ```
+ ##########################
+ # try with some comments
+ # and blank lines even before @-directives in header
+ #
+ # @RELATION Beer
+ #
+ # @ATTRIBUTE Brewery
+ # @ATTRIBUTE City
+ # @ATTRIBUTE Name
+ # @ATTRIBUTE Abv
+
+ Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
+ Augustiner Bräu München,München,Edelstoff,5.6%
+
+ Bayerische Staatsbrauerei Weihenstephan, Freising, Hefe Weissbier, 5.4%
+ Brauerei Spezial, Bamberg, Rauchbier Märzen, 5.1%
+ Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
+ Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
+ ```
+
+
 
  ### Q: How can I change the default format / dialect?
 
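The README hunk above shows only the layout of the ARFF-like header. The diff does not show how, or whether, this release turns the `@`-directives into headers. With that caveat, the following hypothetical helper, `arff_header`, illustrates one way to collect the `@RELATION` name and the `@ATTRIBUTE` names from the leading comment block. It stops at the first record.

```ruby
# Hypothetical helper, not part of csvreader: reads only the header
# (comment lines and blank lines before the first record) and collects
# the @-directives found inside "#" comment lines.
def arff_header(txt)
  relation   = nil
  attributes = []
  txt.each_line do |line|
    line = line.strip
    next if line.empty?                   # blank lines may appear anywhere in the header
    break unless line.start_with?('#')    # the first record ends the header
    directive = line.sub(/\A#+\s*/, '')   # drop the comment marker(s)
    case directive
    when /\A@RELATION\s+(.+)\z/  then relation = $1
    when /\A@ATTRIBUTE\s+(.+)\z/ then attributes << $1
    end
  end
  { relation: relation, attributes: attributes }
end

header = "##########\n# @RELATION Beer\n#\n# @ATTRIBUTE Brewery\n# @ATTRIBUTE City\n\n" \
         "Andechser Klosterbrauerei,Andechs\n"
p arff_header(header)   # prints {:relation=>"Beer", :attributes=>["Brewery", "City"]} (format varies by Ruby version)
```

Because the directives sit inside `#` comments, a reader that simply skips comment lines ignores them. That is the backwards compatibility the README refers to.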
@@ -535,42 +566,6 @@ Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] ) # or Csv.fix or Csv.f
  ```
 
 
- Bonus: If the width is a string (not an array)
- (e.g. `'a8 a8 a32 Z*'` or `'A8 A8 A32 Z*'` and so on)
- than the fixed width field parser
- will use `String#unpack` and the value of width as its format string spec.
- Example:
-
- ``` ruby
- txt = <<TXT
- 12345678123456781234567890123456789012345678901212345678901234
- TXT
-
- Csv.fixed.parse( txt, width: 'a8 a8 a32 Z*' ) # or Csv.fix or Csv.f
- # => [["12345678","12345678", "12345678901234567890123456789012", "12345678901234"]]
-
- txt = <<TXT
- John Smith john@example.com 1-888-555-6666
- Michele O'Reileymichele@example.com 1-333-321-8765
- TXT
-
- Csv.fixed.parse( txt, width: 'A8 A8 A32 Z*' ) # or Csv.fix or Csv.f
- # => [["John", "Smith", "john@example.com", "1-888-555-6666"],
- # ["Michele", "O'Reiley", "michele@example.com", "1-333-321-8765"]]
-
- # and so on
- ```
-
- | String Directive | Returns | Meaning |
- |------------------|---------|-------------------------|
- | `A` | String | Arbitrary binary string (remove trailing nulls and ASCII spaces) |
- | `a` | String | Arbitrary binary string |
- | `Z` | String | Null-terminated string |
-
-
- and many more. See the `String#unpack` documentation
- for the complete format spec and directives.
-
 
 
 
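The README section removed above relied on Ruby's built-in `String#unpack`. Independent of whether csvreader still accepts a string `width`, the directives from the removed table can be tried directly in plain Ruby. The sample line is the one from the removed example. The padded `John`/`Smith` string is illustrative only.

```ruby
# Plain Ruby String#unpack with the fixed-width directives from the removed table.
# (Whitespace inside the format string is ignored by unpack.)
line = '12345678123456781234567890123456789012345678901212345678901234'

p line.unpack('a8 a8 a32 Z*')
# prints ["12345678", "12345678", "12345678901234567890123456789012", "12345678901234"]

# 'A' strips trailing spaces and NULs; 'a' keeps the padding.
padded = 'John'.ljust(8) + 'Smith'.ljust(8)
p padded.unpack('A8 A8')   # prints ["John", "Smith"]
p padded.unpack('a8 a8')   # prints ["John    ", "Smith   "]
```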
@@ -19,6 +19,7 @@ require 'csvreader/parser_strict' # flexible (strict - no leading/trailing spa
  require 'csvreader/parser_tab'
  require 'csvreader/parser_fixed'
  require 'csvreader/parser_json'
+ require 'csvreader/parser_table'
  require 'csvreader/parser'
  require 'csvreader/converter'
  require 'csvreader/reader'
@@ -62,8 +63,8 @@ class Parser
  null: "\\N" )
 
 
- TAB = ParserTab.new
-
+ TAB = ParserTab.new ## (strict) tab-separated
+ TABLE = ParserTable.new ## space-separated e.g /[ \t]+/
  FIXED = ParserFixed.new
 
 
@@ -80,6 +81,7 @@ class Parser
  def self.postgresql_text() POSTGRESQL_TEXT; end
  def self.postgres_text() postgresql_text; end
  def self.tab() TAB; end
+ def self.table() TABLE; end
  def self.fixed() FIXED; end
  def self.fix() fixed; end
  def self.f() fixed; end
@@ -103,6 +105,7 @@ class CsvReader
 
 
  TAB = Builder.new( Parser::TAB )
+ TABLE = Builder.new( Parser::TABLE )
  FIXED = Builder.new( Parser::FIXED )
 
 
@@ -119,6 +122,7 @@ class CsvReader
  def self.postgresql_text() POSTGRESQL_TEXT; end
  def self.postgres_text() postgresql_text; end
  def self.tab() TAB; end
+ def self.table() TABLE; end
  def self.fixed() FIXED; end
  def self.fix() fixed; end
  def self.f() fixed; end
@@ -141,6 +145,7 @@ class CsvHashReader
 
 
  TAB = Builder.new( Parser::TAB )
+ TABLE = Builder.new( Parser::TABLE )
  FIXED = Builder.new( Parser::FIXED )
 
 
@@ -157,6 +162,7 @@ class CsvHashReader
  def self.postgresql_text() POSTGRESQL_TEXT; end
  def self.postgres_text() postgresql_text; end
  def self.tab() TAB; end
+ def self.table() TABLE; end
  def self.fixed() FIXED; end
  def self.fix() fixed; end
  def self.f() fixed; end
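In the hunks above, `Parser`, `CsvReader` and `CsvHashReader` each expose their shared, prebuilt instances in two ways. Each one is available as a constant (`TABLE`) and as a class method (`table`), and some have short aliases, such as `fix`/`f` for `fixed`. The following is a minimal sketch of that pattern, using hypothetical stand-in classes (`DemoParser`, `DemoTableParser`) and a hypothetical alias `t` rather than the gem's own names.

```ruby
# Hypothetical stand-in classes (not csvreader's) showing the pattern used above:
# one shared instance per dialect, exposed as a constant and as class methods.
class DemoTableParser
  # parse keeps no per-call instance state, so one shared instance is safe to reuse
  def parse(txt)
    lines = txt.each_line.map(&:strip).reject { |l| l.empty? || l.start_with?('#') }
    lines.map { |l| l.split(/[ \t]+/) }
  end
end

class DemoParser
  TABLE = DemoTableParser.new    # built once when the class body is loaded

  def self.table() TABLE; end    # accessor returns the shared instance
  def self.t()     table; end    # hypothetical short alias, in the style of fix / f
end

p DemoParser.table.equal?(DemoParser::TABLE)   # prints true
p DemoParser.t.parse("# note\na  b\n")         # prints [["a", "b"]]
```

Because every caller shares one instance, the parser must not keep per-parse state in instance variables. `ParserTable` satisfies this: `parse` works only with locals, and the logger lives at class level.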
@@ -0,0 +1,89 @@
+ # encoding: utf-8
+
+ class CsvReader
+
+ class ParserTable
+
+ ###################################
+ ## add simple logger with debug flag/switch
+ #
+ # use ParserTable.logger.level = :debug # to turn on
+ #
+ # todo/fix: use logutils instead of std logger - why? why not?
+
+ def self.build_logger()
+ l = Logger.new( STDOUT )
+ l.level = :info ## set to :info on start; note: is 0 (debug) by default
+ l
+ end
+ def self.logger() @@logger ||= build_logger; end
+ def logger() self.class.logger; end
+
+
+
+
+ def parse( data, **kwargs, &block )
+
+ ## note: input must respond to each_line (e.g. a string or io/file)
+ ## note: kwargs NOT used for now (but required for "protocol/interface" by other parsers)
+
+ input = data ## assume it's a string or io/file handle
+
+ if block_given?
+ parse_lines( input, &block )
+ else
+ records = []
+
+ parse_lines( input ) do |record|
+ records << record
+ end
+
+ records
+ end
+ end ## method parse
+
+
+
+ private
+
+ def parse_lines( input, &block )
+
+ ## note: each_line only works with \n (unix) or \r\n (windows)
+ ## will NOT work with \r (old mac, any others?) only!!!!
+ input.each_line do |line|
+
+ logger.debug "line:" if logger.debug?
+ logger.debug line.pretty_inspect if logger.debug?
+
+
+ ## note: if the separator passed to chomp is an empty string (''),
+ ## it will remove all trailing newlines from the string.
+ ## use line.sub(/[\n\r]*$/, '') or similar instead - why? why not?
+ line = line.chomp( '' )
+ line = line.strip ## strip leading and trailing whitespaces (space/tab) too
+ logger.debug line.pretty_inspect if logger.debug?
+
+ if line.empty? ## skip blank lines
+ logger.debug "skip blank line" if logger.debug?
+ next
+ end
+
+ if line.start_with?( "#" ) ## skip comment lines
+ logger.debug "skip comment line" if logger.debug?
+ next
+ end
+
+ # note: string.split defaults to split by space (e.g. /\s+/) :-)
+ # but make it "explicit" with /[ \t]+/ (spaces and tabs only)
+
+ values = line.split( /[ \t]+/ )
+ logger.debug values.pretty_inspect if logger.debug?
+
+ ## note: requires block - enforce? how? why? why not?
+ block.call( values )
+ end
+ end # method parse_lines
+
+
+ end # class ParserTable
+ end # class CsvReader
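Two details in `parse_lines` above are easy to misread:

* `chomp('')` removes every trailing `\n` or `\r\n`, but not a trailing `\r` that no `\n` follows. The `strip` call that comes next takes care of that case.
* The explicit `/[ \t]+/` pattern is narrower than `String#split` with no argument. The no-argument form splits on any ASCII whitespace and ignores leading whitespace.

The following plain-Ruby checks show both behaviors:

```ruby
# Plain Ruby checks of the String methods used by ParserTable#parse_lines.

# chomp('') removes all trailing "\n" / "\r\n" sequences ...
p "aa bbb\r\n\r\n".chomp('')        # prints "aa bbb"
# ... but keeps a "\r" that is not followed by "\n"; strip then removes it
p "aa bbb\r\r\n".chomp('')          # prints "aa bbb\r"
p "aa bbb\r\r\n".chomp('').strip    # prints "aa bbb"

# split with no argument: any whitespace separates, leading whitespace ignored
p " aa\fbbb".split                  # prints ["aa", "bbb"]
# explicit pattern: only spaces/tabs separate; a leading separator yields ""
p " aa\fbbb".split(/[ \t]+/)        # prints ["", "aa\fbbb"]
p " aa\fbbb".strip.split(/[ \t]+/)  # prints ["aa\fbbb"]
```

The last line shows why `parse_lines` strips each line before splitting: without `strip`, an indented line would start with an empty field.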
@@ -3,16 +3,24 @@
 
  class CsvReader ## note: uses a class for now - change to module - why? why not?
 
- MAJOR = 1 ## todo: namespace inside version or something - why? why not??
- MINOR = 1
- PATCH = 3
- VERSION = [MAJOR,MINOR,PATCH].join('.')
+ module Version
+ MAJOR = 1 ## todo: namespace inside version or something - why? why not??
+ MINOR = 1
+ PATCH = 4
 
+ ## self.to_s - why? why not?
+ end
+
+ VERSION = [Version::MAJOR,
+ Version::MINOR,
+ Version::PATCH].join('.')
 
- def self.version
+ def self.version ## keep (as an alternative to VERSION) - why? why not?
  VERSION
  end
 
+
+
  def self.banner
  "csvreader/#{VERSION} on Ruby #{RUBY_VERSION} (#{RUBY_RELEASE_DATE}) [#{RUBY_PLATFORM}]"
  end
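After this change, the version parts live in a nested `Version` module, and `VERSION` is derived from them. The following standalone sketch reproduces that layout with a hypothetical `DemoReader` class, so it runs without the gem. It also compares versions with RubyGems' `Gem::Version`, which orders them segment by segment rather than as plain strings.

```ruby
require 'rubygems'   # normally already loaded; provides Gem::Version

# Hypothetical DemoReader mirrors the layout of the version.rb hunk above.
class DemoReader
  module Version
    MAJOR = 1
    MINOR = 1
    PATCH = 4
  end

  VERSION = [Version::MAJOR,
             Version::MINOR,
             Version::PATCH].join('.')

  def self.version() VERSION; end
end

p DemoReader::VERSION                                                # prints "1.1.4"
p Gem::Version.new(DemoReader.version) > Gem::Version.new('1.1.3')  # prints true

# Plain string comparison gets multi-digit segments wrong; Gem::Version does not.
p '1.1.10' > '1.1.4'                                      # prints false
p Gem::Version.new('1.1.10') > Gem::Version.new('1.1.4')  # prints true
```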
@@ -21,3 +21,4 @@ end
  ## CsvReader::ParserStd.logger.level = :debug ## turn on "global" logging
  ## CsvReader::ParserStrict.logger.level = :debug ## turn on "global" logging
  ## CsvReader::ParserFixed.logger.level = :debug ## turn on "global" logging
+ CsvReader::ParserTable.logger.level = :debug ## turn on "global" logging
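The test helper change above switches `ParserTable` logging to debug for every test run. In `parse_lines`, each debug call is guarded by `logger.debug?`, so the `pretty_inspect` strings are built only when debug logging is on.

The following minimal sketch uses Ruby's standard `Logger` and writes into a `StringIO` so the output can be inspected. It shows that guard alongside the block form, whose block Ruby evaluates only when the current level allows the message.

```ruby
require 'logger'
require 'stringio'

out    = StringIO.new
logger = Logger.new(out)
logger.level = :info                     # same starting level as ParserTable.build_logger

logger.debug "hidden" if logger.debug?   # guard style used in parse_lines
logger.debug { "also hidden" }           # block form: block is not evaluated at :info

logger.level = :debug                    # what the helper does for ParserTable
logger.debug { "visible" }

p out.string.include?("hidden")    # prints false
p out.string.include?("visible")   # prints true
```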
@@ -0,0 +1,35 @@
+ # encoding: utf-8
+
+ ###
+ # to run use
+ # ruby -I ./lib -I ./test test/test_parser_table.rb
+
+
+ require 'helper'
+
+ class TestParserTable < MiniTest::Test
+
+
+ def parser() CsvReader::Parser::TABLE; end
+
+
+ def test_contacts
+ records = [["aa", "bbb"],
+ ["cc", "dd", "ee"]]
+
+ assert_equal records, parser.parse( <<TXT )
+ # space-separated with comments and blank lines
+
+ aa bbb
+ cc dd ee
+
+ TXT
+
+ assert_equal records, parser.parse( <<TXT )
+ aa bbb
+ cc dd ee
+ TXT
+ end
+
+
+ end # class TestParserTable
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: csvreader
  version: !ruby/object:Gem::Version
- version: 1.1.3
+ version: 1.1.4
  platform: ruby
  authors:
  - Gerald Bauer
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2018-10-24 00:00:00.000000000 Z
+ date: 2018-10-27 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rdoc
@@ -65,6 +65,7 @@ files:
  - lib/csvreader/parser_std.rb
  - lib/csvreader/parser_strict.rb
  - lib/csvreader/parser_tab.rb
+ - lib/csvreader/parser_table.rb
  - lib/csvreader/reader.rb
  - lib/csvreader/reader_hash.rb
  - lib/csvreader/version.rb
@@ -91,6 +92,7 @@ files:
  - test/test_parser_quotes.rb
  - test/test_parser_strict.rb
  - test/test_parser_tab.rb
+ - test/test_parser_table.rb
  - test/test_reader.rb
  - test/test_reader_converters.rb
  - test/test_reader_hash.rb