csvreader 1.1.3 → 1.1.4
- checksums.yaml +4 -4
- data/Manifest.txt +2 -0
- data/README.md +32 -37
- data/lib/csvreader/base.rb +8 -2
- data/lib/csvreader/parser_table.rb +89 -0
- data/lib/csvreader/version.rb +13 -5
- data/test/helper.rb +1 -0
- data/test/test_parser_table.rb +35 -0
- metadata +4 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: dd541a30e12622b666a01bc18b5ba80d93cbd1aa
+  data.tar.gz: 773aa117ae2b41bfce6a8cde5df1ed1df4eadeab
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b4720613f00b591b3d419f03c06aef8d5e8bb274e4d35e7b1f12f6d14aadb64d1611c7822e3fdc0d2a63c60b653b79a64339aabfd107fb73b14a47bf646c7153
+  data.tar.gz: 3ce20e6925e81e80c6a61e03fd208d852741b7c8d3dd32d045f56b748844782a47810ce041eaa1a8d30f63f282c1655fc6939f7f0ff10294e25ec0be48ede043
data/Manifest.txt
CHANGED
@@ -14,6 +14,7 @@ lib/csvreader/parser_json.rb
 lib/csvreader/parser_std.rb
 lib/csvreader/parser_strict.rb
 lib/csvreader/parser_tab.rb
+lib/csvreader/parser_table.rb
 lib/csvreader/reader.rb
 lib/csvreader/reader_hash.rb
 lib/csvreader/version.rb
@@ -40,6 +41,7 @@ test/test_parser_numeric.rb
 test/test_parser_quotes.rb
 test/test_parser_strict.rb
 test/test_parser_tab.rb
+test/test_parser_table.rb
 test/test_reader.rb
 test/test_reader_converters.rb
 test/test_reader_hash.rb
data/README.md
CHANGED
@@ -8,8 +8,13 @@
 * forum :: [wwwmake](http://groups.google.com/group/wwwmake)
 
 
+
 ## What's News?
 
+**v1.1.4** Added new "classic" table parser (see `ParserTable`) for supporting fields separated by (one or more) spaces
+e.g. `Csv.table.parse( txt )`.
+
+
 **v1.1.3**: Added built-in support for french single and double quotes / guillemets (`‹› «»`) to default parser ("The Right Way").
 Now you can use both, that is, single (`‹...›'` or `›...‹'`)
 or double (`«...»` or `»...«`).
@@ -38,7 +43,7 @@ for meta data, the first one "wins" - you CANNOT use both.
 
 
 **v1.1.0**: Added new fixed width field (fwf) parser (see `ParserFixed`) for supporting fields with fixed width (and no separator)
-e.g
+e.g. `Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] )`.
 
 
 **v1.0.3**: Added built-in support for an (optional) front matter (`---`) meta data block
@@ -396,6 +401,32 @@ Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
 Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
 ```
 
+Or use the ARFF (attribute-relation file format)-like alternative style with `@`-directives
+inside comments (for easier backwards compatibility with old readers)
+for "meta data" in the header (before any records):
+
+```
+##########################
+# try with some comments
+#   and blank lines even before @-directives in header
+#
+# @RELATION Beer
+#
+# @ATTRIBUTE Brewery
+# @ATTRIBUTE City
+# @ATTRIBUTE Name
+# @ATTRIBUTE Abv
+
+Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
+Augustiner Bräu München,München,Edelstoff,5.6%
+
+Bayerische Staatsbrauerei Weihenstephan, Freising, Hefe Weissbier, 5.4%
+Brauerei Spezial, Bamberg, Rauchbier Märzen, 5.1%
+Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
+Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
+```
+
+
 
 ### Q: How can I change the default format / dialect?
 
@@ -535,42 +566,6 @@ Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] ) # or Csv.fix or Csv.f
 ```
 
 
-Bonus: If the width is a string (not an array)
-(e.g. `'a8 a8 a32 Z*'` or `'A8 A8 A32 Z*'` and so on)
-than the fixed width field parser
-will use `String#unpack` and the value of width as its format string spec.
-Example:
-
-``` ruby
-txt = <<TXT
-12345678123456781234567890123456789012345678901212345678901234
-TXT
-
-Csv.fixed.parse( txt, width: 'a8 a8 a32 Z*' ) # or Csv.fix or Csv.f
-# => [["12345678","12345678", "12345678901234567890123456789012", "12345678901234"]]
-
-txt = <<TXT
-John    Smith   john@example.com                1-888-555-6666
-Michele O'Reileymichele@example.com             1-333-321-8765
-TXT
-
-Csv.fixed.parse( txt, width: 'A8 A8 A32 Z*' ) # or Csv.fix or Csv.f
-# => [["John", "Smith", "john@example.com", "1-888-555-6666"],
-#     ["Michele", "O'Reiley", "michele@example.com", "1-333-321-8765"]]
-
-# and so on
-```
-
-| String Directive | Returns | Meaning |
-|------------------|---------|-------------------------|
-| `A` | String | Arbitrary binary string (remove trailing nulls and ASCII spaces) |
-| `a` | String | Arbitrary binary string |
-| `Z` | String | Null-terminated string |
-
-
-and many more. See the `String#unpack` documentation
-for the complete format spec and directives.
-
 
 
 
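Review note: the README hunk above documents `@`-directives inside header comments but does not show how a reader might pick them up. The following is a minimal sketch under stated assumptions, not csvreader's implementation: `read_directives` is a hypothetical helper, and the rule that the header ends at the first non-comment, non-blank line is an assumption drawn from the README wording ("before any records").

```ruby
# Hypothetical helper (NOT part of csvreader): collect "# @NAME value"
# directives from the comment block that precedes the first record.
def read_directives( txt )
  meta = Hash.new { |hash, key| hash[key] = [] }

  txt.each_line do |line|
    line = line.strip
    next  if line.empty?                     # blank lines are allowed in the header
    break unless line.start_with?( '#' )     # assumption: first record ends the header

    m = line.match( /\A#\s*@(\w+)\s+(.+)\z/ )
    meta[ m[1].downcase ] << m[2]  if m      # plain comments simply do not match
  end

  meta
end

txt = <<TXT
##########################
# try with some comments
#
# @RELATION Beer
#
# @ATTRIBUTE Brewery
# @ATTRIBUTE City

Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
TXT

p read_directives( txt )
```

Because the directives sit inside `#` comments, a reader that only knows how to skip comment lines still sees plain records, which is the backwards-compatibility point the README makes.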
data/lib/csvreader/base.rb
CHANGED
@@ -19,6 +19,7 @@ require 'csvreader/parser_strict' # flexible (strict - no leading/trailing spa
 require 'csvreader/parser_tab'
 require 'csvreader/parser_fixed'
 require 'csvreader/parser_json'
+require 'csvreader/parser_table'
 require 'csvreader/parser'
 require 'csvreader/converter'
 require 'csvreader/reader'
@@ -62,8 +63,8 @@ class Parser
                      null: "\\N" )
 
 
-  TAB = ParserTab.new
-
+  TAB = ParserTab.new ## (strict) tab-separated
+  TABLE = ParserTable.new ## space-separated e.g /[ \t]+/
   FIXED = ParserFixed.new
 
 
@@ -80,6 +81,7 @@ class Parser
   def self.postgresql_text() POSTGRESQL_TEXT; end
   def self.postgres_text() postgresql_text; end
   def self.tab() TAB; end
+  def self.table() TABLE; end
   def self.fixed() FIXED; end
   def self.fix() fixed; end
   def self.f() fixed; end
@@ -103,6 +105,7 @@ class CsvReader
 
 
   TAB = Builder.new( Parser::TAB )
+  TABLE = Builder.new( Parser::TABLE )
   FIXED = Builder.new( Parser::FIXED )
 
 
@@ -119,6 +122,7 @@ class CsvReader
   def self.postgresql_text() POSTGRESQL_TEXT; end
   def self.postgres_text() postgresql_text; end
   def self.tab() TAB; end
+  def self.table() TABLE; end
   def self.fixed() FIXED; end
   def self.fix() fixed; end
   def self.f() fixed; end
@@ -141,6 +145,7 @@ class CsvHashReader
 
 
   TAB = Builder.new( Parser::TAB )
+  TABLE = Builder.new( Parser::TABLE )
   FIXED = Builder.new( Parser::FIXED )
 
 
@@ -157,6 +162,7 @@ class CsvHashReader
   def self.postgresql_text() POSTGRESQL_TEXT; end
   def self.postgres_text() postgresql_text; end
   def self.tab() TAB; end
+  def self.table() TABLE; end
   def self.fixed() FIXED; end
   def self.fix() fixed; end
   def self.f() fixed; end
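Review note: the base.rb hunks register the new parser the same way in three places (a shared constant plus a class-level accessor). The pattern can be sketched in isolation as follows; `DemoParser` and `DemoReader` are hypothetical stand-ins, since the real `ParserTable` and `Builder` classes are only partly visible in this diff.

```ruby
# Hypothetical stand-ins for illustration only (not csvreader classes).
class DemoParser
  attr_reader :name

  def initialize( name )
    @name = name
  end
end

class DemoReader
  # one shared instance per format, created once at load time
  TAB   = DemoParser.new( :tab )
  TABLE = DemoParser.new( :table )
  FIXED = DemoParser.new( :fixed )

  # class-level accessors return the shared instances
  def self.tab()   TAB;   end
  def self.table() TABLE; end
  def self.fixed() FIXED; end
  def self.fix()   fixed; end   # short aliases delegate to the canonical accessor
  def self.f()     fixed; end
end

puts DemoReader.table.name                          # prints "table"
puts DemoReader.f.equal?( DemoReader.fixed )        # prints "true" (same object)
```

One consequence of this layout is that every caller of `table` shares a single parser object, so any per-instance state would be shared too.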
data/lib/csvreader/parser_table.rb
ADDED
@@ -0,0 +1,89 @@
+# encoding: utf-8
+
+class CsvReader
+
+class ParserTable
+
+###################################
+## add simple logger with debug flag/switch
+#
+#  use Parser.debug = true   # to turn on
+#
+#  todo/fix: use logutils instead of std logger - why? why not?
+
+def self.build_logger()
+  l = Logger.new( STDOUT )
+  l.level = :info    ## set to :info on start; note: is 0 (debug) by default
+  l
+end
+def self.logger() @@logger ||= build_logger; end
+def logger()  self.class.logger; end
+
+
+
+
+def parse( data, **kwargs, &block )
+
+  ## note: input: required each_line (string or io/file for example)
+  ## note: kwargs NOT used for now (but required for "protocol/interface" by other parsers)
+
+  input = data   ## assume it's a string or io/file handle
+
+  if block_given?
+    parse_lines( input, &block )
+  else
+    records = []
+
+    parse_lines( input ) do |record|
+      records << record
+    end
+
+    records
+  end
+end ## method parse
+
+
+
+private
+
+def parse_lines( input, &block )
+
+  ## note: each line only works with \n (unix) or \r\n (windows)
+  ##   will NOT work with \r (old mac, any others?) only!!!!
+  input.each_line do |line|
+
+    logger.debug "line:" if logger.debug?
+    logger.debug line.pretty_inspect if logger.debug?
+
+
+    ## note: chomp('') if is an empty string,
+    ##    it will remove all trailing newlines from the string.
+    ##    use line.sub(/[\n\r]*$/, '') or similar instead - why? why not?
+    line = line.chomp( '' )
+    line = line.strip         ## strip leading and trailing whitespaces (space/tab) too
+    logger.debug line.pretty_inspect if logger.debug?
+
+    if line.empty?             ## skip blank lines
+      logger.debug "skip blank line" if logger.debug?
+      next
+    end
+
+    if line.start_with?( "#" )  ## skip comment lines
+      logger.debug "skip comment line" if logger.debug?
+      next
+    end
+
+    # note: string.split defaults to split by space (e.g. /\s+/) :-)
+    #   for now just make it "explicit" with /[ \t]+/
+
+    values = line.split( /[ \t]+/ )
+    logger.debug values.pretty_inspect if logger.debug?
+
+    ## note: requires block - enforce? how? why? why not?
+    block.call( values )
+  end
+end # method parse_lines
+
+
+end # class ParserTable
+end # class CsvReader
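Review note: to try the splitting behaviour without installing the gem, the core of `parse_lines` can be reproduced in a few lines. This is a standalone sketch that assumes the logic shown in the diff above (strip, skip blank and `#` lines, split on runs of spaces or tabs) and leaves out logging; `parse_table` is a hypothetical name, not a csvreader method.

```ruby
# Hypothetical standalone version of the table-parsing loop (no logging).
# Yields each record when a block is given, otherwise returns all records.
def parse_table( input )
  records = []

  input.each_line do |line|
    line = line.strip                                # drops the newline and outer spaces/tabs
    next if line.empty? || line.start_with?( '#' )   # skip blank and comment lines

    values = line.split( /[ \t]+/ )                  # one or more spaces/tabs separate fields
    if block_given?
      yield values
    else
      records << values
    end
  end

  block_given? ? nil : records
end

txt = <<TXT
# space-separated with comments and blank lines

aa   bbb
cc dd    ee

TXT

p parse_table( txt )                    # prints [["aa", "bbb"], ["cc", "dd", "ee"]]
parse_table( txt ) { |rec| p rec }      # prints one record per line
```

Note that with this approach a field can never contain a space or be empty, and rows may have different lengths (as in the gem's own test, where the first record has two values and the second has three).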
data/lib/csvreader/version.rb
CHANGED
@@ -3,16 +3,24 @@
 
 class CsvReader    ## note: uses a class for now - change to module - why? why not?
 
-
-
-
-
+  module Version
+    MAJOR = 1    ## todo: namespace inside version or something - why? why not??
+    MINOR = 1
+    PATCH = 4
 
+    ## self.to_s - why? why not?
+  end
+
+  VERSION = [Version::MAJOR,
+             Version::MINOR,
+             Version::PATCH].join('.')
 
-  def self.version
+  def self.version   ## keep (as an alternative to VERSION) - why? why not?
     VERSION
   end
 
+
+
   def self.banner
     "csvreader/#{VERSION} on Ruby #{RUBY_VERSION} (#{RUBY_RELEASE_DATE}) [#{RUBY_PLATFORM}]"
   end
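Review note: the version.rb hunk now builds `VERSION` from separate `MAJOR`/`MINOR`/`PATCH` constants. A small sketch of that assembly, plus one reason to compare versions by segment instead of as plain strings, might look like this; `DemoVersion` and `VERSION_STRING` are hypothetical names, and `Gem::Version` comes from RubyGems rather than from csvreader.

```ruby
require 'rubygems'   # normally loaded already; provides Gem::Version

# Hypothetical stand-in for CsvReader::Version
module DemoVersion
  MAJOR = 1
  MINOR = 1
  PATCH = 4
end

VERSION_STRING = [DemoVersion::MAJOR,
                  DemoVersion::MINOR,
                  DemoVersion::PATCH].join( '.' )

puts VERSION_STRING                                               # prints "1.1.4"

# plain string comparison goes wrong once a segment has two digits ...
puts '1.1.10' > VERSION_STRING                                    # prints "false"
# ... while Gem::Version compares segment by segment
puts Gem::Version.new( '1.1.10' ) > Gem::Version.new( VERSION_STRING )   # prints "true"
```

Keeping the numeric segments around, as the new `Version` module does, makes this kind of comparison possible without re-parsing the string.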
data/test/helper.rb
CHANGED
@@ -21,3 +21,4 @@ end
 ## CsvReader::ParserStd.logger.level = :debug ## turn on "global" logging
 ## CsvReader::ParserStrict.logger.level = :debug ## turn on "global" logging
 ## CsvReader::ParserFixed.logger.level = :debug ## turn on "global" logging
+CsvReader::ParserTable.logger.level = :debug ## turn on "global" logging
data/test/test_parser_table.rb
ADDED
@@ -0,0 +1,35 @@
+# encoding: utf-8
+
+###
+#  to run use
+#     ruby -I ./lib -I ./test test/test_parser_table.rb
+
+
+require 'helper'
+
+class TestParserTable < MiniTest::Test
+
+
+def parser() CsvReader::Parser::TABLE; end
+
+
+def test_contacts
+  records = [["aa", "bbb"],
+             ["cc", "dd", "ee"]]
+
+  assert_equal records, parser.parse( <<TXT )
+# space-separated with comments and blank lines
+
+aa bbb
+cc dd ee
+
+TXT
+
+  assert_equal records, parser.parse( <<TXT )
+aa bbb
+cc dd ee
+TXT
+end
+
+
+end # class TestParserTable
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: csvreader
 version: !ruby/object:Gem::Version
-  version: 1.1.
+  version: 1.1.4
 platform: ruby
 authors:
 - Gerald Bauer
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-10-
+date: 2018-10-27 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rdoc
@@ -65,6 +65,7 @@ files:
 - lib/csvreader/parser_std.rb
 - lib/csvreader/parser_strict.rb
 - lib/csvreader/parser_tab.rb
+- lib/csvreader/parser_table.rb
 - lib/csvreader/reader.rb
 - lib/csvreader/reader_hash.rb
 - lib/csvreader/version.rb
@@ -91,6 +92,7 @@ files:
 - test/test_parser_quotes.rb
 - test/test_parser_strict.rb
 - test/test_parser_tab.rb
+- test/test_parser_table.rb
 - test/test_reader.rb
 - test/test_reader_converters.rb
 - test/test_reader_hash.rb