csvreader 1.1.3 → 1.1.4
- checksums.yaml +4 -4
- data/Manifest.txt +2 -0
- data/README.md +32 -37
- data/lib/csvreader/base.rb +8 -2
- data/lib/csvreader/parser_table.rb +89 -0
- data/lib/csvreader/version.rb +13 -5
- data/test/helper.rb +1 -0
- data/test/test_parser_table.rb +35 -0
- metadata +4 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: dd541a30e12622b666a01bc18b5ba80d93cbd1aa
+  data.tar.gz: 773aa117ae2b41bfce6a8cde5df1ed1df4eadeab
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b4720613f00b591b3d419f03c06aef8d5e8bb274e4d35e7b1f12f6d14aadb64d1611c7822e3fdc0d2a63c60b653b79a64339aabfd107fb73b14a47bf646c7153
+  data.tar.gz: 3ce20e6925e81e80c6a61e03fd208d852741b7c8d3dd32d045f56b748844782a47810ce041eaa1a8d30f63f282c1655fc6939f7f0ff10294e25ec0be48ede043
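The digests in checksums.yaml are plain SHA1 and SHA512 hex digests of the two archive members. A minimal sketch of how such values can be recomputed with Ruby's standard `Digest` library follows; the helper name `gem_checksums` and the sample input are assumptions for illustration, not part of the gem or of RubyGems.

```ruby
require 'digest'

# Hypothetical helper: compute the two digests that checksums.yaml records
# for one archive member. `bytes` stands in for the raw contents of
# metadata.gz or data.tar.gz.
def gem_checksums( bytes )
  { 'SHA1'   => Digest::SHA1.hexdigest( bytes ),
    'SHA512' => Digest::SHA512.hexdigest( bytes ) }
end

sums = gem_checksums( 'hello' )   # sample input only, not real gem data
puts sums['SHA1']                 # 40 hex characters
puts sums['SHA512']               # 128 hex characters
```

For a real check, read the file in binary mode (for example with `File.binread`) and compare the result against the value listed in the hunk above.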
data/Manifest.txt
CHANGED
@@ -14,6 +14,7 @@ lib/csvreader/parser_json.rb
 lib/csvreader/parser_std.rb
 lib/csvreader/parser_strict.rb
 lib/csvreader/parser_tab.rb
+lib/csvreader/parser_table.rb
 lib/csvreader/reader.rb
 lib/csvreader/reader_hash.rb
 lib/csvreader/version.rb
@@ -40,6 +41,7 @@ test/test_parser_numeric.rb
 test/test_parser_quotes.rb
 test/test_parser_strict.rb
 test/test_parser_tab.rb
+test/test_parser_table.rb
 test/test_reader.rb
 test/test_reader_converters.rb
 test/test_reader_hash.rb
data/README.md
CHANGED
@@ -8,8 +8,13 @@
 * forum :: [wwwmake](http://groups.google.com/group/wwwmake)
 
 
+
 ## What's News?
 
+**v1.1.4** Added new "classic" table parser (see `ParserTable`) for supporting fields separated by (one or more) spaces
+e.g. `Csv.table.parse( txt )`.
+
+
 **v1.1.3**: Added built-in support for french single and double quotes / guillemets (`‹› «»`) to default parser ("The Right Way").
 Now you can use both, that is, single (`‹...›` or `›...‹`)
 or double (`«...»` or `»...«`).
@@ -38,7 +43,7 @@ for meta data, the first one "wins" - you CANNOT use both.
 
 
 **v1.1.0**: Added new fixed width field (fwf) parser (see `ParserFixed`) for supporting fields with fixed width (and no separator)
-e.g
+e.g. `Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] )`.
 
 
 **v1.0.3**: Added built-in support for an (optional) front matter (`---`) meta data block
@@ -396,6 +401,32 @@ Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
 Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
 ```
 
+Or use the ARFF (attribute-relation file format)-like alternative style with `@`-directives
+inside comments (for easier backwards compatibility with old readers)
+for "meta data" in the header (before any records):
+
+```
+##########################
+# try with some comments
+# and blank lines even before @-directives in header
+#
+# @RELATION Beer
+#
+# @ATTRIBUTE Brewery
+# @ATTRIBUTE City
+# @ATTRIBUTE Name
+# @ATTRIBUTE Abv
+
+Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
+Augustiner Bräu München,München,Edelstoff,5.6%
+
+Bayerische Staatsbrauerei Weihenstephan, Freising, Hefe Weissbier, 5.4%
+Brauerei Spezial, Bamberg, Rauchbier Märzen, 5.1%
+Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
+Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
+```
+
+
 
 ### Q: How can I change the default format / dialect?
 
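This diff does not show how csvreader itself reads the `@`-directives, so the following is only a rough sketch of the idea: scan the leading comment lines and collect any `@NAME value` pairs until the first record. The helper name `read_header_directives` and the regular expression are assumptions, not the gem's API.

```ruby
# Hypothetical helper (not part of csvreader): collect "@NAME value" pairs
# found inside comment lines before the first record.
def read_header_directives( txt )
  directives = []
  txt.each_line do |line|
    line = line.strip
    next  if line.empty?                  # blank lines are allowed in the header
    break unless line.start_with?( '#' )  # first record ends the header
    if (m = line.match( /\A#+\s*@(\w+)\s+(.+)\z/ ))
      directives << [m[1].upcase, m[2]]
    end
  end
  directives
end

txt = <<TXT
##########################
# try with some comments
#
# @RELATION Beer
#
# @ATTRIBUTE Brewery
# @ATTRIBUTE City

Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
TXT

p read_header_directives( txt )
# => [["RELATION", "Beer"], ["ATTRIBUTE", "Brewery"], ["ATTRIBUTE", "City"]]
```

Because the directives live inside comments, a reader that knows nothing about them still skips the whole header, which is the backwards-compatibility point the README text makes.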
@@ -535,42 +566,6 @@ Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] ) # or Csv.fix or Csv.f
 ```
 
 
-Bonus: If the width is a string (not an array)
-(e.g. `'a8 a8 a32 Z*'` or `'A8 A8 A32 Z*'` and so on)
-then the fixed width field parser
-will use `String#unpack` and the value of width as its format string spec.
-Example:
-
-``` ruby
-txt = <<TXT
-12345678123456781234567890123456789012345678901212345678901234
-TXT
-
-Csv.fixed.parse( txt, width: 'a8 a8 a32 Z*' ) # or Csv.fix or Csv.f
-# => [["12345678","12345678", "12345678901234567890123456789012", "12345678901234"]]
-
-txt = <<TXT
-John    Smith   john@example.com                1-888-555-6666
-Michele O'Reileymichele@example.com             1-333-321-8765
-TXT
-
-Csv.fixed.parse( txt, width: 'A8 A8 A32 Z*' ) # or Csv.fix or Csv.f
-# => [["John", "Smith", "john@example.com", "1-888-555-6666"],
-#     ["Michele", "O'Reiley", "michele@example.com", "1-333-321-8765"]]
-
-# and so on
-```
-
-| String Directive | Returns | Meaning |
-|------------------|---------|-------------------------|
-| `A` | String | Arbitrary binary string (remove trailing nulls and ASCII spaces) |
-| `a` | String | Arbitrary binary string |
-| `Z` | String | Null-terminated string |
-
-
-and many more. See the `String#unpack` documentation
-for the complete format spec and directives.
-
 
 
 
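The README passage removed in this hunk relied on plain `String#unpack`. As a reminder of what the directives in its table do, here is a small standalone sketch that uses only core Ruby (no csvreader calls); the sample line is built with `ljust` so the column widths are explicit.

```ruby
# Build one fixed-width line: 8 + 8 + 32 characters, then the rest.
line = 'John'.ljust( 8 ) +
       'Smith'.ljust( 8 ) +
       'john@example.com'.ljust( 32 ) +
       '1-888-555-6666'

# "A" strips trailing spaces and nulls from each field.
stripped = line.unpack( 'A8 A8 A32 Z*' )
p stripped
# => ["John", "Smith", "john@example.com", "1-888-555-6666"]

# "a" keeps the padding exactly as it appears in the input.
padded = line.unpack( 'a8 a8 a32 Z*' )
p padded[0]
# => "John    "
```

Spaces inside the format string are ignored by `unpack`, so `'A8 A8 A32 Z*'` and `'A8A8A32Z*'` describe the same layout.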
data/lib/csvreader/base.rb
CHANGED
@@ -19,6 +19,7 @@ require 'csvreader/parser_strict' # flexible (strict - no leading/trailing spa
 require 'csvreader/parser_tab'
 require 'csvreader/parser_fixed'
 require 'csvreader/parser_json'
+require 'csvreader/parser_table'
 require 'csvreader/parser'
 require 'csvreader/converter'
 require 'csvreader/reader'
@@ -62,8 +63,8 @@ class Parser
   null: "\\N" )
 
 
-  TAB = ParserTab.new
-
+  TAB = ParserTab.new ## (strict) tab-separated
+  TABLE = ParserTable.new ## space-separated e.g /[ \t]+/
   FIXED = ParserFixed.new
 
 
@@ -80,6 +81,7 @@ class Parser
   def self.postgresql_text() POSTGRESQL_TEXT; end
   def self.postgres_text() postgresql_text; end
   def self.tab() TAB; end
+  def self.table() TABLE; end
   def self.fixed() FIXED; end
   def self.fix() fixed; end
   def self.f() fixed; end
@@ -103,6 +105,7 @@ class CsvReader
 
 
   TAB = Builder.new( Parser::TAB )
+  TABLE = Builder.new( Parser::TABLE )
   FIXED = Builder.new( Parser::FIXED )
 
 
@@ -119,6 +122,7 @@ class CsvReader
   def self.postgresql_text() POSTGRESQL_TEXT; end
   def self.postgres_text() postgresql_text; end
   def self.tab() TAB; end
+  def self.table() TABLE; end
   def self.fixed() FIXED; end
   def self.fix() fixed; end
   def self.f() fixed; end
@@ -141,6 +145,7 @@ class CsvHashReader
 
 
   TAB = Builder.new( Parser::TAB )
+  TABLE = Builder.new( Parser::TABLE )
   FIXED = Builder.new( Parser::FIXED )
 
 
@@ -157,6 +162,7 @@ class CsvHashReader
   def self.postgresql_text() POSTGRESQL_TEXT; end
   def self.postgres_text() postgresql_text; end
   def self.tab() TAB; end
+  def self.table() TABLE; end
   def self.fixed() FIXED; end
   def self.fix() fixed; end
   def self.f() fixed; end
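The hunks above repeat one pattern three times: a shared instance is stored in a constant and exposed through a class-level accessor, so callers can write `Csv.table.parse( txt )`. A stripped-down sketch of that pattern follows; `SketchParser` and `SketchReader` are made-up names, and the real `Builder` wrapper is not reproduced here.

```ruby
# Hypothetical stand-ins for the gem's parser and reader classes.
class SketchParser
  def initialize( sep ) @sep = sep; end

  # Split every line on the configured separator.
  def parse( txt )
    txt.each_line.map { |line| line.chomp.split( @sep ) }
  end
end

class SketchReader
  TAB   = SketchParser.new( "\t" )        # (strict) tab-separated
  TABLE = SketchParser.new( /[ \t]+/ )    # one or more spaces or tabs

  # Class-level accessors return the shared, preconfigured instances.
  def self.tab()   TAB;   end
  def self.table() TABLE; end
end

p SketchReader.table.parse( "aa  bbb\n" )
# => [["aa", "bbb"]]
```

Every call to `SketchReader.table` returns the same object, which is why adding a dialect only needs one constant and one accessor per class.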
data/lib/csvreader/parser_table.rb
ADDED
@@ -0,0 +1,89 @@
+# encoding: utf-8
+
+class CsvReader
+
+class ParserTable
+
+###################################
+## add simple logger with debug flag/switch
+#
+#  use Parser.debug = true   # to turn on
+#
+#  todo/fix: use logutils instead of std logger - why? why not?
+
+def self.build_logger()
+  l = Logger.new( STDOUT )
+  l.level = :info    ## set to :info on start; note: is 0 (debug) by default
+  l
+end
+def self.logger() @@logger ||= build_logger; end
+def logger()  self.class.logger; end
+
+
+
+
+def parse( data, **kwargs, &block )
+
+  ## note: input: required each_line (string or io/file for example)
+  ## note: kwargs NOT used for now (but required for "protocol/interface" by other parsers)
+
+  input = data   ## assume it's a string or io/file handle
+
+  if block_given?
+    parse_lines( input, &block )
+  else
+    records = []
+
+    parse_lines( input ) do |record|
+      records << record
+    end
+
+    records
+  end
+end ## method parse
+
+
+
+private
+
+def parse_lines( input, &block )
+
+  ## note: each_line only works with \n (unix) or \r\n (windows)
+  ##   will NOT work with \r (old mac, any others?) only!!!!
+  input.each_line do |line|
+
+    logger.debug "line:" if logger.debug?
+    logger.debug line.pretty_inspect if logger.debug?
+
+
+    ## note: chomp('') - if the separator is an empty string,
+    ##   it will remove all trailing newlines from the string.
+    ##   use line.sub(/[\n\r]*$/, '') or similar instead - why? why not?
+    line = line.chomp( '' )
+    line = line.strip   ## strip leading and trailing whitespaces (space/tab) too
+    logger.debug line.pretty_inspect if logger.debug?
+
+    if line.empty?   ## skip blank lines
+      logger.debug "skip blank line" if logger.debug?
+      next
+    end
+
+    if line.start_with?( "#" )   ## skip comment lines
+      logger.debug "skip comment line" if logger.debug?
+      next
+    end
+
+    # note: string.split defaults to split by space (e.g. /\s+/) :-)
+    #   for now just make it "explicit" with /[ \t]+/
+
+    values = line.split( /[ \t]+/ )
+    logger.debug values.pretty_inspect if logger.debug?
+
+    ## note: requires block - enforce? how? why? why not?
+    block.call( values )
+  end
+end # method parse_lines
+
+
+end # class ParserTable
+end # class CsvReader
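The core of the new parser is small enough to try without the gem. The sketch below mirrors the steps in `parse_lines` (strip, skip blank lines, skip `#` comments, split on runs of spaces or tabs) but drops the logger and the block interface; `parse_table` is a made-up name for illustration.

```ruby
# Standalone approximation of the line handling in ParserTable#parse_lines.
def parse_table( txt )
  records = []
  txt.each_line do |line|
    line = line.strip                        # also removes the trailing newline
    next if line.empty?                      # skip blank lines
    next if line.start_with?( '#' )          # skip comment lines
    records << line.split( /[ \t]+/ )        # one or more spaces or tabs
  end
  records
end

txt = "# space-separated with comments and blank lines\n" \
      "\n" \
      "aa   bbb\n" \
      "cc\tdd  ee\n"

p parse_table( txt )
# => [["aa", "bbb"], ["cc", "dd", "ee"]]
```

Since there is no quoting or escaping, a value can never contain a space or a tab, and rows may end up with different numbers of fields, as in the test data. With the gem installed, the README entry in this release points to `Csv.table.parse( txt )` for the same job.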
data/lib/csvreader/version.rb
CHANGED
@@ -3,16 +3,24 @@
 
 class CsvReader ## note: uses a class for now - change to module - why? why not?
 
-
-
-
-
+  module Version
+    MAJOR = 1 ## todo: namespace inside version or something - why? why not??
+    MINOR = 1
+    PATCH = 4
 
+    ## self.to_s - why? why not?
+  end
+
+  VERSION = [Version::MAJOR,
+             Version::MINOR,
+             Version::PATCH].join('.')
 
-  def self.version
+  def self.version ## keep (as an alternative to VERSION) - why? why not?
     VERSION
   end
 
+
+
   def self.banner
     "csvreader/#{VERSION} on Ruby #{RUBY_VERSION} (#{RUBY_RELEASE_DATE}) [#{RUBY_PLATFORM}]"
   end
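The new layout keeps the numeric parts in a nested module and derives the string from them. A self-contained sketch of the same arrangement, wrapped in a made-up `SketchReader` class so that it does not clash with the real `CsvReader`, could look like this:

```ruby
class SketchReader   # hypothetical stand-in for CsvReader
  module Version
    MAJOR = 1
    MINOR = 1
    PATCH = 4
  end

  # Derive the version string from its numeric parts.
  VERSION = [Version::MAJOR,
             Version::MINOR,
             Version::PATCH].join( '.' )

  def self.version
    VERSION
  end

  def self.banner
    "sketch/#{VERSION} on Ruby #{RUBY_VERSION} (#{RUBY_RELEASE_DATE}) [#{RUBY_PLATFORM}]"
  end
end

puts SketchReader.version   # prints 1.1.4
puts SketchReader.banner
```

Keeping the parts as integers lets callers compare `Version::MINOR` or `Version::PATCH` directly instead of parsing the string.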
data/test/helper.rb
CHANGED
@@ -21,3 +21,4 @@ end
 ## CsvReader::ParserStd.logger.level = :debug ## turn on "global" logging
 ## CsvReader::ParserStrict.logger.level = :debug ## turn on "global" logging
 ## CsvReader::ParserFixed.logger.level = :debug ## turn on "global" logging
+CsvReader::ParserTable.logger.level = :debug ## turn on "global" logging
data/test/test_parser_table.rb
ADDED
@@ -0,0 +1,35 @@
+# encoding: utf-8
+
+###
+#  to run use
+#     ruby -I ./lib -I ./test test/test_parser_table.rb
+
+
+require 'helper'
+
+class TestParserTable < MiniTest::Test
+
+
+def parser() CsvReader::Parser::TABLE; end
+
+
+def test_contacts
+  records = [["aa", "bbb"],
+             ["cc", "dd", "ee"]]
+
+  assert_equal records, parser.parse( <<TXT )
+# space-separated with comments and blank lines
+
+aa bbb
+cc dd ee
+
+TXT
+
+  assert_equal records, parser.parse( <<TXT )
+aa bbb
+cc dd ee
+TXT
+end
+
+
+end # class TestParserTable
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: csvreader
 version: !ruby/object:Gem::Version
-  version: 1.1.
+  version: 1.1.4
 platform: ruby
 authors:
 - Gerald Bauer
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-10-
+date: 2018-10-27 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rdoc
@@ -65,6 +65,7 @@ files:
 - lib/csvreader/parser_std.rb
 - lib/csvreader/parser_strict.rb
 - lib/csvreader/parser_tab.rb
+- lib/csvreader/parser_table.rb
 - lib/csvreader/reader.rb
 - lib/csvreader/reader_hash.rb
 - lib/csvreader/version.rb
@@ -91,6 +92,7 @@ files:
 - test/test_parser_quotes.rb
 - test/test_parser_strict.rb
 - test/test_parser_tab.rb
+- test/test_parser_table.rb
 - test/test_reader.rb
 - test/test_reader_converters.rb
 - test/test_reader_hash.rb