csvreader 1.1.2 → 1.1.3
- checksums.yaml +4 -4
- data/Manifest.txt +1 -0
- data/README.md +39 -2
- data/lib/csvreader/base.rb +1 -1
- data/lib/csvreader/parser_std.rb +28 -8
- data/lib/csvreader/version.rb +1 -1
- data/test/test_parser_quotes.rb +53 -0
- metadata +2 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a920108ec183cff7c7cad8c0d967390b4f2bd38f
+  data.tar.gz: 2a32715b6e1eb3e83b3837de1d151169d8b3455f
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f2264455eda5136261628cc77de24494d9ea11bb116c9ca5e36495f4f4b90101356444c9da75c37b6d5b9419b57ce4a145830bd1d6919ce0cbdb2ef05673bfad
+  data.tar.gz: 9c539db1ccac369ae23113587e9d529a95de0b080f3c12687e237d46bea1bdbb157b57f7b2f61d72f637cd914aecb11fbd8daeec11ecb38f49b65669a004e774
data/Manifest.txt
CHANGED
data/README.md
CHANGED
@@ -10,15 +10,25 @@
 
 ## What's News?
 
+**v1.1.3**: Added built-in support for french single and double quotes / guillemets (`‹› «»`) to default parser ("The Right Way").
+Now you can use both, that is, single (`‹...›'` or `›...‹'`)
+or double (`«...»` or `»...«`).
+Note: A quote only "kicks-in" if it's the first (non-whitespace)
+character of the value (otherwise it's just a "vanilla" literal character).
 
 
 **v1.1.2**: Added built-in support for single quotes (`'`) to default parser ("The Right Way").
 Now you can use both, that is, single (`'...'`) or double quotes (`"..."`)
 like in ruby (or javascript or html or ...) :-).
+Note: A quote only "kicks-in" if it's the first (non-whitespace)
+character of the value (otherwise it's just a "vanilla" literal character)
+e.g. `48°51'24"N` needs no quote :-).
+With the "strict" parser you will get a firework of "stray" quote errors / exceptions.
+
 
 
 **v1.1.1**: Added built-in support for (optional) alternative comments (`%`) - used by
-ARFF (attribute
+[ARFF (attribute-relation file format)](https://waikato.github.io/weka-wiki/arff/) -
 and support for (optional) directives (`@`) in header (that is, before any records)
 to default parser ("The Right Way").
 Now you can use either `#` or `%` for comments, the first one "wins" - you CANNOT use both.
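The README notes in the hunk above say that a quote character only acts as a quote when it is the first non-whitespace character of a value. A minimal Ruby sketch of that rule follows. `QUOTE_PAIRS` and `opening_quote_for` are hypothetical names made up for this illustration; they are not part of csvreader, and the gem's parser works on a character stream rather than on pre-split fields.

```ruby
# Hypothetical illustration only, not csvreader's API.
# Maps every opening quote named in the README notes to its closing partner.
QUOTE_PAIRS = {
  '"' => '"',
  "'" => "'",
  "«" => "»",
  "»" => "«",
  "‹" => "›",
  "›" => "‹",
}.freeze

# Returns [opening, closing] if the value starts (after leading whitespace)
# with a known quote character, otherwise nil.
def opening_quote_for( field )
  first = field.lstrip[0]     # first non-whitespace character (nil if none)
  QUOTE_PAIRS.key?( first ) ? [first, QUOTE_PAIRS[first]] : nil
end

p opening_quote_for( " «a» " )           # => ["«", "»"]
p opening_quote_for( %q{48°51'24"N} )    # => nil  (quotes are not in first position)
p opening_quote_for( " _‹a›" )           # => nil  (the underscore comes first)
```

The last two calls mirror the README's `48°51'24"N` example and the `_‹a›` case from the new test file: the quote characters are plain literals because something else comes first.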
@@ -33,12 +43,13 @@ e.g.`Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] )`.
 
 **v1.0.3**: Added built-in support for an (optional) front matter (`---`) meta data block
 in header (that is, before any records)
-to default parser ("The Right Way")
+to default parser ("The Right Way") - used by [CSVY (yaml front matter for csv file format)](http://csvy.org).
 Use `Csv.parser.meta` to get the parsed meta data block hash (or `nil`) if none.
 
 
 
 
+
 ## Usage
 
 
@@ -359,6 +370,32 @@ Staatliches Hofbräuhaus München,München,Hofbräu Oktoberfestbier,6.3%
 ```
 
 
+Or use the ARFF (attribute-relation file format)-like alternative style
+with `%` for comments and `@`-directives
+for "meta data" in the header (before any records):
+
+```
+%%%%%%%%%%%%%%%%%%
+% try with some comments
+% and blank lines even before @-directives in header
+
+@RELATION Beer
+
+@ATTRIBUTE Brewery
+@ATTRIBUTE City
+@ATTRIBUTE Name
+@ATTRIBUTE Abv
+
+@DATA
+Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
+Augustiner Bräu München,München,Edelstoff,5.6%
+
+Bayerische Staatsbrauerei Weihenstephan, Freising, Hefe Weissbier, 5.4%
+Brauerei Spezial, Bamberg, Rauchbier Märzen, 5.1%
+Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
+Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
+```
+
 
 ### Q: How can I change the default format / dialect?
 
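The ARFF-like style added to the README above (`%` comments, `@`-directives in the header before any records) can be pictured as a simple line classifier. The sketch below is an assumption-laden illustration, not how csvreader implements it: `split_arff_like` is a hypothetical helper, it splits records on commas naively with no quote handling, and it treats `%` lines as comments everywhere.

```ruby
# Hypothetical illustration only, not csvreader's implementation.
# Separates header directives (@...) from records, skipping blank
# lines and %-comments.
def split_arff_like( text )
  directives = []
  records    = []
  text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?( "%" )    # blank line or comment
    if records.empty? && line.start_with?( "@" )
      directives << line                               # only counts before the first record
    else
      records << line.split( "," ).map( &:strip )      # naive split, no quote handling
    end
  end
  [directives, records]
end

txt = <<~TXT
  %%%%%%%%
  % try with some comments

  @RELATION Beer
  @ATTRIBUTE Brewery
  @ATTRIBUTE City
  @DATA
  Andechser Klosterbrauerei,Andechs
  Brauerei Spezial, Bamberg
TXT

directives, records = split_arff_like( txt )
p directives   # => ["@RELATION Beer", "@ATTRIBUTE Brewery", "@ATTRIBUTE City", "@DATA"]
p records      # => [["Andechser Klosterbrauerei", "Andechs"], ["Brauerei Spezial", "Bamberg"]]
```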
data/lib/csvreader/base.rb
CHANGED
data/lib/csvreader/parser_std.rb
CHANGED
@@ -128,13 +128,13 @@ end
 
 
 
-def parse_quote( input,
+def parse_quote( input, opening_quote:, closing_quote:)
   value = ""
-  if input.peek ==
-    input.getc ## eat-up quote
+  if input.peek == opening_quote
+    input.getc ## eat-up opening quote
 
     loop do
-      while (c=input.peek; !(c==
+      while (c=input.peek; !(c==closing_quote || c==BACKSLASH || input.eof?))
         value << input.getc ## eat-up everything until hitting quote (e.g. " or ') or backslash (escape)
       end
 
@@ -144,7 +144,9 @@ def parse_quote( input, quote:)
         value << parse_escape( input )
       else ## assume input.peek == quote
         input.getc ## eat-up quote
-        if input.peek ==
+        if opening_quote == closing_quote && input.peek == closing_quote
+          ## doubled up quote?
+          # note: only works (enabled) for "" or '' and NOT for «»,‹›.. (if opening and closing differ)
           value << input.getc ## add doube quote and continue!!!!
         else
           break
@@ -152,7 +154,7 @@ def parse_quote( input, quote:)
       end
     end
   else
-    raise ParseError.new( "found >#{input.peek} (#{input.peek.ord})< - QUOTE (#{
+    raise ParseError.new( "found >#{input.peek} (#{input.peek.ord})< - CLOSING QUOTE (#{closing_quote}) expected in parse_quote!!!!" )
   end
   value
 end
@@ -182,18 +184,36 @@ def parse_field( input )
   end
 elsif input.peek == DOUBLE_QUOTE
   logger.debug "start double_quote field - peek >#{input.peek}< (#{input.peek.ord})" if logger.debug?
-  value << parse_quote( input,
+  value << parse_quote( input, opening_quote: DOUBLE_QUOTE,
+                               closing_quote: DOUBLE_QUOTE )
 
   ## note: always eat-up all trailing spaces (" ") and tabs (\t)
   skip_spaces( input )
   logger.debug "end double_quote field - peek >#{input.peek}< (#{input.peek.ord})" if logger.debug?
 elsif input.peek == SINGLE_QUOTE ## allow single quote too (by default)
   logger.debug "start single_quote field - peek >#{input.peek}< (#{input.peek.ord})" if logger.debug?
-  value << parse_quote( input,
+  value << parse_quote( input, opening_quote: SINGLE_QUOTE,
+                               closing_quote: SINGLE_QUOTE )
 
   ## note: always eat-up all trailing spaces (" ") and tabs (\t)
   skip_spaces( input )
   logger.debug "end single_quote field - peek >#{input.peek}< (#{input.peek.ord})" if logger.debug?
+elsif input.peek == "«"
+  value << parse_quote( input, opening_quote: "«",
+                               closing_quote: "»" )
+  skip_spaces( input )
+elsif input.peek == "»"
+  value << parse_quote( input, opening_quote: "»",
+                               closing_quote: "«" )
+  skip_spaces( input )
+elsif input.peek == "‹"
+  value << parse_quote( input, opening_quote: "‹",
+                               closing_quote: "›" )
+  skip_spaces( input )
+elsif input.peek == "›"
+  value << parse_quote( input, opening_quote: "›",
+                               closing_quote: "‹" )
+  skip_spaces( input )
 else
   logger.debug "start reg field - peek >#{input.peek}< (#{input.peek.ord})" if logger.debug?
   ## consume simple value
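The `parse_quote` change above splits the single quote argument into an opening and a closing quote, and restricts the doubled-quote escape to pairs where both are the same character. The following self-contained sketch shows that idea in isolation. `CharInput` and `read_quoted` are hypothetical stand-ins written for this illustration; the gem's own input buffer, its `parse_escape` handling and its error reporting are not reproduced here (a backslash simply keeps the next character as-is, and a missing closing quote returns what was read so far).

```ruby
BACKSLASH = "\\"

# Hypothetical stand-in for the parser's input buffer (peek / getc / eof?).
class CharInput
  def initialize( str )
    @chars = str.chars
    @pos   = 0
  end
  def peek;  @chars[@pos]; end
  def getc;  c = @chars[@pos]; @pos += 1; c; end
  def eof?;  @pos >= @chars.size; end
end

# Simplified sketch of a quoted-value reader with separate delimiters.
def read_quoted( input, opening_quote:, closing_quote: )
  raise ArgumentError, "opening quote expected" unless input.peek == opening_quote
  input.getc                               # consume the opening quote
  value = +""
  until input.eof?
    c = input.getc
    if c == BACKSLASH
      value << input.getc.to_s             # simplified escape: keep next char as-is
    elsif c == closing_quote
      # a doubled quote is an escape only when both delimiters are identical
      if opening_quote == closing_quote && input.peek == closing_quote
        value << input.getc
      else
        return value
      end
    else
      value << c
    end
  end
  value                                    # end of input without a closing quote
end

p read_quoted( CharInput.new( %q{"a ""b"" c"} ), opening_quote: '"', closing_quote: '"' )  # => "a \"b\" c"
p read_quoted( CharInput.new( "«a,1» rest" ),    opening_quote: "«", closing_quote: "»" )  # => "a,1"
p read_quoted( CharInput.new( "»c««" ),          opening_quote: "»", closing_quote: "«" )  # => "c"
```

The third call shows why the guard matters: with `»...«` a second `«` right after the closing quote is not treated as an escaped quote, which matches the note in the diff that doubling only works for `""` or `''`.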
data/lib/csvreader/version.rb
CHANGED
data/test/test_parser_quotes.rb
ADDED
@@ -0,0 +1,53 @@
+# encoding: utf-8
+
+###
+# to run use
+# ruby -I ./lib -I ./test test/test_parser_quotes.rb
+
+
+require 'helper'
+
+
+class TestParserQuotes < MiniTest::Test
+
+
+def parser
+  CsvReader::Parser::DEFAULT
+end
+
+
+def test_french_single
+  assert_equal [[ "a", "b", "c" ]],
+               parser.parse( " ‹a›, ‹b›, ›c‹ " )
+
+  assert_equal [[ "a,1", " b,2", "c, 3" ]],
+               parser.parse( " ‹a,1›, ‹ b,2›, ›c, 3‹ " )
+
+  assert_equal [[ %Q{"a"}, %Q{'b'}, %Q{c'"'"} ]],
+               parser.parse( %Q{ ‹"a"›, ‹'b'›, ›c'"'"‹} )
+
+  # note: quote matches only if first non-whitespace char
+  assert_equal [[ "_‹a›", "_‹b›", "›c‹" ]],
+               parser.parse( %Q{ _‹a›, _‹b›, "›c‹"} )
+
+end
+
+
+def test_french_double
+  assert_equal [[ "a", "b", "c" ]],
+               parser.parse( " «a», «b», »c« " )
+
+  assert_equal [[ "a,1", " b,2", "c, 3" ]],
+               parser.parse( " «a,1», « b,2», »c, 3« " )
+
+  assert_equal [[ %Q{"a"}, %Q{'b'}, %Q{c'"'"} ]],
+               parser.parse( %Q{ «"a"», «'b'», »c'"'"«} )
+
+  # note: quote matches only if first non-whitespace char
+  assert_equal [[ "_«a»", "_«b»", "»c«" ]],
+               parser.parse( %Q{ _«a», _«b», "»c«"} )
+
+end
+
+
+end # class TestParserQuotes
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: csvreader
 version: !ruby/object:Gem::Version
-  version: 1.1.
+  version: 1.1.3
 platform: ruby
 authors:
 - Gerald Bauer
@@ -88,6 +88,7 @@ files:
 - test/test_parser_meta.rb
 - test/test_parser_null.rb
 - test/test_parser_numeric.rb
+- test/test_parser_quotes.rb
 - test/test_parser_strict.rb
 - test/test_parser_tab.rb
 - test/test_reader.rb