csvreader 1.1.2 → 1.1.3
- checksums.yaml +4 -4
- data/Manifest.txt +1 -0
- data/README.md +39 -2
- data/lib/csvreader/base.rb +1 -1
- data/lib/csvreader/parser_std.rb +28 -8
- data/lib/csvreader/version.rb +1 -1
- data/test/test_parser_quotes.rb +53 -0
- metadata +2 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a920108ec183cff7c7cad8c0d967390b4f2bd38f
+  data.tar.gz: 2a32715b6e1eb3e83b3837de1d151169d8b3455f
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f2264455eda5136261628cc77de24494d9ea11bb116c9ca5e36495f4f4b90101356444c9da75c37b6d5b9419b57ce4a145830bd1d6919ce0cbdb2ef05673bfad
+  data.tar.gz: 9c539db1ccac369ae23113587e9d529a95de0b080f3c12687e237d46bea1bdbb157b57f7b2f61d72f637cd914aecb11fbd8daeec11ecb38f49b65669a004e774
data/Manifest.txt
CHANGED
data/README.md
CHANGED
@@ -10,15 +10,25 @@
 
 ## What's News?
 
+**v1.1.3**: Added built-in support for french single and double quotes / guillemets (`‹› «»`) to default parser ("The Right Way").
+Now you can use both, that is, single (`‹...›'` or `›...‹'`)
+or double (`«...»` or `»...«`).
+Note: A quote only "kicks-in" if it's the first (non-whitespace)
+character of the value (otherwise it's just a "vanilla" literal character).
 
 
 **v1.1.2**: Added built-in support for single quotes (`'`) to default parser ("The Right Way").
 Now you can use both, that is, single (`'...'`) or double quotes (`"..."`)
 like in ruby (or javascript or html or ...) :-).
+Note: A quote only "kicks-in" if it's the first (non-whitespace)
+character of the value (otherwise it's just a "vanilla" literal character)
+e.g. `48°51'24"N` needs no quote :-).
+With the "strict" parser you will get a firework of "stray" quote errors / exceptions.
+
 
 
 **v1.1.1**: Added built-in support for (optional) alternative comments (`%`) - used by
-ARFF (attribute
+[ARFF (attribute-relation file format)](https://waikato.github.io/weka-wiki/arff/) -
 and support for (optional) directives (`@`) in header (that is, before any records)
 to default parser ("The Right Way").
 Now you can use either `#` or `%` for comments, the first one "wins" - you CANNOT use both.
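The "first (non-whitespace) character" rule from the release notes above can be illustrated with a small standalone sketch. It does not call csvreader; `OPENING_QUOTES` and `quoted_value?` are hypothetical names made up for this illustration, and the list of opening characters is an assumption taken from the quote characters named in the notes.

```ruby
# Hypothetical helper (not part of csvreader): decides whether a raw value
# would start a quoted value under the rule described in the notes above.
OPENING_QUOTES = ['"', "'", "«", "»", "‹", "›"].freeze

def quoted_value?( raw )
  first = raw.lstrip[0]            # first non-whitespace character (nil if none)
  OPENING_QUOTES.include?( first )
end

puts quoted_value?( %q{  «a,1»} )      # true:  quote is the first non-whitespace char
puts quoted_value?( %q{48°51'24"N} )   # false: quotes appear later, so they stay literal
puts quoted_value?( %q{_«a»} )         # false: value starts with "_"
```

Note that `lstrip` also removes newlines and other whitespace, which may be broader than what the parser itself skips.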
@@ -33,12 +43,13 @@ e.g.`Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] )`.
 
 **v1.0.3**: Added built-in support for an (optional) front matter (`---`) meta data block
 in header (that is, before any records)
-to default parser ("The Right Way")
+to default parser ("The Right Way") - used by [CSVY (yaml front matter for csv file format)](http://csvy.org).
 Use `Csv.parser.meta` to get the parsed meta data block hash (or `nil`) if none.
 
 
 
 
+
 ## Usage
 
 
@@ -359,6 +370,32 @@ Staatliches Hofbräuhaus München,München,Hofbräu Oktoberfestbier,6.3%
 ```
 
 
+Or use the ARFF (attribute-relation file format)-like alternative style
+with `%` for comments and `@`-directives
+for "meta data" in the header (before any records):
+
+```
+%%%%%%%%%%%%%%%%%%
+% try with some comments
+% and blank lines even before @-directives in header
+
+@RELATION Beer
+
+@ATTRIBUTE Brewery
+@ATTRIBUTE City
+@ATTRIBUTE Name
+@ATTRIBUTE Abv
+
+@DATA
+Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
+Augustiner Bräu München,München,Edelstoff,5.6%
+
+Bayerische Staatsbrauerei Weihenstephan, Freising, Hefe Weissbier, 5.4%
+Brauerei Spezial, Bamberg, Rauchbier Märzen, 5.1%
+Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
+Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
+```
+
 
 ### Q: How can I change the default format / dialect?
 
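To make the ARFF-like header style added to the README concrete, here is a minimal standalone sketch of how `%` comments and `@` directives could be separated from the records. It is not csvreader's API or implementation; `split_header` is a hypothetical helper, and skipping `%` lines after the header as well is a simplifying assumption.

```ruby
# Hypothetical helper (not csvreader's API): splits ARFF-like text into
# header directives and raw record lines.
def split_header( text )
  directives = []
  records    = []
  in_header  = true
  text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?( "%" )   # skip blank lines and %-comments
    if in_header && line.start_with?( "@" )
      name, value = line[1..-1].split( /\s+/, 2 )    # e.g. "RELATION", "Beer"
      directives << [name, value]
    else
      in_header = false                              # the first record ends the header
      records << line
    end
  end
  [directives, records]
end

text = <<~TXT
  % try with some comments

  @RELATION Beer
  @ATTRIBUTE Brewery
  @DATA
  Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
TXT

directives, records = split_header( text )
p directives   # [["RELATION", "Beer"], ["ATTRIBUTE", "Brewery"], ["DATA", nil]]
p records      # ["Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%"]
```

A trailing `%` inside a record (as in `7%`) is left alone because only lines that start with `%` are treated as comments.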
data/lib/csvreader/base.rb
CHANGED
data/lib/csvreader/parser_std.rb
CHANGED
@@ -128,13 +128,13 @@ end
 
 
 
-def parse_quote( input,
+def parse_quote( input, opening_quote:, closing_quote:)
   value = ""
-  if input.peek ==
-    input.getc ## eat-up quote
+  if input.peek == opening_quote
+    input.getc ## eat-up opening quote
 
     loop do
-      while (c=input.peek; !(c==
+      while (c=input.peek; !(c==closing_quote || c==BACKSLASH || input.eof?))
         value << input.getc ## eat-up everything until hitting quote (e.g. " or ') or backslash (escape)
       end
 
@@ -144,7 +144,9 @@ def parse_quote( input, quote:)
         value << parse_escape( input )
       else ## assume input.peek == quote
         input.getc ## eat-up quote
-        if input.peek ==
+        if opening_quote == closing_quote && input.peek == closing_quote
+          ## doubled up quote?
+          # note: only works (enabled) for "" or '' and NOT for «»,‹›.. (if opening and closing differ)
           value << input.getc ## add doube quote and continue!!!!
         else
           break
@@ -152,7 +154,7 @@ def parse_quote( input, quote:)
       end
     end
   else
-    raise ParseError.new( "found >#{input.peek} (#{input.peek.ord})< - QUOTE (#{
+    raise ParseError.new( "found >#{input.peek} (#{input.peek.ord})< - CLOSING QUOTE (#{closing_quote}) expected in parse_quote!!!!" )
   end
   value
 end
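The three `parse_quote` hunks above only show fragments of the method, so the following standalone sketch restates the idea in one piece: separate opening and closing quotes, with the doubled-quote escape active only when both are the same character. This is an illustrative re-implementation, not the gem's code; `parse_quote_sketch`, its array-of-characters input, and the simplified backslash handling are assumptions made for the example.

```ruby
BACKSLASH = "\\"

# Illustrative sketch only (not the gem's implementation): reads one quoted
# value from `chars` starting at index `pos`, which must hold the opening quote.
# Returns the unquoted value and the index just past the closing quote.
def parse_quote_sketch( chars, pos, opening_quote:, closing_quote: )
  unless chars[pos] == opening_quote
    raise ArgumentError, "opening quote (#{opening_quote}) expected at #{pos}"
  end
  pos += 1                                # eat-up opening quote
  value = +""
  while pos < chars.size
    c = chars[pos]
    if c == BACKSLASH
      value << chars[pos+1].to_s          # simplification: keep the escaped char as-is
      pos += 2
    elsif c == closing_quote
      pos += 1                            # eat-up closing quote
      # a doubled-up quote is only an escape when opening == closing ("" or '')
      if opening_quote == closing_quote && chars[pos] == closing_quote
        value << closing_quote
        pos += 1
      else
        break
      end
    else
      value << c
      pos += 1
    end
  end
  [value, pos]
end

p parse_quote_sketch( %q{"a ""b"" c"}.chars, 0, opening_quote: '"', closing_quote: '"' )
# => ["a \"b\" c", 11]
p parse_quote_sketch( "«a,1» ".chars, 0, opening_quote: "«", closing_quote: "»" )
# => ["a,1", 5]
p parse_quote_sketch( "»c, 3«".chars, 0, opening_quote: "»", closing_quote: "«" )
# => ["c, 3", 6]
```

Working on an array of characters sidesteps byte-versus-character issues with multibyte quotes such as `«`; the gem's own input buffer handles this differently.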
@@ -182,18 +184,36 @@ def parse_field( input )
     end
   elsif input.peek == DOUBLE_QUOTE
     logger.debug "start double_quote field - peek >#{input.peek}< (#{input.peek.ord})" if logger.debug?
-    value << parse_quote( input,
+    value << parse_quote( input, opening_quote: DOUBLE_QUOTE,
+                                 closing_quote: DOUBLE_QUOTE )
 
     ## note: always eat-up all trailing spaces (" ") and tabs (\t)
     skip_spaces( input )
     logger.debug "end double_quote field - peek >#{input.peek}< (#{input.peek.ord})" if logger.debug?
   elsif input.peek == SINGLE_QUOTE ## allow single quote too (by default)
     logger.debug "start single_quote field - peek >#{input.peek}< (#{input.peek.ord})" if logger.debug?
-    value << parse_quote( input,
+    value << parse_quote( input, opening_quote: SINGLE_QUOTE,
+                                 closing_quote: SINGLE_QUOTE )
 
     ## note: always eat-up all trailing spaces (" ") and tabs (\t)
     skip_spaces( input )
     logger.debug "end single_quote field - peek >#{input.peek}< (#{input.peek.ord})" if logger.debug?
+  elsif input.peek == "«"
+    value << parse_quote( input, opening_quote: "«",
+                                 closing_quote: "»" )
+    skip_spaces( input )
+  elsif input.peek == "»"
+    value << parse_quote( input, opening_quote: "»",
+                                 closing_quote: "«" )
+    skip_spaces( input )
+  elsif input.peek == "‹"
+    value << parse_quote( input, opening_quote: "‹",
+                                 closing_quote: "›" )
+    skip_spaces( input )
+  elsif input.peek == "›"
+    value << parse_quote( input, opening_quote: "›",
+                                 closing_quote: "‹" )
+    skip_spaces( input )
   else
     logger.debug "start reg field - peek >#{input.peek}< (#{input.peek.ord})" if logger.debug?
     ## consume simple value
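The `elsif` branches in the hunk above pair each opening quote with its closing counterpart. As a reading aid, the same pairs can be written as a lookup table. This is only one possible way to express the mapping, not how the gem does it; `QUOTE_PAIRS` and `closing_quote_for` are hypothetical names.

```ruby
# Assumption: pairs copied from the elsif branches in the hunk above;
# the constant and method names are made up for this illustration.
QUOTE_PAIRS = {
  '"' => '"',
  "'" => "'",
  "«" => "»",
  "»" => "«",
  "‹" => "›",
  "›" => "‹",
}.freeze

def closing_quote_for( opening )
  QUOTE_PAIRS[ opening ]     # nil means: not a quote, treat as a regular value
end

p closing_quote_for( "«" )   # => "»"
p closing_quote_for( "›" )   # => "‹"
p closing_quote_for( "_" )   # => nil
```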
data/lib/csvreader/version.rb
CHANGED
data/test/test_parser_quotes.rb
ADDED
@@ -0,0 +1,53 @@
+# encoding: utf-8
+
+###
+# to run use
+# ruby -I ./lib -I ./test test/test_parser_quotes.rb
+
+
+require 'helper'
+
+
+class TestParserQuotes < MiniTest::Test
+
+
+def parser
+  CsvReader::Parser::DEFAULT
+end
+
+
+def test_french_single
+  assert_equal [[ "a", "b", "c" ]],
+               parser.parse( " ‹a›, ‹b›, ›c‹ " )
+
+  assert_equal [[ "a,1", " b,2", "c, 3" ]],
+               parser.parse( " ‹a,1›, ‹ b,2›, ›c, 3‹ " )
+
+  assert_equal [[ %Q{"a"}, %Q{'b'}, %Q{c'"'"} ]],
+               parser.parse( %Q{ ‹"a"›, ‹'b'›, ›c'"'"‹} )
+
+  # note: quote matches only if first non-whitespace char
+  assert_equal [[ "_‹a›", "_‹b›", "›c‹" ]],
+               parser.parse( %Q{ _‹a›, _‹b›, "›c‹"} )
+
+end
+
+
+def test_french_double
+  assert_equal [[ "a", "b", "c" ]],
+               parser.parse( " «a», «b», »c« " )
+
+  assert_equal [[ "a,1", " b,2", "c, 3" ]],
+               parser.parse( " «a,1», « b,2», »c, 3« " )
+
+  assert_equal [[ %Q{"a"}, %Q{'b'}, %Q{c'"'"} ]],
+               parser.parse( %Q{ «"a"», «'b'», »c'"'"«} )
+
+  # note: quote matches only if first non-whitespace char
+  assert_equal [[ "_«a»", "_«b»", "»c«" ]],
+               parser.parse( %Q{ _«a», _«b», "»c«"} )
+
+end
+
+
+end # class TestParserQuotes
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: csvreader
 version: !ruby/object:Gem::Version
-  version: 1.1.
+  version: 1.1.3
 platform: ruby
 authors:
 - Gerald Bauer
@@ -88,6 +88,7 @@ files:
 - test/test_parser_meta.rb
 - test/test_parser_null.rb
 - test/test_parser_numeric.rb
+- test/test_parser_quotes.rb
 - test/test_parser_strict.rb
 - test/test_parser_tab.rb
 - test/test_reader.rb