csvreader 1.1.2 → 1.1.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: cf620967ec1983a211f8e2436a4b50aca3bbe023
-   data.tar.gz: 76da0bbce4a76c4b60e37f1cb93be23d2aec504e
+   metadata.gz: a920108ec183cff7c7cad8c0d967390b4f2bd38f
+   data.tar.gz: 2a32715b6e1eb3e83b3837de1d151169d8b3455f
  SHA512:
-   metadata.gz: 6024f630a6c982beffd597107cfa75c1e2d6e86e174408632f4e31aa8d4c5a2ea6be8608f678f64da6bd6ba914e9f3ed55fce044a25593bd92757a82bb0d082e
-   data.tar.gz: 98bed6e7938399640d942d5c8d9f420d01f4d048d06c09dec2f1e6e7e833a8c38c42419a520445b13166743615de7bd120eec20a4c607d377ebf40a0109bcc47
+   metadata.gz: f2264455eda5136261628cc77de24494d9ea11bb116c9ca5e36495f4f4b90101356444c9da75c37b6d5b9419b57ce4a145830bd1d6919ce0cbdb2ef05673bfad
+   data.tar.gz: 9c539db1ccac369ae23113587e9d529a95de0b080f3c12687e237d46bea1bdbb157b57f7b2f61d72f637cd914aecb11fbd8daeec11ecb38f49b65669a004e774
@@ -37,6 +37,7 @@ test/test_parser_java.rb
  test/test_parser_meta.rb
  test/test_parser_null.rb
  test/test_parser_numeric.rb
+ test/test_parser_quotes.rb
  test/test_parser_strict.rb
  test/test_parser_tab.rb
  test/test_reader.rb
data/README.md CHANGED
@@ -10,15 +10,25 @@

  ## What's News?

+ **v1.1.3**: Added built-in support for french single and double quotes / guillemets (`‹› «»`) to default parser ("The Right Way").
+ Now you can use both, that is, single (`‹...›'` or `›...‹'`)
+ or double (`«...»` or `»...«`).
+ Note: A quote only "kicks-in" if it's the first (non-whitespace)
+ character of the value (otherwise it's just a "vanilla" literal character).


  **v1.1.2**: Added built-in support for single quotes (`'`) to default parser ("The Right Way").
  Now you can use both, that is, single (`'...'`) or double quotes (`"..."`)
  like in ruby (or javascript or html or ...) :-).
+ Note: A quote only "kicks-in" if it's the first (non-whitespace)
+ character of the value (otherwise it's just a "vanilla" literal character)
+ e.g. `48°51'24"N` needs no quote :-).
+ With the "strict" parser you will get a firework of "stray" quote errors / exceptions.
+


  **v1.1.1**: Added built-in support for (optional) alternative comments (`%`) - used by
- ARFF (attribute relation file format) -
+ [ARFF (attribute-relation file format)](https://waikato.github.io/weka-wiki/arff/) -
  and support for (optional) directives (`@`) in header (that is, before any records)
  to default parser ("The Right Way").
  Now you can use either `#` or `%` for comments, the first one "wins" - you CANNOT use both.
@@ -33,12 +43,13 @@ e.g.`Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] )`.

  **v1.0.3**: Added built-in support for an (optional) front matter (`---`) meta data block
  in header (that is, before any records)
- to default parser ("The Right Way"). See [CSVY.org](http://csvy.org) for more.
+ to default parser ("The Right Way") - used by [CSVY (yaml front matter for csv file format)](http://csvy.org).
  Use `Csv.parser.meta` to get the parsed meta data block hash (or `nil`) if none.




+
  ## Usage


@@ -359,6 +370,32 @@ Staatliches Hofbräuhaus München,München,Hofbräu Oktoberfestbier,6.3%
  ```


+ Or use the ARFF (attribute-relation file format)-like alternative style
+ with `%` for comments and `@`-directives
+ for "meta data" in the header (before any records):
+
+ ```
+ %%%%%%%%%%%%%%%%%%
+ % try with some comments
+ % and blank lines even before @-directives in header
+
+ @RELATION Beer
+
+ @ATTRIBUTE Brewery
+ @ATTRIBUTE City
+ @ATTRIBUTE Name
+ @ATTRIBUTE Abv
+
+ @DATA
+ Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
+ Augustiner Bräu München,München,Edelstoff,5.6%
+
+ Bayerische Staatsbrauerei Weihenstephan, Freising, Hefe Weissbier, 5.4%
+ Brauerei Spezial, Bamberg, Rauchbier Märzen, 5.1%
+ Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
+ Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
+ ```
+
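The header rules described in the README text above (`%` comments, blank lines, and `@`-directives that only count before the first record) can be sketched in a few lines of plain Ruby. This is an illustration only and not the gem's implementation: `split_header_sketch` is a hypothetical helper, and skipping blank lines between records is an assumption based on the sample above.

```ruby
# Hypothetical helper (not part of csvreader): split an ARFF-like text
# into header directives and raw record lines.
def split_header_sketch( text )
  directives = []
  records    = []
  text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?( "%" )   # skip blank lines and %-comments
    if records.empty? && line.start_with?( "@" )     # directives only count before the first record
      directives << line
    else
      records << line
    end
  end
  [directives, records]
end

sample = <<~TXT
  % a comment

  @RELATION Beer
  @DATA
  Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%

  Brauerei Spezial, Bamberg, Rauchbier Märzen, 5.1%
TXT

directives, records = split_header_sketch( sample )
p directives
p records.size
```

The record lines are returned unparsed; splitting them into values is left to the parser proper.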

  ### Q: How can I change the default format / dialect?

@@ -166,4 +166,4 @@ end # class CsvHashReader


  # say hello
- puts CsvReader.banner if $DEBUG || (defined?($RUBYLIBS_DEBUG) && $RUBYLIBS_DEBUG)
+ puts CsvReader.banner if $DEBUG || (defined?($RUBYCOCO_DEBUG) && $RUBYCOCO_DEBUG)
@@ -128,13 +128,13 @@ end



- def parse_quote( input, quote:)
+ def parse_quote( input, opening_quote:, closing_quote:)
    value = ""
-   if input.peek == quote
-     input.getc ## eat-up quote
+   if input.peek == opening_quote
+     input.getc ## eat-up opening quote

      loop do
-       while (c=input.peek; !(c==quote || c==BACKSLASH || input.eof?))
+       while (c=input.peek; !(c==closing_quote || c==BACKSLASH || input.eof?))
          value << input.getc ## eat-up everything until hitting quote (e.g. " or ') or backslash (escape)
        end

@@ -144,7 +144,9 @@ def parse_quote( input, quote:)
          value << parse_escape( input )
        else ## assume input.peek == quote
          input.getc ## eat-up quote
-         if input.peek == quote ## doubled up quote?
+         if opening_quote == closing_quote && input.peek == closing_quote
+           ## doubled up quote?
+           # note: only works (enabled) for "" or '' and NOT for «»,‹›.. (if opening and closing differ)
            value << input.getc ## add doube quote and continue!!!!
          else
            break
@@ -152,7 +154,7 @@ def parse_quote( input, quote:)
        end
      end
    else
-     raise ParseError.new( "found >#{input.peek} (#{input.peek.ord})< - QUOTE (#{quote}) expected in parse_quote!!!!" )
+     raise ParseError.new( "found >#{input.peek} (#{input.peek.ord})< - CLOSING QUOTE (#{closing_quote}) expected in parse_quote!!!!" )
    end
    value
  end
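To make the effect of the `opening_quote:` / `closing_quote:` split easier to follow, here is a minimal standalone sketch of the same loop. It is not the gem's code: `PeekInput` and `parse_quote_sketch` are hypothetical names, the escape handling is simplified (a backslash just copies the next character, whereas the gem delegates to `parse_escape`), and an unterminated quote simply ends at the end of the input.

```ruby
# Hypothetical stand-in for the parser's input buffer (peek / getc / eof?).
class PeekInput
  def initialize( text )
    @chars = text.chars
    @pos   = 0
  end

  def peek
    @chars[@pos]
  end

  def eof?
    @pos >= @chars.size
  end

  def getc
    c = @chars[@pos]
    @pos += 1
    c
  end
end

BACKSLASH = "\\"

# Simplified sketch of a quoted-value reader with distinct opening/closing quotes.
def parse_quote_sketch( input, opening_quote:, closing_quote: )
  unless input.peek == opening_quote
    raise ArgumentError, "opening quote (#{opening_quote}) expected"
  end
  input.getc                        # eat-up opening quote

  value = +""
  loop do
    c = input.peek
    if c.nil?                       # end of input: unterminated quote, stop here
      break
    elsif c == BACKSLASH            # assumption: backslash escapes the next character as-is
      input.getc
      value << input.getc.to_s
    elsif c == closing_quote
      input.getc                    # eat-up closing quote
      if opening_quote == closing_quote && input.peek == closing_quote
        value << input.getc         # doubled-up quote ("" or ''): keep one and continue
      else
        break                       # for «», ‹› etc. the first closing quote always ends the value
      end
    else
      value << input.getc
    end
  end
  value
end

puts parse_quote_sketch( PeekInput.new( "«a,1», «b»" ),  opening_quote: "«", closing_quote: "»" )  # prints: a,1
puts parse_quote_sketch( PeekInput.new( %q{"say ""hi"""} ), opening_quote: '"', closing_quote: '"' )  # prints: say "hi"
```

Because the doubled-quote check requires `opening_quote == closing_quote`, an input such as `«a»»` yields `a` and leaves the second `»` unread, which matches the note in the diff that doubling is not enabled for guillemets.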
@@ -182,18 +184,36 @@ def parse_field( input )
      end
    elsif input.peek == DOUBLE_QUOTE
      logger.debug "start double_quote field - peek >#{input.peek}< (#{input.peek.ord})" if logger.debug?
-     value << parse_quote( input, quote: DOUBLE_QUOTE )
+     value << parse_quote( input, opening_quote: DOUBLE_QUOTE,
+                                  closing_quote: DOUBLE_QUOTE )

      ## note: always eat-up all trailing spaces (" ") and tabs (\t)
      skip_spaces( input )
      logger.debug "end double_quote field - peek >#{input.peek}< (#{input.peek.ord})" if logger.debug?
    elsif input.peek == SINGLE_QUOTE ## allow single quote too (by default)
      logger.debug "start single_quote field - peek >#{input.peek}< (#{input.peek.ord})" if logger.debug?
-     value << parse_quote( input, quote: SINGLE_QUOTE )
+     value << parse_quote( input, opening_quote: SINGLE_QUOTE,
+                                  closing_quote: SINGLE_QUOTE )

      ## note: always eat-up all trailing spaces (" ") and tabs (\t)
      skip_spaces( input )
      logger.debug "end single_quote field - peek >#{input.peek}< (#{input.peek.ord})" if logger.debug?
+   elsif input.peek == "«"
+     value << parse_quote( input, opening_quote: "«",
+                                  closing_quote: "»" )
+     skip_spaces( input )
+   elsif input.peek == "»"
+     value << parse_quote( input, opening_quote: "»",
+                                  closing_quote: "«" )
+     skip_spaces( input )
+   elsif input.peek == "‹"
+     value << parse_quote( input, opening_quote: "‹",
+                                  closing_quote: "›" )
+     skip_spaces( input )
+   elsif input.peek == "›"
+     value << parse_quote( input, opening_quote: "›",
+                                  closing_quote: "‹" )
+     skip_spaces( input )
    else
      logger.debug "start reg field - peek >#{input.peek}< (#{input.peek.ord})" if logger.debug?
      ## consume simple value
@@ -5,7 +5,7 @@ class CsvReader ## note: uses a class for now - change to module - why? why no

    MAJOR = 1 ## todo: namespace inside version or something - why? why not??
    MINOR = 1
-   PATCH = 2
+   PATCH = 3
    VERSION = [MAJOR,MINOR,PATCH].join('.')


@@ -0,0 +1,53 @@
+ # encoding: utf-8
+
+ ###
+ # to run use
+ # ruby -I ./lib -I ./test test/test_parser_quotes.rb
+
+
+ require 'helper'
+
+
+ class TestParserQuotes < MiniTest::Test
+
+
+ def parser
+   CsvReader::Parser::DEFAULT
+ end
+
+
+ def test_french_single
+   assert_equal [[ "a", "b", "c" ]],
+                parser.parse( " ‹a›, ‹b›, ›c‹ " )
+
+   assert_equal [[ "a,1", " b,2", "c, 3" ]],
+                parser.parse( " ‹a,1›, ‹ b,2›, ›c, 3‹ " )
+
+   assert_equal [[ %Q{"a"}, %Q{'b'}, %Q{c'"'"} ]],
+                parser.parse( %Q{ ‹"a"›, ‹'b'›, ›c'"'"‹} )
+
+   # note: quote matches only if first non-whitespace char
+   assert_equal [[ "_‹a›", "_‹b›", "›c‹" ]],
+                parser.parse( %Q{ _‹a›, _‹b›, "›c‹"} )
+
+ end
+
+
+ def test_french_double
+   assert_equal [[ "a", "b", "c" ]],
+                parser.parse( " «a», «b», »c« " )
+
+   assert_equal [[ "a,1", " b,2", "c, 3" ]],
+                parser.parse( " «a,1», « b,2», »c, 3« " )
+
+   assert_equal [[ %Q{"a"}, %Q{'b'}, %Q{c'"'"} ]],
+                parser.parse( %Q{ «"a"», «'b'», »c'"'"«} )
+
+   # note: quote matches only if first non-whitespace char
+   assert_equal [[ "_«a»", "_«b»", "»c«" ]],
+                parser.parse( %Q{ _«a», _«b», "»c«"} )
+
+ end
+
+
+ end # class TestParserQuotes
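The rule exercised by the last assertion in each test (a quote only takes effect as the first non-whitespace character of a value) can be illustrated without the gem. The sketch below is an assumption-laden simplification: `QUOTE_PAIRS` and `read_value_sketch` are hypothetical names, the helper looks at one already-split value only, and it ignores escapes and doubled quotes.

```ruby
# Opening quote => closing quote, following the pairs listed in the v1.1.3 notes.
QUOTE_PAIRS = {
  '"' => '"',
  "'" => "'",
  "«" => "»",
  "»" => "«",
  "‹" => "›",
  "›" => "‹",
}.freeze

# Hypothetical helper: decide whether a single value is quoted.
def read_value_sketch( text )
  value   = text.strip
  closing = QUOTE_PAIRS[ value[0] ]      # nil unless the first character is an opening quote
  if closing && value.length >= 2 && value[-1] == closing
    value[1..-2]                         # quote "kicked in": drop the delimiters
  else
    value                                # plain literal: quote characters stay as-is
  end
end

puts read_value_sketch( " ‹ b,2› " )      # quoted, inner whitespace is kept
puts read_value_sketch( " _‹a›" )         # not quoted, ‹› are literal characters
puts read_value_sketch( %q{48°51'24"N} )  # not quoted, ' and " are literal characters
```

A lookup table like this is one way to avoid a growing `elsif` chain when further quote pairs are added; whether the gem adopts such a table is not stated in the diff.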
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: csvreader
  version: !ruby/object:Gem::Version
-   version: 1.1.2
+   version: 1.1.3
  platform: ruby
  authors:
  - Gerald Bauer
@@ -88,6 +88,7 @@ files:
  - test/test_parser_meta.rb
  - test/test_parser_null.rb
  - test/test_parser_numeric.rb
+ - test/test_parser_quotes.rb
  - test/test_parser_strict.rb
  - test/test_parser_tab.rb
  - test/test_reader.rb