csvreader 1.1.2 → 1.1.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: cf620967ec1983a211f8e2436a4b50aca3bbe023
- data.tar.gz: 76da0bbce4a76c4b60e37f1cb93be23d2aec504e
+ metadata.gz: a920108ec183cff7c7cad8c0d967390b4f2bd38f
+ data.tar.gz: 2a32715b6e1eb3e83b3837de1d151169d8b3455f
  SHA512:
- metadata.gz: 6024f630a6c982beffd597107cfa75c1e2d6e86e174408632f4e31aa8d4c5a2ea6be8608f678f64da6bd6ba914e9f3ed55fce044a25593bd92757a82bb0d082e
- data.tar.gz: 98bed6e7938399640d942d5c8d9f420d01f4d048d06c09dec2f1e6e7e833a8c38c42419a520445b13166743615de7bd120eec20a4c607d377ebf40a0109bcc47
+ metadata.gz: f2264455eda5136261628cc77de24494d9ea11bb116c9ca5e36495f4f4b90101356444c9da75c37b6d5b9419b57ce4a145830bd1d6919ce0cbdb2ef05673bfad
+ data.tar.gz: 9c539db1ccac369ae23113587e9d529a95de0b080f3c12687e237d46bea1bdbb157b57f7b2f61d72f637cd914aecb11fbd8daeec11ecb38f49b65669a004e774
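
For context on the hash changes above: in a packaged gem, `checksums.yaml` records SHA1 and SHA512 digests of the two archive members `metadata.gz` and `data.tar.gz`, so both sets of digests change with every release. A minimal sketch of recomputing them, assuming the `.gem` file is read as a plain tar archive via RubyGems' `Gem::Package::TarReader`; the helper name `member_digests` is hypothetical.

```ruby
require "digest"
require "rubygems/package"

# Hypothetical helper: returns { "SHA1" => {name => hex}, "SHA512" => {name => hex} }
# for the checksummed members of a .gem (tar) archive read from +io+.
def member_digests(io, members: %w[metadata.gz data.tar.gz])
  result = { "SHA1" => {}, "SHA512" => {} }
  Gem::Package::TarReader.new(io).each do |entry|
    next unless members.include?(entry.full_name)   # skip e.g. checksums.yaml.gz
    data = entry.read || ""                         # raw bytes of the member
    result["SHA1"][entry.full_name]   = Digest::SHA1.hexdigest(data)
    result["SHA512"][entry.full_name] = Digest::SHA512.hexdigest(data)
  end
  result
end
```

Usage: `File.open("csvreader-1.1.3.gem", "rb") { |f| member_digests(f) }`; the resulting hex strings can then be compared against the `+` lines above.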
@@ -37,6 +37,7 @@ test/test_parser_java.rb
  test/test_parser_meta.rb
  test/test_parser_null.rb
  test/test_parser_numeric.rb
+ test/test_parser_quotes.rb
  test/test_parser_strict.rb
  test/test_parser_tab.rb
  test/test_reader.rb
data/README.md CHANGED
@@ -10,15 +10,25 @@
 
  ## What's News?
 
+ **v1.1.3**: Added built-in support for French single and double quotes / guillemets (`‹› «»`) to default parser ("The Right Way").
+ Now you can use both, that is, single (`‹...›` or `›...‹`)
+ or double (`«...»` or `»...«`).
+ Note: A quote only "kicks-in" if it's the first (non-whitespace)
+ character of the value (otherwise it's just a "vanilla" literal character).
 
 
  **v1.1.2**: Added built-in support for single quotes (`'`) to default parser ("The Right Way").
  Now you can use both, that is, single (`'...'`) or double quotes (`"..."`)
  like in ruby (or javascript or html or ...) :-).
+ Note: A quote only "kicks-in" if it's the first (non-whitespace)
+ character of the value (otherwise it's just a "vanilla" literal character)
+ e.g. `48°51'24"N` needs no quote :-).
+ With the "strict" parser you will get a firework of "stray" quote errors / exceptions.
+
 
 
  **v1.1.1**: Added built-in support for (optional) alternative comments (`%`) - used by
- ARFF (attribute relation file format) -
+ [ARFF (attribute-relation file format)](https://waikato.github.io/weka-wiki/arff/) -
  and support for (optional) directives (`@`) in header (that is, before any records)
  to default parser ("The Right Way").
  Now you can use either `#` or `%` for comments, the first one "wins" - you CANNOT use both.
@@ -33,12 +43,13 @@ e.g.`Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] )`.
 
  **v1.0.3**: Added built-in support for an (optional) front matter (`---`) meta data block
  in header (that is, before any records)
- to default parser ("The Right Way"). See [CSVY.org](http://csvy.org) for more.
+ to default parser ("The Right Way") - used by [CSVY (yaml front matter for csv file format)](http://csvy.org).
  Use `Csv.parser.meta` to get the parsed meta data block hash (or `nil`) if none.
 
 
 
 
+
  ## Usage
 
 
@@ -359,6 +370,32 @@ Staatliches Hofbräuhaus München,München,Hofbräu Oktoberfestbier,6.3%
  ```
 
 
+ Or use the ARFF (attribute-relation file format)-like alternative style
+ with `%` for comments and `@`-directives
+ for "meta data" in the header (before any records):
+
+ ```
+ %%%%%%%%%%%%%%%%%%
+ % try with some comments
+ % and blank lines even before @-directives in header
+
+ @RELATION Beer
+
+ @ATTRIBUTE Brewery
+ @ATTRIBUTE City
+ @ATTRIBUTE Name
+ @ATTRIBUTE Abv
+
+ @DATA
+ Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
+ Augustiner Bräu München,München,Edelstoff,5.6%
+
+ Bayerische Staatsbrauerei Weihenstephan, Freising, Hefe Weissbier, 5.4%
+ Brauerei Spezial, Bamberg, Rauchbier Märzen, 5.1%
+ Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
+ Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
+ ```
+
 
  ### Q: How can I change the default format / dialect?
 
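
The ARFF-like example added to the README mixes three kinds of lines before the records: `%` comments, blank lines, and `@`-directives. A minimal sketch of that classification in plain Ruby, not csvreader's own API: the helper `split_arff_like` is hypothetical, skips `%` comments and blank lines everywhere, treats `@` lines as directives only before the first record, and splits records on commas without any quote handling.

```ruby
# Hypothetical sketch, not csvreader's API: splits ARFF-like text into
# header directives and records. Blank lines and %-comments are skipped
# everywhere; @-lines count as directives only before the first record.
def split_arff_like(text)
  directives = []
  records    = []
  text.each_line do |raw|
    line = raw.strip
    next if line.empty? || line.start_with?("%")
    if records.empty? && line.start_with?("@")
      directives << line
    else
      records << line.split(",").map(&:strip)   # no quote handling here
    end
  end
  [directives, records]
end
```

Applied to the beer example, this sketch yields six directives (`@RELATION Beer` through `@DATA`) and six records, with the spaces after the commas in the last four rows stripped.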
@@ -166,4 +166,4 @@ end # class CsvHashReader
 
 
  # say hello
- puts CsvReader.banner if $DEBUG || (defined?($RUBYLIBS_DEBUG) && $RUBYLIBS_DEBUG)
+ puts CsvReader.banner if $DEBUG || (defined?($RUBYCOCO_DEBUG) && $RUBYCOCO_DEBUG)
@@ -128,13 +128,13 @@ end
 
 
 
- def parse_quote( input, quote:)
+ def parse_quote( input, opening_quote:, closing_quote:)
  value = ""
- if input.peek == quote
- input.getc ## eat-up quote
+ if input.peek == opening_quote
+ input.getc ## eat-up opening quote
 
  loop do
- while (c=input.peek; !(c==quote || c==BACKSLASH || input.eof?))
+ while (c=input.peek; !(c==closing_quote || c==BACKSLASH || input.eof?))
  value << input.getc ## eat-up everything until hitting quote (e.g. " or ') or backslash (escape)
  end
 
@@ -144,7 +144,9 @@ def parse_quote( input, quote:)
  value << parse_escape( input )
  else ## assume input.peek == quote
  input.getc ## eat-up quote
- if input.peek == quote ## doubled up quote?
+ if opening_quote == closing_quote && input.peek == closing_quote
+ ## doubled up quote?
+ # note: only works (enabled) for "" or '' and NOT for «»,‹›.. (if opening and closing differ)
  value << input.getc ## add doube quote and continue!!!!
  else
  break
@@ -152,7 +154,7 @@ def parse_quote( input, quote:)
  end
  end
  else
- raise ParseError.new( "found >#{input.peek} (#{input.peek.ord})< - QUOTE (#{quote}) expected in parse_quote!!!!" )
+ raise ParseError.new( "found >#{input.peek} (#{input.peek.ord})< - CLOSING QUOTE (#{closing_quote}) expected in parse_quote!!!!" )
  end
  value
  end
@@ -182,18 +184,36 @@ def parse_field( input )
  end
  elsif input.peek == DOUBLE_QUOTE
  logger.debug "start double_quote field - peek >#{input.peek}< (#{input.peek.ord})" if logger.debug?
- value << parse_quote( input, quote: DOUBLE_QUOTE )
+ value << parse_quote( input, opening_quote: DOUBLE_QUOTE,
+ closing_quote: DOUBLE_QUOTE )
 
  ## note: always eat-up all trailing spaces (" ") and tabs (\t)
  skip_spaces( input )
  logger.debug "end double_quote field - peek >#{input.peek}< (#{input.peek.ord})" if logger.debug?
  elsif input.peek == SINGLE_QUOTE ## allow single quote too (by default)
  logger.debug "start single_quote field - peek >#{input.peek}< (#{input.peek.ord})" if logger.debug?
- value << parse_quote( input, quote: SINGLE_QUOTE )
+ value << parse_quote( input, opening_quote: SINGLE_QUOTE,
+ closing_quote: SINGLE_QUOTE )
 
  ## note: always eat-up all trailing spaces (" ") and tabs (\t)
  skip_spaces( input )
  logger.debug "end single_quote field - peek >#{input.peek}< (#{input.peek.ord})" if logger.debug?
+ elsif input.peek == "«"
+ value << parse_quote( input, opening_quote: "«",
+ closing_quote: "»" )
+ skip_spaces( input )
+ elsif input.peek == "»"
+ value << parse_quote( input, opening_quote: "»",
+ closing_quote: "«" )
+ skip_spaces( input )
+ elsif input.peek == "‹"
+ value << parse_quote( input, opening_quote: "‹",
+ closing_quote: "›" )
+ skip_spaces( input )
+ elsif input.peek == "›"
+ value << parse_quote( input, opening_quote: "›",
+ closing_quote: "‹" )
+ skip_spaces( input )
  else
  logger.debug "start reg field - peek >#{input.peek}< (#{input.peek.ord})" if logger.debug?
  ## consume simple value
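
The four new `elsif` branches differ only in their opening/closing pair, and the doubled-quote rule is now limited to symmetric pairs (`""` or `''`). A minimal standalone sketch of the same rules, assuming a hypothetical `QUOTE_PAIRS` table in place of the `elsif` chain and a hypothetical `Buffer` class standing in for the parser's input (`peek`, `getc`, `eof?`); `skip_spaces` and `parse_line` here are illustrative re-implementations, backslash escapes and `ParseError` are left out, and malformed input raises `ArgumentError` instead.

```ruby
# Hypothetical lookup table: opening quote => matching closing quote.
QUOTE_PAIRS = {
  '"' => '"', "'" => "'",   # symmetric pairs: doubled quote = literal quote
  "«" => "»", "»" => "«",   # French double quotes, either direction
  "‹" => "›", "›" => "‹"    # French single quotes, either direction
}.freeze

# Hypothetical minimal input: peek/getc/eof? over the characters of a String.
class Buffer
  def initialize(str)
    @chars = str.chars
    @pos   = 0
  end

  def peek
    @chars[@pos]            # nil at end of input
  end

  def getc
    c = @chars[@pos]
    @pos += 1 unless c.nil?
    c
  end

  def eof?
    @pos >= @chars.size
  end
end

# Reads one quoted value. A doubled closing quote is kept as a single literal
# quote only if opening and closing quote are the same character.
def parse_quote(input, opening_quote:, closing_quote:)
  raise ArgumentError, "#{opening_quote} expected" unless input.peek == opening_quote
  input.getc                                   # eat-up opening quote
  value = +""
  loop do
    value << input.getc until input.eof? || input.peek == closing_quote
    raise ArgumentError, "missing closing quote #{closing_quote}" if input.eof?
    input.getc                                 # eat-up closing quote
    break unless opening_quote == closing_quote && input.peek == closing_quote
    value << input.getc                        # doubled-up quote => one literal quote
  end
  value
end

def skip_spaces(input)
  input.getc while input.peek == " " || input.peek == "\t"
end

# Splits one line into fields. A quote only "kicks in" if it is the first
# non-whitespace character of a value; otherwise it is a literal character.
def parse_line(line)
  input  = Buffer.new(line)
  fields = []
  loop do
    skip_spaces(input)
    opening = input.peek
    closing = QUOTE_PAIRS[opening]
    if closing
      fields << parse_quote(input, opening_quote: opening, closing_quote: closing)
      skip_spaces(input)                       # eat-up trailing spaces and tabs
    else
      value = +""
      value << input.getc until input.eof? || input.peek == ","
      fields << value.rstrip
    end
    break if input.eof?
    raise ArgumentError, "comma expected, found #{input.peek}" unless input.peek == ","
    input.getc                                 # eat-up comma
  end
  fields
end
```

With these rules the sample inputs from `test/test_parser_quotes.rb` split the same way, e.g. `parse_line(" ‹a›, ‹b›, ›c‹ ")` gives `["a", "b", "c"]`. Keeping the pairs in one table means adding another pair is a one-line change rather than a new branch.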
@@ -5,7 +5,7 @@ class CsvReader ## note: uses a class for now - change to module - why? why no
 
  MAJOR = 1 ## todo: namespace inside version or something - why? why not??
  MINOR = 1
- PATCH = 2
+ PATCH = 3
  VERSION = [MAJOR,MINOR,PATCH].join('.')
 
 
@@ -0,0 +1,53 @@
+ # encoding: utf-8
+
+ ###
+ # to run use
+ # ruby -I ./lib -I ./test test/test_parser_quotes.rb
+
+
+ require 'helper'
+
+
+ class TestParserQuotes < MiniTest::Test
+
+
+ def parser
+ CsvReader::Parser::DEFAULT
+ end
+
+
+ def test_french_single
+ assert_equal [[ "a", "b", "c" ]],
+ parser.parse( " ‹a›, ‹b›, ›c‹ " )
+
+ assert_equal [[ "a,1", " b,2", "c, 3" ]],
+ parser.parse( " ‹a,1›, ‹ b,2›, ›c, 3‹ " )
+
+ assert_equal [[ %Q{"a"}, %Q{'b'}, %Q{c'"'"} ]],
+ parser.parse( %Q{ ‹"a"›, ‹'b'›, ›c'"'"‹} )
+
+ # note: quote matches only if first non-whitespace char
+ assert_equal [[ "_‹a›", "_‹b›", "›c‹" ]],
+ parser.parse( %Q{ _‹a›, _‹b›, "›c‹"} )
+
+ end
+
+
+ def test_french_double
+ assert_equal [[ "a", "b", "c" ]],
+ parser.parse( " «a», «b», »c« " )
+
+ assert_equal [[ "a,1", " b,2", "c, 3" ]],
+ parser.parse( " «a,1», « b,2», »c, 3« " )
+
+ assert_equal [[ %Q{"a"}, %Q{'b'}, %Q{c'"'"} ]],
+ parser.parse( %Q{ «"a"», «'b'», »c'"'"«} )
+
+ # note: quote matches only if first non-whitespace char
+ assert_equal [[ "_«a»", "_«b»", "»c«" ]],
+ parser.parse( %Q{ _«a», _«b», "»c«"} )
+
+ end
+
+
+ end # class TestParserQuotes
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: csvreader
  version: !ruby/object:Gem::Version
- version: 1.1.2
+ version: 1.1.3
  platform: ruby
  authors:
  - Gerald Bauer
@@ -88,6 +88,7 @@ files:
  - test/test_parser_meta.rb
  - test/test_parser_null.rb
  - test/test_parser_numeric.rb
+ - test/test_parser_quotes.rb
  - test/test_parser_strict.rb
  - test/test_parser_tab.rb
  - test/test_reader.rb