csvreader 1.2.1 → 1.2.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: e61cda6f5b0fae762451efa0b0819e53b6da9966
-   data.tar.gz: e6dadbde1d714247046603fbdcb1fbd348cacc4c
+   metadata.gz: a8e2f4f6e06ec63483735c1e0966b61398df85eb
+   data.tar.gz: 6c867acfa43c261473b6d6300e3ecd8d7042f0dc
  SHA512:
-   metadata.gz: 6494cb0052000592cff4766946c3b7db0ec026db220f2e3857563d6070f282089034a8405c3eb4d8807f1a2dbe4cce67bd789b2794f803765f5fe9702d62a856
-   data.tar.gz: b47bd4cc6a342c5cc5e01e5ec1a67c7ae2fc3a7fc6fb4c1bcbcac018ab0291c6af076444aafb7ecac2f83ba814a502366583bbb61fa36c961e7009b217a3819b
+   metadata.gz: 2ddc944ee42de5660c68e057d0cbdbdf81b0f135b603e87231da2258bbd9001fc2483881e1c0ff15d5d7b23d2cb9f1b2c799a77a3551dbaee18094ba9aad5086
+   data.tar.gz: 39743a7df8b49b45a9ad1dd7ce8598616733348ecdbc52b94de4e8fc1fb7a54e33d0cbc36f3750a5940c9d46310d94a4c6d983811fb506efa2349bb0adc94d16
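The changed digests above can be cross-checked locally. The following is a minimal sketch, assuming the values in `checksums.yaml` are plain hex digests over the raw `metadata.gz` and `data.tar.gz` members of the unpacked gem. `checksums_for` is a hypothetical helper, not part of RubyGems, and the temporary file only stands in for a real archive member.

```ruby
require 'digest/sha1'
require 'digest/sha2'
require 'tempfile'

# Hypothetical helper (not a RubyGems API): returns the hex digests of one
# file, keyed the same way as the sections in checksums.yaml.
def checksums_for( path )
  { 'SHA1'   => Digest::SHA1.file( path ).hexdigest,
    'SHA512' => Digest::SHA512.file( path ).hexdigest }
end

# Stand-in file for illustration; point this at a real data.tar.gz instead.
sample = Tempfile.new( 'data.tar.gz' )
sample.write( 'abc' )
sample.flush

sums = checksums_for( sample.path )
puts sums['SHA1']
puts sums['SHA512']
```

Comparing the printed values with the `+` lines of the hunk would show whether a downloaded archive matches the published 1.2.2 release.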
@@ -32,6 +32,7 @@ test/helper.rb
  test/test_buffer.rb
  test/test_converter.rb
  test/test_parser.rb
+ test/test_parser_autofix.rb
  test/test_parser_directive.rb
  test/test_parser_fixed.rb
  test/test_parser_formats.rb
data/README.md CHANGED
@@ -12,10 +12,16 @@
 
  ## What's News?
 
+ **v1.2.2** Added auto-fix/correction/recovery
+ for double quoted value with extra trailing value
+ to the default parser (`ParserStd`) e.g. `"Freddy" Mercury`
+ will get read "as is" and turned
+ into an "unquoted" value with "literal" quotes e.g. `"Freddy" Mercury`.
 
- **v1.2.1** Added support for (optional) hashtag to the
+
+ **v1.2.1** Added support for (optional) hashtag to the
  to the default parser (`ParserStd`) for
- supporting the [Humanitarian eXchange Language (HXL)](http://hxlstandard.org).
+ supporting the [Humanitarian eXchange Language (HXL)](https://github.com/csvspecs/csv-hxl).
  Default is turned off (`false`). Use `Csv.human`
  or `Csv.hum` or `Csv.hxl` for pre-defined with hashtag turned on.
 
 
@@ -53,7 +59,7 @@ With the "strict" parser you will get a firework of "stray" quote errors / excep
 
 
  **v1.1.1**: Added built-in support for (optional) alternative comments (`%`) - used by
- [ARFF (attribute-relation file format)](https://waikato.github.io/weka-wiki/arff/) -
+ [ARFF (attribute-relation file format)](https://github.com/csvspecs/csv-meta#attribute-relation-classic) -
  and support for (optional) directives (`@`) in header (that is, before any records)
  to default parser ("The Right Way").
  Now you can use either `#` or `%` for comments, the first one "wins" - you CANNOT use both.
@@ -68,13 +74,12 @@ e.g. `Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] )`.
 
  **v1.0.3**: Added built-in support for an (optional) front matter (`---`) meta data block
  in header (that is, before any records)
- to default parser ("The Right Way") - used by [CSVY (yaml front matter for csv file format)](http://csvy.org).
+ to default parser ("The Right Way") - used by [CSVY (yaml front matter for csv file format)](https://github.com/csvspecs/csv-meta#front-matter-in-yaml).
  Use `Csv.parser.meta` to get the parsed meta data block hash (or `nil`) if none.
 
 
 
 
-
  ## Usage
 
 
@@ -197,6 +197,26 @@ def parse_quote( input, sep:, opening_quote:, closing_quote:)
  end
 
 
+ def parse_field_until_sep( input, sep: )
+   value = ""
+   logger.debug "start reg field - peek >#{input.peek}< (#{input.peek.ord})" if logger.debug?
+   ## consume simple value
+   ## until we hit "," or "\n" or "\r"
+   ## note: will eat-up quotes too!!!
+   while (c=input.peek; !(c==sep || c==LF || c==CR || input.eof?))
+     if input.peek == BACKSLASH
+       value << parse_escape( input, sep: sep )
+     else
+       logger.debug " add char >#{input.peek}< (#{input.peek.ord})" if logger.debug?
+       value << input.getc ## note: eat-up all spaces (" ") and tabs (\t) too (strip trailing spaces at the end)
+     end
+   end
+   ## note: only strip **trailing** spaces (space and tab only)
+   ## do NOT strip newlines etc. might have been added via escape! e.g. \\\n
+   value = value.sub( /[ \t]+$/, '' )
+   value
+ end
+
 
 
  def parse_field( input, sep: )
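The new `parse_field_until_sep` method reads everything up to the next separator or line break, quotes included, and then strips trailing spaces and tabs. The following is a minimal stand-alone sketch of that idea, assuming a `StringScanner` in place of the gem's own buffered reader and leaving out backslash escapes and logging. `read_until_sep` is a hypothetical name used only for this illustration.

```ruby
require 'strscan'

# Hypothetical stand-alone helper (not the gem's API): reads an unquoted
# value from a StringScanner up to the separator or a line break.
# Assumption: no backslash escape handling, unlike the method in the diff.
def read_until_sep( scanner, sep: ',' )
  value = +''
  until scanner.eos?
    c = scanner.peek( 1 )
    break if c == sep || c == "\n" || c == "\r"
    value << scanner.getch      # quotes, spaces and tabs are kept as-is
  end
  value.sub( /[ \t]+\z/, '' )   # strip trailing spaces and tabs only
end

scanner = StringScanner.new( %Q{Mercury  ,Bulsara} )
first   = read_until_sep( scanner )
scanner.getch                   # skip the separator
second  = read_until_sep( scanner )

p first
p second
```

The sketch anchors the strip with `\z` (end of string) so that only spaces at the very end of the value are removed.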
@@ -226,7 +246,23 @@ def parse_field( input, sep: )
    closing_quote: DOUBLE_QUOTE )
 
    ## note: always eat-up all trailing spaces (" ") and tabs (\t)
-   skip_spaces( input )
+   spaces_count = skip_spaces( input )
+
+   ## check for auto-fix trailing data after quoted value e.g. ---,"Fredy" Mercury,---
+   ## todo/fix: add auto-fix for all quote variants!!!!!!!!!!!!!!!!!!!!
+   if (c=input.peek; c==sep || c==LF || c==CR || input.eof?)
+     ## everything ok (that is, regular quoted value)!!!
+   else
+     ## try auto-fix
+     ## todo: report warning/issue error (if configured)!!!
+     extra_value = parse_field_until_sep( input, sep: sep )
+     ## "reconstruct" non-quoted value
+     spaces = ' ' * spaces_count ## todo: preserve tab (\t) - why? why not?
+     ## note: minor (theoratical) issue (doubled quoted got "collapsed/escaped" to one from two in quoted value)
+     ## e.g. "hello """ extra, (becomes)=> "hello "" extra (one quote less/"eaten up")
+     value = %Q{"#{value}"#{spaces}#{extra_value}}
+   end
+
    logger.debug "end double_quote field - peek >#{input.peek}< (#{input.peek.ord})" if logger.debug?
  elsif input.peek == SINGLE_QUOTE ## allow single quote too (by default)
    logger.debug "start single_quote field - peek >#{input.peek}< (#{input.peek.ord})" if logger.debug?
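The auto-fix in this hunk can be tried out in isolation. The sketch below is a simplified re-implementation under stated assumptions, not the gem's code: it uses `StringScanner`, handles double quotes only, ignores backslash escapes and comments, and parses a single line. The names `read_field`, `read_rest`, `at_field_end?` and `parse_line` are hypothetical.

```ruby
require 'strscan'

# True at end of input or when the next char ends the field.
def at_field_end?( scanner, sep )
  scanner.eos? || [sep, "\n", "\r"].include?( scanner.peek( 1 ) )
end

# Reads up to the next separator / line break; strips trailing spaces and tabs.
def read_rest( scanner, sep )
  rest = +''
  rest << scanner.getch until at_field_end?( scanner, sep )
  rest.sub( /[ \t]+\z/, '' )
end

def read_field( scanner, sep: ',' )
  scanner.skip( /[ \t]*/ )                        # leading spaces and tabs
  return read_rest( scanner, sep ) unless scanner.peek( 1 ) == '"'

  scanner.getch                                   # opening quote
  value = +''
  loop do
    raise ArgumentError, 'unterminated quoted value' if scanner.eos?
    c = scanner.getch
    if c == '"'
      break unless scanner.peek( 1 ) == '"'       # closing quote
      scanner.getch                               # doubled quote => one literal quote
    end
    value << c
  end

  spaces = scanner.scan( /[ \t]*/ )               # trailing spaces after the quote
  return value if at_field_end?( scanner, sep )   # regular quoted value

  # auto-fix: put the quotes back and append the trailing data "as is"
  %Q{"#{value}"#{' ' * spaces.size}#{read_rest( scanner, sep )}}
end

def parse_line( line, sep: ',' )
  scanner = StringScanner.new( line )
  fields  = []
  loop do
    fields << read_field( scanner, sep: sep )
    break if scanner.eos? || scanner.getch != sep
  end
  fields
end

p parse_line( %Q{Farrokh,"Freddy" Mercury,Bulsara} )
# => ["Farrokh", "\"Freddy\" Mercury", "Bulsara"]
```

As in the diff, a doubled quote inside the quoted part has already been collapsed to one quote by the time the value is reconstructed, so the rebuilt text can end up one quote shorter than the input.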
@@ -6,7 +6,7 @@ class CsvReader ## note: uses a class for now - change to module - why? why no
  module Version
    MAJOR = 1 ## todo: namespace inside version or something - why? why not??
    MINOR = 2
-   PATCH = 1
+   PATCH = 2
 
    ## self.to_s - why? why not?
  end
@@ -0,0 +1,28 @@
+ # encoding: utf-8
+
+ ###
+ # to run use
+ # ruby -I ./lib -I ./test test/test_parser_autofix.rb
+
+
+ require 'helper'
+
+
+ class TestParserAutofix < MiniTest::Test
+
+
+ def parser
+   CsvReader::Parser::DEFAULT
+ end
+
+
+ def test_quote_with_trailing_value
+   recs = [[ "Farrokh", "\"Freddy\" Mercury", "Bulsara" ]]
+
+   assert_equal recs, parser.parse( %Q{Farrokh,"Freddy" Mercury,Bulsara} )
+   assert_equal recs, parser.parse( %Q{ Farrokh , "Freddy" Mercury , Bulsara } )
+   assert_equal recs, parser.parse( %Q{Farrokh, "Freddy" Mercury ,Bulsara} )
+ end
+
+
+ end # class TestParserAutofix
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: csvreader
  version: !ruby/object:Gem::Version
-   version: 1.2.1
+   version: 1.2.2
  platform: ruby
  authors:
  - Gerald Bauer
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2018-11-06 00:00:00.000000000 Z
+ date: 2018-11-19 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: rdoc
@@ -83,6 +83,7 @@ files:
  - test/test_buffer.rb
  - test/test_converter.rb
  - test/test_parser.rb
+ - test/test_parser_autofix.rb
  - test/test_parser_directive.rb
  - test/test_parser_fixed.rb
  - test/test_parser_formats.rb