csvreader 0.7.0 → 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: c61a8e62f99e1a06c119b4995e0e4e1d3c829d71
- data.tar.gz: 1b59f3415f3f0fe449a8c2395d2cefc7a7bd855c
+ metadata.gz: aa8aec6ffb59bb3e27d09889ebd1294364d288eb
+ data.tar.gz: 913002d3c342651381bf38fc952b56913f2554da
  SHA512:
- metadata.gz: 52e3e8effa09f492c38736f1fb16552341237ebe0e8e5452a72c080d03f48e47b9d66fbfa1b8f42e3ccd86d37038975254da0e3038f297174636a3b34ce57542
- data.tar.gz: 66c17daaa22d6e5d526c76b9568737b4982da9f02b11e9264fdb9d9644d5e33881255f623a174515c78714352b2df54c811cb8b6e0222aa5cc79682ac320ee62
+ metadata.gz: 23d5bedb995926f464a4bd95e62c52eb50dd8ee109ae883b934c616b62cbbf9b9239f184e89027dda2fdf14d41af63c7d41a26f869951544c90ed6ad662be8b3
+ data.tar.gz: c212ad5acdc55f5105bd5c412fcfef36f18370282f84295b47ae517cbcf5f03d9bb78440709cf6137c657feb67028de266c18cd3cb577877547249413cc1783a
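`checksums.yaml` lists SHA1 and SHA512 digests of the two archives packed inside the `.gem` file, `metadata.gz` and `data.tar.gz` (a `.gem` is itself a plain tar archive). Below is a minimal sketch for recomputing those digests, assuming RubyGems' `Gem::Package::TarReader` and Ruby's standard `digest` library. `checksums_for` and `gem_checksums` are hypothetical helper names, and the `.gem` path in the comment is only illustrative.

```ruby
require 'digest'
require 'rubygems/package'   # Gem::Package::TarReader / TarWriter
require 'stringio'           # used by the in-memory demo below

# Hypothetical helper: the two digests checksums.yaml records per file.
def checksums_for(bytes)
  {
    'SHA1'   => Digest::SHA1.hexdigest(bytes),
    'SHA512' => Digest::SHA512.hexdigest(bytes),
  }
end

# Hypothetical helper: walk the outer tar of a .gem (any IO opened in binary
# mode, e.g. a File or StringIO) and digest metadata.gz and data.tar.gz.
def gem_checksums(io)
  Gem::Package::TarReader.new(io).each_with_object({}) do |entry, sums|
    next unless %w[metadata.gz data.tar.gz].include?(entry.full_name)
    sums[entry.full_name] = checksums_for(entry.read)
  end
end

# With a real download (illustrative path):
#   File.open('csvreader-1.0.0.gem', 'rb') { |f| pp gem_checksums(f) }

# Self-contained demo using a tiny in-memory tar archive:
io = StringIO.new(''.b)
Gem::Package::TarWriter.new(io) do |tar|
  tar.add_file('data.tar.gz', 0o644) { |f| f.write('abc') }
end
io.rewind
p gem_checksums(io)['data.tar.gz']['SHA1']   # => "a9993e364706816aba3e25717850c26c9cd0d89d"
```

Comparing the printed hex strings against the `+` values above is one way to check that a downloaded archive matches the published release.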
@@ -6,6 +6,7 @@ Rakefile
  lib/csvreader.rb
  lib/csvreader/buffer.rb
  lib/csvreader/builder.rb
+ lib/csvreader/converter.rb
  lib/csvreader/parser.rb
  lib/csvreader/parser_std.rb
  lib/csvreader/parser_strict.rb
@@ -20,6 +21,7 @@ test/data/cities11.csv
  test/data/customers11.csv
  test/data/shakespeare.csv
  test/helper.rb
+ test/test_converter.rb
  test/test_parser.rb
  test/test_parser_formats.rb
  test/test_parser_java.rb
@@ -27,4 +29,6 @@ test/test_parser_null.rb
  test/test_parser_strict.rb
  test/test_parser_tab.rb
  test/test_reader.rb
+ test/test_reader_converters.rb
  test/test_reader_hash.rb
+ test/test_reader_hash_converters.rb
data/README.md CHANGED
@@ -40,6 +40,67 @@ end
  ```
 
 
+ ### What about converters?
+
+ Use the `converters` keyword option to (auto-)convert strings to nulls, booleans, integers, floats, dates, etc.
+ Example:
+
+ ``` ruby
+ txt = <<TXT
+ 1,2,3
+ true,false,null
+ TXT
+
+ records = Csv.parse( txt, :converters => :all )   ## or CsvReader.parse
+ pp records
+ # => [[1,2,3],
+ #     [true,false,nil]]
+ ```
+
+
+ Built-in converters include:
+
+ | Converter    | Comments            |
+ |--------------|---------------------|
+ | `:integer`   | convert matching strings to integer |
+ | `:float`     | convert matching strings to float   |
+ | `:numeric`   | shortcut for `[:integer, :float]`   |
+ | `:date`      | convert matching strings to `Date` (year/month/day) |
+ | `:date_time` | convert matching strings to `DateTime` |
+ | `:null`      | convert matching strings to null (`nil`) |
+ | `:boolean`   | convert matching strings to boolean (`true` or `false`) |
+ | `:all`       | shortcut for `[:null, :boolean, :date_time, :numeric]` |
+
+
+
+ ### What about Enumerable?
+
+ Yes, every reader includes `Enumerable` and runs on `each`.
+ Use `new` or `open` without a block to get a reader,
+ then call `to_enum` to get an enumerator (iterator).
+ Example:
+
+
+ ``` ruby
+ csv = Csv.new( "a,b,c" )
+ it  = csv.to_enum
+ pp it.next
+ # => ["a","b","c"]
+
+ # -or-
+
+ csv = Csv.open( "values.csv" )
+ it  = csv.to_enum
+ pp it.next
+ # => ["1","2","3"]
+ pp it.next
+ # => ["5","6","7"]
+ ```
+
+
+
+
+
  ### What about headers?
 
  Use the `CsvHash`
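The converters and Enumerable sections above describe behavior rather than mechanism. As a rough illustration only, here is a minimal plain-Ruby sketch of a reader that wraps a string in `StringIO`, includes `Enumerable` by defining `each`, and applies a chain of value converters. `MiniReader`, `CONVERTERS`, `SHORTCUTS`, and the matching rules are hypothetical names and assumptions for this sketch, not csvreader's implementation; the naive `split(',')` ignores quoting entirely.

```ruby
require 'stringio'

# Hypothetical value converters (assumption: csvreader's real matching rules
# may differ). Each returns the converted value, or the very same string
# object when the value does not match.
CONVERTERS = {
  null:    ->(s) { s == 'null' ? nil : s },
  boolean: ->(s) { s == 'true' ? true : (s == 'false' ? false : s) },
  integer: ->(s) { s.match?(/\A[+-]?\d+\z/) ? Integer(s, 10) : s },
  float:   ->(s) { s.match?(/\A[+-]?\d+\.\d+\z/) ? Float(s) : s },
}.freeze

# Hypothetical shortcuts, loosely mirroring the README table (no dates here).
SHORTCUTS = {
  numeric: [:integer, :float],
  all:     [:null, :boolean, :numeric],
}.freeze

class MiniReader
  include Enumerable   # map, to_a, first, ... all come from #each

  def initialize(text, converters: nil)
    @io = StringIO.new(text)   # lets a String be read like a File
    @converters = expand(Array(converters))
  end

  # Yields one record (Array) per non-blank line; naive split, no quote handling.
  def each
    return to_enum(:each) unless block_given?

    @io.rewind
    @io.each_line do |line|
      next if line.strip.empty?
      yield line.chomp.split(',', -1).map { |value| convert(value.strip) }
    end
  end

  private

  def expand(names)
    names.flat_map { |n| SHORTCUTS.key?(n) ? expand(SHORTCUTS[n]) : [n] }
  end

  # Applies converters in order and stops at the first one that matches.
  def convert(value)
    @converters.each do |name|
      result = CONVERTERS.fetch(name).call(value)
      return result unless result.equal?(value)
    end
    value
  end
end

reader = MiniReader.new("1,2,3\ntrue,false,null\n", converters: :all)
p reader.to_a        # => [[1, 2, 3], [true, false, nil]]
it = reader.to_enum  # external iterator, as in the README
p it.next            # => [1, 2, 3]
```

Returning the unchanged string object on a miss lets `convert` detect a match with `equal?`, which also works when a converter legitimately returns `nil` or `false`.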
@@ -87,6 +148,41 @@ end
  ```
 
 
+ ### What about symbol keys for hashes?
+
+ Yes, you can use the `header_converters` keyword option.
+ Use `:symbol` to (auto-)convert header strings to symbols.
+ Note: the symbol converter will also downcase all letters,
+ remove all non-alphanumeric chars (e.g. `!?$%`),
+ and replace spaces with underscores.
+
+ Example:
+
+ ``` ruby
+ txt = <<TXT
+ a,b,c
+ 1,2,3
+ true,false,null
+ TXT
+
+ records = CsvHash.parse( txt, :converters => :all, :header_converters => :symbol )
+ pp records
+ # => [{a: 1, b: 2, c: 3},
+ #     {a: true, b: false, c: nil}]
+ ```
+
+ Built-in header converters include:
+
+ | Converter    | Comments            |
+ |--------------|---------------------|
+ | `:downcase`  | downcase strings    |
+ | `:symbol`    | convert strings to symbols (and downcase and remove non-alphanumerics) |
+
+
+
+
+
+
 
  ## Frequently Asked Questions (FAQ) and Answers
 
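The README describes the `:symbol` header converter only in prose. The sketch below is one plausible plain-Ruby reading of that description (downcase, drop non-alphanumerics, turn spaces into underscores), plus the `:downcase` converter, and shows how converted headers might be zipped with a row to build one hash record. `HEADER_CONVERTERS`, `convert_headers`, and the ASCII-only regular expression are assumptions for illustration, not csvreader's code.

```ruby
# Hypothetical header converters (assumption: csvreader's exact rules may differ).
HEADER_CONVERTERS = {
  downcase: ->(h) { h.downcase },
  symbol:   lambda { |h|
    s = h.downcase
    s = s.gsub(/[^a-z0-9\s_]/, '')   # drop non-alphanumerics such as !?$% (ASCII-only here)
    s = s.strip.gsub(/\s+/, '_')     # runs of whitespace become one underscore
    s.to_sym
  },
}.freeze

def convert_headers(headers, name)
  converter = HEADER_CONVERTERS.fetch(name)
  headers.map { |h| converter.call(h) }
end

headers = convert_headers(['Name', 'Total Amount ($)', 'Done?'], :symbol)
p headers   # => [:name, :total_amount, :done]

record = headers.zip([1, 2, true]).to_h
p record == { name: 1, total_amount: 2, done: true }   # => true
```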
@@ -128,25 +224,94 @@ Staatliches Hofbräuhaus München,München,Hofbräu Oktoberfestbier,6.3%
 
 
 
- ### Q: How can I change the separator to semicolon (`;`) or pipe (`|`)?
+ ### Q: How can I change the default format / dialect?
+
+ The reader includes more than half a dozen pre-configured formats
+ (dialects).
 
- Pass in the `sep` keyword option. Example:
+ Use `Csv.strict` if you do NOT want to trim leading and trailing spaces
+ and if you do NOT want to skip blank lines. Example:
 
  ``` ruby
- Csv.parse( ..., sep: ';' )
- Csv.read( ..., sep: ';' )
- # ...
- Csv.parse( ..., sep: '|' )
- Csv.read( ..., sep: '|' )
+ txt = <<TXT
+ 1, 2,3
+ 4,5 ,6
+
+ TXT
+
+ records = Csv.strict.parse( txt )
+ pp records
+ # => [["1","•2","3"],      ## note: • marks a space
+ #     ["4","5•","6"],
+ #     [""]]
+ ```
+
+ More strict pre-configured variants include:
+
+ `Csv.mysql` uses:
+
+ ``` ruby
+ ParserStrict.new( sep: "\t",
+                   quote: false,
+                   escape: true,
+                   null: "\\N" )
+ ```
+
+ `Csv.postgres` or `Csv.postgresql` uses:
+
+ ``` ruby
+ ParserStrict.new( doublequote: false,
+                   escape: true,
+                   null: "" )
+ ```
+
+ `Csv.postgres_text` or `Csv.postgresql_text` uses:
+
+ ``` ruby
+ ParserStrict.new( sep: "\t",
+                   quote: false,
+                   escape: true,
+                   null: "\\N" )
+ ```
+
+ and so on.
+
+
+ ### Q: How can I change the separator to semicolon (`;`) or pipe (`|`) or tab (`\t`)?
+
+ Pass in the `sep` keyword option
+ to the "strict" parser. Example:
+
+ ``` ruby
+ Csv.strict.parse( ..., sep: ';' )
+ Csv.strict.read( ..., sep: ';' )
  # ...
+ Csv.strict.parse( ..., sep: '|' )
+ Csv.strict.read( ..., sep: '|' )
  # and so on
  ```
 
-
- Note: If you use tab (`\t`) use the `TabReader`! Why? Tab =! CSV. Yes, tab is
+ Note: If you use tab (`\t`), use the `TabReader`
+ (or, for your convenience, the built-in `Csv.tab` alias)!
+ Why? Tab != CSV. Yes, tab is
  its own (even) simpler format
  (e.g. no escape rules, no newlines in values, etc.),
- see [`TabReader` »](https://github.com/datatxt/tabreader).
+ see [`TabReader` »](https://github.com/csv11/tabreader).
+
+ ``` ruby
+ Csv.tab.parse( ... )    # note: "classic" strict tab format
+ Csv.tab.read( ... )
+ # ...
+ ```
+
+ If you want double quote escape rules, newlines in quoted values, etc., use
+ the "strict" parser with the separator (`sep`) changed to tab (`\t`).
+
+ ``` ruby
+ Csv.strict.parse( ..., sep: "\t" )   # note: csv-like tab format with quotes
+ Csv.strict.read( ..., sep: "\t" )
+ # ...
+ ```
 
 
 
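To make the difference between the default, "strict", and tab formats in the FAQ above concrete, here is a deliberately naive plain-Ruby sketch. `naive_parse` and its `trim:`/`skip_blank:` options are hypothetical; the sketch ignores quotes and escapes entirely, so it only illustrates trimming, blank-line handling, and the tab separator, not real CSV parsing.

```ruby
# Hypothetical, quote-unaware splitter (assumption: illustration only).
def naive_parse(text, sep: ',', trim: true, skip_blank: true)
  text.each_line.filter_map do |line|
    line = line.chomp
    next nil if skip_blank && line.strip.empty?
    values = line.split(sep, -1)     # -1 keeps trailing empty fields
    values = [''] if values.empty?   # an empty line is one empty field
    trim ? values.map(&:strip) : values
  end
end

txt = "1, 2,3\n4,5 ,6\n\n"

p naive_parse(txt)                                   # default-style: trim, skip blank lines
# => [["1", "2", "3"], ["4", "5", "6"]]
p naive_parse(txt, trim: false, skip_blank: false)   # strict-style: keep spaces and blank lines
# => [["1", " 2", "3"], ["4", "5 ", "6"], [""]]
p naive_parse("a\tb c\n", sep: "\t")                 # tab-separated
# => [["a", "b c"]]
```

The strict-style result lines up with the README's `Csv.strict` output, where `•` marks the preserved spaces.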
@@ -4,6 +4,8 @@
  require 'pp'
  require 'logger'
  require 'forwardable'
+ require 'stringio'
+ require 'date'      ## use for Date.parse and DateTime.parse
 
 
  ###
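The new `require 'date'` line is commented as being there for `Date.parse` and `DateTime.parse`, which presumably back the `:date` and `:date_time` converters listed in the README. Below is a minimal sketch of such converters; the guarding regular expressions and the fallback of returning the original string on a miss or an invalid date are assumptions made for this illustration, not csvreader's implementation.

```ruby
require 'date'

# Hypothetical guards (assumption): only ISO-style strings are passed to
# Date.parse / DateTime.parse; anything else is returned unchanged.
DATE_RE      = /\A\d{4}-\d{2}-\d{2}\z/
DATE_TIME_RE = /\A\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}(:\d{2})?\z/

def convert_date(value)
  return value unless DATE_RE.match?(value)
  Date.parse(value)
rescue ArgumentError   # e.g. "2018-13-45" has the right shape but is not a real date
  value
end

def convert_date_time(value)
  return value unless DATE_TIME_RE.match?(value)
  DateTime.parse(value)
rescue ArgumentError
  value
end

p convert_date('2018-11-05').iso8601   # => "2018-11-05"
p convert_date('5 Nov')                # => "5 Nov"
dt = convert_date_time('2018-11-05 12:30')
p [dt.hour, dt.minute]                 # => [12, 30]
```

Guarding with a regular expression first keeps the lenient `Date.parse` from turning arbitrary text that merely contains a date-like fragment into a `Date`.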
@@ -17,6 +19,113 @@ require 'csvreader/parser'
  require 'csvreader/builder'
  require 'csvreader/reader'
  require 'csvreader/reader_hash'
+ require 'csvreader/converter'
+
+
+
+ class CsvReader
+ class Parser
+
+   ## use/allow different "backends" e.g. ParserStd, ParserStrict, ParserTab, etc.
+   ##   parser must support parse method (with and without block)
+   ##    e.g.  records = parse( data )
+   ##             -or-
+   ##           parse( data ) do |record|
+   ##           end
+
+   DEFAULT = ParserStd.new
+
+   RFC4180 = ParserStrict.new
+   STRICT  = ParserStrict.new   ## note: make strict its own instance (so you can change config without "breaking" rfc4180)
+   EXCEL   = ParserStrict.new   ## note: make excel its own instance (so you can change configs without "breaking" rfc4180/strict)
+
+   MYSQL = ParserStrict.new( sep: "\t",
+                             quote: false,
+                             escape: true,
+                             null: "\\N" )
+
+   POSTGRES = POSTGRESQL = ParserStrict.new( doublequote: false,
+                                             escape: true,
+                                             null: "" )
+
+   POSTGRES_TEXT = POSTGRESQL_TEXT = ParserStrict.new( sep: "\t",
+                                                       quote: false,
+                                                       escape: true,
+                                                       null: "\\N" )
+
+   TAB = ParserTab.new
+
+
+   def self.default()         DEFAULT; end    ## alternative alias for DEFAULT
+   def self.strict()          STRICT; end     ## alternative alias for STRICT
+   def self.rfc4180()         RFC4180; end    ## alternative alias for RFC4180
+   def self.excel()           EXCEL; end      ## alternative alias for EXCEL
+   def self.mysql()           MYSQL; end
+   def self.postgresql()      POSTGRESQL; end
+   def self.postgres()        postgresql; end
+   def self.postgresql_text() POSTGRESQL_TEXT; end
+   def self.postgres_text()   postgresql_text; end
+   def self.tab()             TAB; end
+ end # class Parser
+ end # class CsvReader
+
+
+
+ class CsvReader
+   ### pre-define CsvReader (built-in) formats/dialects
+   DEFAULT = CsvBuilder.new( Parser::DEFAULT )
+
+   STRICT  = CsvBuilder.new( Parser::STRICT )
+   RFC4180 = CsvBuilder.new( Parser::RFC4180 )
+   EXCEL   = CsvBuilder.new( Parser::EXCEL )
+
+   MYSQL                           = CsvBuilder.new( Parser::MYSQL )
+   POSTGRES      = POSTGRESQL      = CsvBuilder.new( Parser::POSTGRESQL )
+   POSTGRES_TEXT = POSTGRESQL_TEXT = CsvBuilder.new( Parser::POSTGRESQL_TEXT )
+
+   TAB = CsvBuilder.new( Parser::TAB )
+
+
+   def self.default()         DEFAULT; end    ## alternative alias for DEFAULT
+   def self.strict()          STRICT; end     ## alternative alias for STRICT
+   def self.rfc4180()         RFC4180; end    ## alternative alias for RFC4180
+   def self.excel()           EXCEL; end      ## alternative alias for EXCEL
+   def self.mysql()           MYSQL; end
+   def self.postgresql()      POSTGRESQL; end
+   def self.postgres()        postgresql; end
+   def self.postgresql_text() POSTGRESQL_TEXT; end
+   def self.postgres_text()   postgresql_text; end
+   def self.tab()             TAB; end
+ end # class CsvReader
+
+
+ class CsvHashReader
+   ### pre-define CsvHashReader (built-in) formats/dialects
+   DEFAULT = CsvHashBuilder.new( CsvReader::Parser::DEFAULT )
+
+   STRICT  = CsvHashBuilder.new( CsvReader::Parser::STRICT )
+   RFC4180 = CsvHashBuilder.new( CsvReader::Parser::RFC4180 )
+   EXCEL   = CsvHashBuilder.new( CsvReader::Parser::EXCEL )
+
+   MYSQL                           = CsvHashBuilder.new( CsvReader::Parser::MYSQL )
+   POSTGRES      = POSTGRESQL      = CsvHashBuilder.new( CsvReader::Parser::POSTGRESQL )
+   POSTGRES_TEXT = POSTGRESQL_TEXT = CsvHashBuilder.new( CsvReader::Parser::POSTGRESQL_TEXT )
+
+   TAB = CsvHashBuilder.new( CsvReader::Parser::TAB )
+
+
+   def self.default()         DEFAULT; end    ## alternative alias for DEFAULT
+   def self.strict()          STRICT; end     ## alternative alias for STRICT
+   def self.rfc4180()         RFC4180; end    ## alternative alias for RFC4180
+   def self.excel()           EXCEL; end      ## alternative alias for EXCEL
+   def self.mysql()           MYSQL; end
+   def self.postgresql()      POSTGRESQL; end
+   def self.postgres()        postgresql; end
+   def self.postgresql_text() POSTGRESQL_TEXT; end
+   def self.postgres_text()   postgresql_text; end
+   def self.tab()             TAB; end
+ end # class CsvHashReader
+
 
 
 
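The new `Parser` constants register one shared, pre-configured instance per dialect, with class-method aliases, and the comments explain why `STRICT` and `EXCEL` are separate instances from `RFC4180`. Below is a minimal standalone sketch of that pattern. `DemoParser`, its `config` hash, and the `Presets` module are hypothetical stand-ins, since the real parsers' config API is not shown in this diff. The sketch shows that changing one preset's config leaves another untouched, while `POSTGRES` and `POSTGRESQL` name one and the same object.

```ruby
# Hypothetical stand-in for a parser with a mutable config hash.
class DemoParser
  attr_reader :config

  def initialize(sep: ',', null: nil)
    @config = { sep: sep, null: null }
  end
end

module Presets
  RFC4180  = DemoParser.new
  STRICT   = DemoParser.new                          # own instance, same defaults as RFC4180
  POSTGRES = POSTGRESQL = DemoParser.new(null: '')   # two names, one object

  def self.rfc4180()    RFC4180; end
  def self.strict()     STRICT; end
  def self.postgresql() POSTGRESQL; end
  def self.postgres()   postgresql; end   # one alias delegates to the other
end

Presets.strict.config[:sep] = ';'                 # tweak the strict preset only
p Presets.rfc4180.config[:sep]                    # => ","
p Presets.strict.config[:sep]                     # => ";"
p Presets.postgres.equal?(Presets::POSTGRESQL)    # => true
```

Had `STRICT` been assigned as `STRICT = RFC4180`, the `sep` tweak above would also have changed the RFC 4180 preset, which is what the source comments guard against.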
@@ -2,21 +2,21 @@
 
 
  class CsvBuilder  ## rename to CsvReaderBuilder - why? why not?
+
+
    def initialize( parser )
      @parser = parser
    end
 
+   def config() @parser.config; end   ## (auto-)forward to wrapped parser
 
    ## todo/fix:
    ##  add parser config (attribute) setter e.g.
    ##   - sep=(value)
    ##   - comment=(value)
    ##   - and so on!!!
-   ##
-   ## add config too - why? why not?
-
 
-   def open( path, mode='r:bom|utf-8',
+   def open( path, mode=nil,
              sep: nil,
              converters: nil,
              parser: @parser, &block )
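`CsvBuilder` now forwards `config` to its wrapped parser with a hand-written one-liner, and `open` now defaults `mode` to `nil` instead of `'r:bom|utf-8'`. Below is a minimal sketch of both ideas using Ruby's standard `Forwardable` (which `lib/csvreader.rb` already requires). `DemoBuilder`, `FakeParser`, `DEFAULT_MODE`, and the rule that a `nil` mode falls back to `'r:bom|utf-8'` are assumptions for illustration only; the diff does not show how the reader actually handles a `nil` mode.

```ruby
require 'forwardable'
require 'tempfile'

# Hypothetical stand-in for a parser that exposes a config object.
FakeParser = Struct.new(:config)

# Hypothetical builder (assumption: not csvreader's code).
class DemoBuilder
  extend Forwardable
  def_delegator :@parser, :config   # same effect as: def config() @parser.config; end

  DEFAULT_MODE = 'r:bom|utf-8'.freeze   # assumption: used when mode is nil

  def initialize(parser)
    @parser = parser
  end

  # Returns the first line of a file; a nil mode falls back to DEFAULT_MODE.
  def first_line(path, mode = nil)
    File.open(path, mode || DEFAULT_MODE) { |f| f.gets&.chomp }
  end
end

builder = DemoBuilder.new(FakeParser.new({ sep: ',' }))
p builder.config[:sep]   # => ","

Tempfile.create(['bom', '.csv']) do |f|
  f.binmode
  f.write("\xEF\xBB\xBFa,b,c\n")   # UTF-8 byte order mark, then a header line
  f.flush
  p builder.first_line(f.path)                 # => "a,b,c"  (BOM stripped)
  p builder.first_line(f.path, 'r').bytesize   # => 8        (3 BOM bytes + "a,b,c")
end
```

The `bom|utf-8` part of the mode string is standard Ruby IO behavior: it strips a leading UTF-8 byte order mark, which spreadsheet exports often prepend.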
@@ -54,3 +54,67 @@ class CsvBuilder  ## rename to CsvReaderBuilder - why? why not?
              parser: @parser, &block )
    end
  end # class CsvBuilder
+
+
+
+
+ class CsvHashBuilder  ## rename to CsvHashReaderBuilder - why? why not?
+   def initialize( parser )
+     @parser = parser
+   end
+
+   def config() @parser.config; end   ## (auto-)forward to wrapped parser
+
+   ## todo/fix:
+   ##  add parser config (attribute) setter e.g.
+   ##   - sep=(value)
+   ##   - comment=(value)
+   ##   - and so on!!!
+
+
+   def open( path, mode=nil,
+             headers: nil,
+             sep: nil,
+             converters: nil,
+             header_converters: nil,
+             parser: @parser, &block )
+     CsvHashReader.open( path, mode,
+                         headers: headers, sep: sep, converters: converters,
+                         header_converters: header_converters,
+                         parser: @parser, &block )
+   end
+
+   def read( path, headers: nil,
+                   sep: nil,
+                   converters: nil,
+                   header_converters: nil )
+     CsvHashReader.read( path,
+                         headers: headers,
+                         sep: sep, converters: converters,
+                         header_converters: header_converters,
+                         parser: @parser )
+   end
+
+   def foreach( path, headers: nil,
+                      sep: nil,
+                      converters: nil,
+                      header_converters: nil, &block )
+     CsvHashReader.foreach( path,
+                            headers: headers,
+                            sep: sep, converters: converters,
+                            header_converters: header_converters,
+                            parser: @parser, &block )
+   end
+
+
+   def parse( data, headers: nil,
+                    sep: nil,
+                    converters: nil,
+                    header_converters: nil, &block )
+     CsvHashReader.parse( data,
+                          headers: headers,
+                          sep: sep, converters: converters,
+                          header_converters: header_converters,
+                          parser: @parser, &block )
+   end
+ end # class CsvHashBuilder
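`CsvHashBuilder` mirrors `CsvBuilder` but forwards the extra `headers:` and `header_converters:` keywords to `CsvHashReader`. As a rough, hypothetical sketch of what a hash reader could do with them (not the gem's code): take the first row as headers unless `headers:` is given, optionally convert each header, then zip every remaining row into a `Hash`. `rows_to_hashes` and its `header_converter:` keyword are illustrative assumptions.

```ruby
# Hypothetical helper (assumption): turns rows (arrays of strings) into hashes.
#   headers:           explicit header list; if nil, the first row is used
#   header_converter:  optional callable applied to each header
def rows_to_hashes(rows, headers: nil, header_converter: nil)
  rows = rows.dup                       # don't shift the caller's array
  headers ||= rows.shift || []
  headers = headers.map { |h| header_converter.call(h) } if header_converter
  rows.map do |row|
    # zip pads missing values with nil; values beyond the headers are dropped
    headers.zip(row).to_h
  end
end

rows = [['A', 'B'], ['1', '2'], ['3']]
records = rows_to_hashes(rows, header_converter: ->(h) { h.downcase.to_sym })
p records == [{ a: '1', b: '2' }, { a: '3', b: nil }]   # => true

explicit = rows_to_hashes([['x', 'y']], headers: ['k1', 'k2'])
p explicit == [{ 'k1' => 'x', 'k2' => 'y' }]            # => true
```

How the real reader treats short or over-long rows is not visible in this diff; padding with `nil` and dropping extras is simply what `Array#zip` does.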