rcsv 0.0.6 → 0.0.8
- data/Gemfile.lock +1 -1
- data/README.md +38 -4
- data/RELNOTES +5 -0
- data/ext/rcsv/csv.h +5 -3
- data/ext/rcsv/libcsv.c +4 -2
- data/ext/rcsv/rcsv.c +83 -42
- data/lib/rcsv/version.rb +1 -1
- data/lib/rcsv.rb +18 -4
- data/test/test_rcsv_raw_parse.rb +117 -6
- metadata +3 -2
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -4,7 +4,7 @@
 
 Rcsv is a fast CSV parsing library for MRI Ruby. Tested on REE 1.8.7 and Ruby 1.9.3.
 
-Contrary to many other gems that implement their own parsers, Rcsv uses libcsv 3.0.
+Contrary to many other gems that implement their own parsers, Rcsv uses libcsv 3.0.3 (http://sourceforge.net/projects/libcsv/). As long as libcsv's API is stable, getting Rcsv to use newer libcsv version is as simple as updating two files (csv.h and libcsv.c).
 
 ## Benchmarks
 user system total real
@@ -48,6 +48,7 @@ Quickstart:
 
 
 Rcsv class exposes a class method *parse* that accepts a CSV string as its first parameter and options hash as its second parameter.
+If block is passed, Rcsv sequentially yields every parsed line to it and the #parse method itself returns nil.
 
 
 Options supported:
@@ -60,7 +61,19 @@ A single-character string that is used as a separator. Default is ",".
 
 A boolean flag. When enabled, allows to parse oddly quoted CSV data without exceptions being raised. Disabled by default.
 
-Anything that does not conform to http://www.
+Anything that does not conform to http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm should better be parsed with this option enabled.
+
+### :parse_empty_fields_as
+
+A Ruby symbol that specifies how empty CSV fields should be processed. Accepted values:
+
+* :nil_or_string (default) - If empty field is quoted, it is parsed as empty Ruby string. If empty field is not quoted, it is parsed as Nil.
+
+* :nil - Always parse as Nil.
+
+* :string - Always parse as empty string.
+
+This option doesn't affect defaults processing: all empty fields are replaced with default values if the latter are provided (via per-column :default).
 
 ### :offset_rows
 
@@ -73,7 +86,7 @@ If CSV has a header, :columns keys can be strings that are equal to column names
 
 :columns values are in turn hashes that provide parsing options:
 
-* :alias - Object of any type (though usually a Symbol) that is used
+* :alias - Object of any type (though usually a Symbol) that is used as a key that represents column name when :row_as_hash is set.
 * :type - A Ruby Symbol that specifies Ruby data type that CSV cell value should be converted into. Supported types: :int, :float, :string, :bool. :string is the default.
 * :default - Object of any type (though usually of the same type that is specified by :type option). If CSV doesn't have any value for a cell, this default value is used.
 * :match - A string. If set, makes Rcsv skip all the rows where any column doesn't match its :match value. Useful for filtering data.
@@ -96,6 +109,10 @@ When :row_as_hash is disabled, return value is represented as array of arrays.
 ### :only_listed_columns
 A boolean flag. If enabled, only parses columns that are listed in :columns. Disabled by default.
 
+### :buffer_size
+An integer. Default is 1MiB (1024 * 1024).
+Specifies a number of bytes that are read at once, thus allowing to read directly from IO-like objects (files, sockets etc).
+
 
 ## Examples
 
@@ -132,12 +149,29 @@ The result would look like this:
       [ nil, 0, "Vacuum" ]
     ]
 
+And here is an example of passing a block:
+
+    Rcsv.parse(some_csv) { |row|
+      puts row.inspect
+    }
+
+That would display contents of each row without needing to put the whole parsed result array to memory:
+
+    ["a", "b", "c", "d", "e", "f"]
+    ["1", "2", "3", "4", "5", "6"]
+
+
+This way it is possible to read from a File directly, with a 20MiB buffer and parse lines one by one:
+
+    Rcsv.parse(File.open('/some/file.csv'), :buffer_size => 20 * 1024 * 1024) { |row|
+      puts row.inspect
+    }
+
 
 ## To do
 
 * More specs for boolean values
 * Specs for Ruby parse
-* Add custom Ruby callbacks (if block is passed)
 * Add CSV write support
 
 
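The `:parse_empty_fields_as` rules added to the README in this release can be modelled in a few lines of plain Ruby. This is only a sketch of the documented behaviour, not Rcsv's implementation (that lives in the C extension): `empty_field_value` is a hypothetical helper, and it raises `ArgumentError` where the extension raises `Rcsv::ParseError`.

```ruby
# Hypothetical model of the documented :parse_empty_fields_as rules.
# `quoted` says whether the empty field was written as "" in the CSV source.
def empty_field_value(quoted, mode = :nil_or_string)
  case mode
  when :nil_or_string then quoted ? "" : nil  # default: only quoted empties become strings
  when :nil           then nil                # always nil
  when :string        then ""                 # always an empty string
  else
    raise ArgumentError, "expected :nil, :string or :nil_or_string, got #{mode.inspect}"
  end
end

p empty_field_value(false)           # unquoted empty field, default mode => nil
p empty_field_value(true)            # quoted empty field, default mode   => ""
p empty_field_value(true, :nil)      # => nil
p empty_field_value(false, :string)  # => ""
```

Per-column `:default` values are left out of the model: according to the README text, they replace empty fields regardless of this option.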
data/RELNOTES
ADDED
data/ext/rcsv/csv.h
CHANGED
@@ -8,8 +8,8 @@ extern "C" {
 #endif
 
 #define CSV_MAJOR 3
-#define CSV_MINOR
-#define CSV_RELEASE
+#define CSV_MINOR 0
+#define CSV_RELEASE 3
 
 /* Error Codes */
 #define CSV_SUCCESS 0
@@ -25,7 +25,9 @@ extern "C" {
 #define CSV_STRICT_FINI 4 /* causes csv_fini to return CSV_EPARSE if last
                              field is quoted and doesn't contain ending
                              quote */
-#define CSV_APPEND_NULL 8 /* Ensure that all fields are null-
+#define CSV_APPEND_NULL 8 /* Ensure that all fields are null-terminated */
+#define CSV_EMPTY_IS_NULL 16 /* Pass null pointer to cb1 function when
+                                empty, unquoted fields are encountered */
 
 
 /* Character values */
data/ext/rcsv/libcsv.c
CHANGED
@@ -25,7 +25,7 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 
 #include "csv.h"
 
-#define VERSION "3.0.
+#define VERSION "3.0.3"
 
 #define ROW_NOT_BEGUN 0
 #define FIELD_NOT_BEGUN 1
@@ -50,7 +50,9 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
     entry_pos -= spaces; \
     if (p->options & CSV_APPEND_NULL) \
       ((p)->entry_buf[entry_pos]) = '\0'; \
-    if (cb1) \
+    if (cb1 && (p->options & CSV_EMPTY_IS_NULL) && !quoted && entry_pos == 0) \
+      cb1(NULL, entry_pos, data); \
+    else if (cb1) \
       cb1(p->entry_buf, entry_pos, data); \
     pstate = FIELD_NOT_BEGUN; \
     entry_pos = quoted = spaces = 0; \
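The new `CSV_EMPTY_IS_NULL` flag is one bit in the parser's option mask, and the macro change above hands the field callback a null pointer only when that bit is set, the field was not quoted, and nothing was collected for it. A rough Ruby model of that decision follows; the three flag values are copied from the csv.h hunk, while `field_passed_to_callback` is a made-up name used purely for illustration.

```ruby
# Flag values as shown in the csv.h diff above.
CSV_STRICT_FINI   = 4
CSV_APPEND_NULL   = 8
CSV_EMPTY_IS_NULL = 16

# Hypothetical model of the field-submission branch: returns what the
# field callback would receive (nil stands in for the C NULL pointer).
def field_passed_to_callback(options, entry, quoted)
  if (options & CSV_EMPTY_IS_NULL) != 0 && !quoted && entry.empty?
    nil
  else
    entry
  end
end

with_flag    = CSV_STRICT_FINI | CSV_APPEND_NULL | CSV_EMPTY_IS_NULL
without_flag = CSV_STRICT_FINI | CSV_APPEND_NULL

p field_passed_to_callback(with_flag, "", false)     # unquoted empty field => nil
p field_passed_to_callback(with_flag, "", true)      # quoted empty field   => ""
p field_passed_to_callback(without_flag, "", false)  # flag not set         => ""
```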
data/ext/rcsv/rcsv.c
CHANGED
@@ -12,6 +12,7 @@ static VALUE rcsv_parse_error; /* class Rcsv::ParseError << StandardError; end *
 struct rcsv_metadata {
   /* Derived from user-specified options */
   bool row_as_hash; /* Used to return array of hashes rather than array of arrays */
+  bool empty_field_is_nil; /* Do we convert empty fields to nils? */
   size_t offset_rows; /* Number of rows to skip before parsing */
 
   char * row_conversions; /* A pointer to string/array of row conversions char specifiers */
@@ -30,6 +31,7 @@ struct rcsv_metadata {
   size_t current_col; /* Current column's index */
   size_t current_row; /* Current row's index */
 
+  VALUE last_entry; /* A pointer to the last entry that's going to be appended to result */
   VALUE * result; /* A pointer to the parsed data */
 };
 
@@ -41,7 +43,6 @@ void end_of_field_callback(void * field, size_t field_size, void * data) {
   struct rcsv_metadata * meta = (struct rcsv_metadata *) data;
   char row_conversion = 0;
   VALUE parsed_field;
-  VALUE last_entry = rb_ary_entry(*(meta->result), -1); /* result.last */
 
   /* No need to parse anything until the end of the line if skip_current_row is set */
   if (meta->skip_current_row) {
@@ -56,6 +57,7 @@ void end_of_field_callback(void * field, size_t field_size, void * data) {
 
   /* Filter by string row values listed in meta->only_rows */
   if ((meta->only_rows != NULL) &&
+      (field_str != NULL) && /* TODO: What if we want to filter out NULLs? */
       (meta->current_col < meta->num_only_rows) &&
       (meta->only_rows[meta->current_col] != NULL) &&
       (strcmp(meta->only_rows[meta->current_col], field_str))) {
@@ -74,8 +76,12 @@ void end_of_field_callback(void * field, size_t field_size, void * data) {
     /* Assigning appropriate default value if applicable. */
     if (meta->current_col < meta->num_row_defaults) {
       parsed_field = meta->row_defaults[meta->current_col];
-    } else { /*
-
+    } else { /* It depends on empty_field_is_nil if we convert empty strings to nils */
+      if (meta->empty_field_is_nil || field_str == NULL) {
+        parsed_field = Qnil;
+      } else {
+        parsed_field = rb_str_new2("");
+      }
     }
   } else {
     if (meta->current_col < meta->num_row_conversions) {
@@ -136,10 +142,10 @@ void end_of_field_callback(void * field, size_t field_size, void * data) {
           (int)meta->num_columns
         );
       } else {
-        rb_hash_aset(last_entry, meta->column_names[meta->current_col], parsed_field);
+        rb_hash_aset(meta->last_entry, meta->column_names[meta->current_col], parsed_field); /* last_entry[column_names[current_col]] = field */
       }
     } else { /* Parse into Array */
-      rb_ary_push(last_entry, parsed_field); /*
+      rb_ary_push(meta->last_entry, parsed_field); /* last_entry << field */
     }
   }
 
@@ -154,16 +160,22 @@ void end_of_line_callback(int last_char, void * data) {
 
   /* If filters didn't match, current row parsing is reverted */
   if (meta->skip_current_row) {
-
+    /* Do we wanna GC? */
     meta->skip_current_row = false;
+  } else {
+    if (rb_block_given_p()) { /* STREAMING */
+      rb_yield(meta->last_entry);
+    } else {
+      rb_ary_push(*(meta->result), meta->last_entry);
+    }
   }
 
-  /*
+  /* Re-initialize last_entry unless EOF reached */
   if (last_char != -1) {
     if (meta->row_as_hash) {
-
+      meta->last_entry = rb_hash_new(); /* {} */
     } else {
-
+      meta->last_entry = rb_ary_new(); /* [] */
     }
   }
 
@@ -175,12 +187,19 @@ void end_of_line_callback(int last_char, void * data) {
   return;
 }
 
+void custom_end_of_line_callback(int last_char, void * data) {
+  struct rcsv_metadata * meta = (struct rcsv_metadata *) data;
+
+  if (!meta->skip_current_row) {
+  }
+}
+
 /* C API */
 
 /* The main method that handles parsing */
 static VALUE rb_rcsv_raw_parse(int argc, VALUE * argv, VALUE self) {
   struct rcsv_metadata meta;
-  VALUE
+  VALUE csvio, csvstr, buffer_size, options, option;
 
   struct csv_parser cp;
   unsigned char csv_options = CSV_STRICT_FINI | CSV_APPEND_NULL;
@@ -191,6 +210,7 @@ static VALUE rb_rcsv_raw_parse(int argc, VALUE * argv, VALUE self) {
 
   /* Setting up some sane defaults */
   meta.row_as_hash = false;
+  meta.empty_field_is_nil = false;
   meta.skip_current_row = false;
   meta.num_columns = 0;
   meta.current_col = 0;
@@ -205,22 +225,34 @@ static VALUE rb_rcsv_raw_parse(int argc, VALUE * argv, VALUE self) {
   meta.column_names = NULL;
   meta.result = (VALUE[]){rb_ary_new()}; /* [] */
 
-  /*
-  rb_scan_args(argc, argv, "11", &
-  csv_string = StringValuePtr(str);
-  csv_string_len = strlen(csv_string);
+  /* csvio is required, options is optional (pun intended) */
+  rb_scan_args(argc, argv, "11", &csvio, &options);
 
   /* options ||= nil */
   if (NIL_P(options)) {
     options = rb_hash_new();
   }
 
+  buffer_size = rb_hash_aref(options, ID2SYM(rb_intern("buffer_size")));
+
   /* By default, parsing is strict */
   option = rb_hash_aref(options, ID2SYM(rb_intern("nostrict")));
   if (!option || (option == Qnil)) {
     csv_options |= CSV_STRICT;
   }
 
+  /* By default, empty strings are treated as Nils and quoted empty strings are treated as empty Ruby strings */
+  option = rb_hash_aref(options, ID2SYM(rb_intern("parse_empty_fields_as")));
+  if ((option == Qnil) || (option == ID2SYM(rb_intern("nil_or_string")))) {
+    csv_options |= CSV_EMPTY_IS_NULL;
+  } else if (option == ID2SYM(rb_intern("nil"))) {
+    meta.empty_field_is_nil = true;
+  } else if (option == ID2SYM(rb_intern("string"))) {
+    meta.empty_field_is_nil = false;
+  } else {
+    rb_raise(rcsv_parse_error, "The only valid options for :parse_empty_fields_as are :nil, :string and :nil_or_string, but %s was supplied.", RSTRING_PTR(rb_inspect(option)));
+  }
+
   /* Try to initialize libcsv */
   if (csv_init(&cp, csv_options) == -1) {
     rb_raise(rcsv_parse_error, "Couldn't initialize libcsv");
@@ -283,12 +315,14 @@ static VALUE rb_rcsv_raw_parse(int argc, VALUE * argv, VALUE self) {
     meta.row_conversions = StringValuePtr(option);
   }
 
-
+  /* Column names should be declared explicitly when parsing fields as Hashes */
   if (meta.row_as_hash) { /* Only matters for hash results */
     option = rb_hash_aref(options, ID2SYM(rb_intern("column_names")));
     if (option == Qnil) {
       rb_raise(rcsv_parse_error, ":row_as_hash requires :column_names to be set.");
     } else {
+      meta.last_entry = rb_hash_new();
+
       meta.num_columns = (size_t)RARRAY_LEN(option);
       meta.column_names = (VALUE*)malloc(meta.num_columns * sizeof(VALUE*));
 
@@ -296,34 +330,37 @@ static VALUE rb_rcsv_raw_parse(int argc, VALUE * argv, VALUE self) {
         meta.column_names[i] = rb_ary_entry(option, i);
       }
     }
-  }
-
-  /* Initializing result with empty Array */
-  if (meta.row_as_hash) {
-    rb_ary_push(*(meta.result), rb_hash_new()); /* [{}] */
   } else {
-
+    meta.last_entry = rb_ary_new();
   }
 
-
-
-
-
-
-
-
+  while(true) {
+    csvstr = rb_funcall(csvio, rb_intern("read"), 1, buffer_size);
+    if ((csvstr == Qnil) || (RSTRING_LEN(csvstr) == 0)) { break; }
+
+    csv_string = StringValuePtr(csvstr);
+    csv_string_len = strlen(csv_string);
+
+    /* Actual parsing and error handling */
+    if (csv_string_len != csv_parse(&cp, csv_string, csv_string_len,
+                                    &end_of_field_callback, &end_of_line_callback, &meta)) {
+      error = csv_error(&cp);
+      switch(error) {
+        case CSV_EPARSE:
+          rb_raise(rcsv_parse_error, "Error when parsing malformed data");
+          break;
+        case CSV_ENOMEM:
+          rb_raise(rcsv_parse_error, "No memory");
+          break;
+        case CSV_ETOOBIG:
+          rb_raise(rcsv_parse_error, "Field data is too large");
+          break;
+        case CSV_EINVALID:
+          rb_raise(rcsv_parse_error, "%s", (const char *)csv_strerror(error));
           break;
-
-
-
-        case CSV_ETOOBIG:
-          rb_raise(rcsv_parse_error, "Field data is too large");
-          break;
-        case CSV_EINVALID:
-          rb_raise(rcsv_parse_error, "%s", (const char *)csv_strerror(error));
-          break;
-        default:
-          rb_raise(rcsv_parse_error, "Failed due to unknown reason");
+        default:
+          rb_raise(rcsv_parse_error, "Failed due to unknown reason");
+      }
     }
   }
 
@@ -344,12 +381,16 @@ static VALUE rb_rcsv_raw_parse(int argc, VALUE * argv, VALUE self) {
   }
 
   /* Remove the last row if it's empty. That happens if CSV file ends with a newline. */
-  if (RARRAY_LEN(
+  if (RARRAY_LEN(*(meta.result)) && /* meta.result.size != 0 */
+      RARRAY_LEN(rb_ary_entry(*(meta.result), -1)) == 0) {
     rb_ary_pop(*(meta.result));
   }
 
-
-
+  if (rb_block_given_p()) {
+    return Qnil; /* STREAMING */
+  } else {
+    return *(meta.result); /* Return accumulated result */
+  }
 }
 
 
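The new read loop in `rb_rcsv_raw_parse` pulls `buffer_size` bytes at a time from any object that responds to `read`, and `end_of_line_callback` either yields each finished row or appends it to the result. A rough Ruby analogue of that control flow is sketched below. It assumes a toy line splitter in place of libcsv (no quoting, no field parsing), and `each_chunked_line` is a made-up name, not part of Rcsv.

```ruby
require "stringio"

# Hypothetical helper: reads `io` in `buffer_size`-byte chunks and emits complete lines.
# With a block, every line is passed to it and nil is returned (streaming);
# without one, the lines are collected into an array and returned.
def each_chunked_line(io, buffer_size, &block)
  rows = []
  pending = +""  # data read so far that does not yet form a complete line
  emit = ->(line) { block ? block.call(line) : rows << line }

  # Stop on nil or an empty string, like the nil/zero-length check in the C loop.
  while (chunk = io.read(buffer_size)) && !chunk.empty?
    pending << chunk
    while (newline_at = pending.index("\n"))
      emit.call(pending.slice!(0..newline_at).chomp)
    end
  end
  emit.call(pending) unless pending.empty?  # last line without a trailing newline

  block ? nil : rows
end

io = StringIO.new("a,b\n1,2\n3,4")
p each_chunked_line(io, 3)                     # accumulate everything
io.rewind
each_chunked_line(io, 3) { |line| puts line }  # stream line by line
```

The point of carrying `pending` across iterations is the same as keeping parser state between `csv_parse` calls: a row may straddle two chunks, so a chunk boundary must never be treated as a row boundary.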
data/lib/rcsv/version.rb
CHANGED
data/lib/rcsv.rb
CHANGED
@@ -1,8 +1,10 @@
 require "rcsv/rcsv"
 require "rcsv/version"
 
+require "stringio"
+
 class Rcsv
-  def self.parse(csv_data, options = {})
+  def self.parse(csv_data, options = {}, &block)
     #options = {
       #:column_separator => "\t",
       #:only_listed_columns => true,
@@ -25,16 +27,27 @@ class Rcsv
     raw_options[:col_sep] = options[:column_separator] && options[:column_separator][0] || ','
     raw_options[:offset_rows] = options[:offset_rows] || 0
     raw_options[:nostrict] = options[:nostrict]
+    raw_options[:parse_empty_fields_as] = options[:parse_empty_fields_as]
+    raw_options[:buffer_size] = options[:buffer_size] || 1024 * 1024 # 1 MiB
+
+    if csv_data.is_a?(String)
+      csv_data = StringIO.new(csv_data)
+    elsif !(csv_data.respond_to?(:lines) && csv_data.respond_to?(:read))
+      inspected_csv_data = csv_data.inspect
+      raise ParseError.new("Supplied CSV object #{inspected_csv_data[0..127]}#{inspected_csv_data.size > 128 ? '...' : ''} is neither String nor looks like IO object.")
+    end
+
+    initial_position = csv_data.pos
 
     case options[:header]
     when :use
-      header = self.raw_parse(csv_data.lines.first, raw_options).first
+      header = self.raw_parse(StringIO.new(csv_data.lines.first), raw_options).first
       raw_options[:offset_rows] += 1
     when :skip
       header = (0..(csv_data.lines.first.split(raw_options[:col_sep]).count)).to_a
       raw_options[:offset_rows] += 1
     when :none
-
+      header = (0..(csv_data.lines.first.split(raw_options[:col_sep]).count)).to_a
     end
 
     raw_options[:row_as_hash] = options[:row_as_hash] # Setting after header parsing
@@ -86,6 +99,7 @@ class Rcsv
       raw_options[:row_conversions] = row_conversions
     end
 
-
+    csv_data.pos = initial_position
+    return self.raw_parse(csv_data, raw_options, &block)
   end
 end
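The Ruby wrapper now does two things before handing input to the extension: it wraps plain Strings in `StringIO`, and it remembers the stream position so that peeking at the header line does not consume input. A minimal standalone sketch of both steps follows. The helper names `normalize_csv_input` and `peek_first_line` are hypothetical, the sketch raises `ArgumentError` rather than `Rcsv::ParseError`, and it checks for `gets` and `pos=` instead of `lines`, on the assumption that `lines` may be missing from IO objects on newer Rubies.

```ruby
require "stringio"

# Hypothetical helper: wraps Strings in StringIO and rejects objects that
# do not look like a seekable IO (assumption: :read, :gets and :pos= suffice).
def normalize_csv_input(csv_data)
  return StringIO.new(csv_data) if csv_data.is_a?(String)

  unless csv_data.respond_to?(:read) && csv_data.respond_to?(:gets) && csv_data.respond_to?(:pos=)
    raise ArgumentError, "#{csv_data.inspect[0, 128]} is neither a String nor an IO-like object"
  end
  csv_data
end

# Reads the first line, then restores the position so that a later full
# parse starts exactly where the stream was before the peek.
def peek_first_line(io)
  initial_position = io.pos
  first_line = io.gets
  io.pos = initial_position
  first_line
end

io = normalize_csv_input("name,qty\nwidget,3\n")
p peek_first_line(io)  # header line only
p io.read              # the whole input is still available afterwards
```

Restoring `pos` only works on seekable streams such as files and `StringIO`; for sockets or pipes the header would have to be buffered instead.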
data/test/test_rcsv_raw_parse.rb
CHANGED
@@ -3,7 +3,7 @@ require 'rcsv'
 
 class RcsvTest < Test::Unit::TestCase
   def setup
-    @csv_data = File.
+    @csv_data = File.open('test/test_rcsv.csv')
   end
 
   def test_rcsv
@@ -20,7 +20,7 @@ class RcsvTest < Test::Unit::TestCase
   end
 
   def test_rcsv_col_sep
-    tsv_data = @csv_data.tr(",", "\t")
+    tsv_data = StringIO.new(@csv_data.read.tr(",", "\t"))
     raw_parsed_tsv_data = Rcsv.raw_parse(tsv_data, :col_sep => "\t")
 
     assert_equal(raw_parsed_tsv_data[0][2], 'EDADEDADEDADEDADEDADEDAD')
@@ -33,8 +33,27 @@ class RcsvTest < Test::Unit::TestCase
     assert_equal(raw_parsed_tsv_data[888][13], "Dallas\t TX")
   end
 
+  def test_buffer_size
+    raw_parsed_csv_data = Rcsv.raw_parse(@csv_data, :buffer_size => 10)
+
+    assert_equal(raw_parsed_csv_data[0][2], 'EDADEDADEDADEDADEDADEDAD')
+    assert_equal(raw_parsed_csv_data[0][13], '$$$908080')
+    assert_equal(raw_parsed_csv_data[0][14], '"')
+    assert_equal(raw_parsed_csv_data[0][15], 'true/false')
+    assert_equal(raw_parsed_csv_data[0][16], nil)
+    assert_equal(raw_parsed_csv_data[9][2], nil)
+    assert_equal(raw_parsed_csv_data[3][6], '""C81E-=; **ECCB; .. 89')
+    assert_equal(raw_parsed_csv_data[888][13], 'Dallas, TX')
+  end
+
+  def test_single_item_csv
+    raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new("Foo"))
+
+    assert_equal(raw_parsed_csv_data, [["Foo"]])
+  end
+
   def test_broken_data
-    broken_data = @csv_data.sub(/"/, '')
+    broken_data = StringIO.new(@csv_data.read.sub(/"/, ''))
 
     assert_raise(Rcsv::ParseError) do
       Rcsv.raw_parse(broken_data)
@@ -42,7 +61,7 @@ class RcsvTest < Test::Unit::TestCase
   end
 
   def test_broken_data_without_strict
-    broken_data = @csv_data.sub(/"/, '')
+    broken_data = StringIO.new(@csv_data.read.sub(/"/, ''))
 
     raw_parsed_csv_data = Rcsv.raw_parse(broken_data, :nostrict => true)
     assert_equal(["DSAdsfksjh", "iii ooo iii", "EDADEDADEDADEDADEDADEDAD", "111 333 555", "NMLKTF", "---==---", "//", "###", "0000000000", "Asdad bvd qwert", ";'''sd", "@@@", "OCTZ", "$$$908080", "\",true/false\nC85A5B9F,85259637,,96,6838,1983-06-14,\"\"\"C4CA-=; **1679; .. 79", "210,11", "908e", "1281-03-09", "7257.4654049904275", "20efe749-50fe-4b6a-a603-7f9cd1dc6c6d", "3", "New York, NY", "u", "2.228169203286535", "t"], raw_parsed_csv_data.first)
@@ -83,7 +102,7 @@ class RcsvTest < Test::Unit::TestCase
   end
 
   def test_row_conversions
-    raw_parsed_csv_data = Rcsv.raw_parse(@csv_data.each_line.to_a[1..-1].join, # skipping string headers
+    raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new(@csv_data.each_line.to_a[1..-1].join), # skipping string headers
       :row_conversions => 'sisiisssssfsissf')
 
     assert_equal(raw_parsed_csv_data[0][2], nil)
@@ -94,7 +113,7 @@ class RcsvTest < Test::Unit::TestCase
   end
 
   def test_row_conversions_with_column_exclusions
-    raw_parsed_csv_data = Rcsv.raw_parse(@csv_data.each_line.to_a[1..-1].join, # skipping string headers
+    raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new(@csv_data.each_line.to_a[1..-1].join), # skipping string headers
       :row_conversions => 's f issss fsis fb')
 
     assert_equal(raw_parsed_csv_data[0][1], nil)
@@ -153,4 +172,96 @@ class RcsvTest < Test::Unit::TestCase
       'booleator' => 't'
     }, raw_parsed_csv_data[1])
   end
+
+  def test_array_block_streaming
+    raw_parsed_csv_data = []
+
+    result = Rcsv.raw_parse(@csv_data) { |row|
+      raw_parsed_csv_data << row
+    }
+
+    assert_equal(nil, result)
+    assert_equal(raw_parsed_csv_data[0][2], 'EDADEDADEDADEDADEDADEDAD')
+    assert_equal(raw_parsed_csv_data[0][13], '$$$908080')
+    assert_equal(raw_parsed_csv_data[0][14], '"')
+    assert_equal(raw_parsed_csv_data[0][15], 'true/false')
+    assert_equal(raw_parsed_csv_data[0][16], nil)
+    assert_equal(raw_parsed_csv_data[9][2], nil)
+    assert_equal(raw_parsed_csv_data[3][6], '""C81E-=; **ECCB; .. 89')
+    assert_equal(raw_parsed_csv_data[888][13], 'Dallas, TX')
+  end
+
+  def test_hash_block_streaming
+    raw_parsed_csv_data = []
+    result = Rcsv.raw_parse(@csv_data, :row_as_hash => true, :column_names => [
+      'DSAdsfksjh',
+      'iii ooo iii',
+      'EDADEDADEDADEDADEDADEDAD',
+      '111 333 555',
+      'NMLKTF',
+      '---==---',
+      '//',
+      '###',
+      '0000000000',
+      'Asdad bvd qwert',
+      ";'''sd",
+      '@@@',
+      'OCTZ',
+      '$$$908080',
+      '"',
+      'noname',
+      'booleator'
+    ]) { |row|
+      raw_parsed_csv_data << row
+    }
+
+    assert_equal(nil, result)
+    assert_equal({
+      'DSAdsfksjh' => 'C85A5B9F',
+      'iii ooo iii' => '85259637',
+      'EDADEDADEDADEDADEDADEDAD' => nil,
+      '111 333 555' => '96',
+      'NMLKTF' => '6838',
+      '---==---' => '1983-06-14',
+      '//' => '""C4CA-=; **1679; .. 79',
+      '###' => '210,11',
+      '0000000000' => '908e',
+      'Asdad bvd qwert' => '1281-03-09',
+      ";'''sd" => '7257.4654049904275',
+      '@@@' => '20efe749-50fe-4b6a-a603-7f9cd1dc6c6d',
+      'OCTZ' => '3',
+      '$$$908080' => "New York, NY",
+      '"' => 'u',
+      'noname' => '2.228169203286535',
+      'booleator' => 't'
+    }, raw_parsed_csv_data[1])
+  end
+
+  def test_nils_and_empty_strings_default
+    raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new(",\"\",, ,,\n,, \"\", \"\" ,,"))
+
+    assert_equal([nil, '', nil, nil, nil, nil], raw_parsed_csv_data[0])
+    assert_equal([nil, nil, '', '', nil, nil], raw_parsed_csv_data[1])
+  end
+
+  def test_nils_and_empty_strings_nil
+    raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new(",\"\",, ,,\n,, \"\", \"\" ,,"), :parse_empty_fields_as => :nil)
+
+    assert_equal([nil, nil, nil, nil, nil, nil], raw_parsed_csv_data[0])
+    assert_equal([nil, nil, nil, nil, nil, nil], raw_parsed_csv_data[1])
+  end
+
+  def test_nils_and_empty_strings_string
+    raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new(",\"\",, ,,\n,, \"\", \"\" ,,"), :parse_empty_fields_as => :string)
+
+    assert_equal(['', '', '', '', '', ''], raw_parsed_csv_data[0])
+    assert_equal(['', '', '', '', '', ''], raw_parsed_csv_data[1])
+  end
+
+  def test_nils_and_empty_strings_nil_or_string
+    raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new(",\"\",, ,,\n,, \"\", \"\" ,,"), :parse_empty_fields_as => :nil_or_string)
+
+    assert_equal([nil, '', nil, nil, nil, nil], raw_parsed_csv_data[0])
+    assert_equal([nil, nil, '', '', nil, nil], raw_parsed_csv_data[1])
+  end
 end
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: rcsv
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.8
   prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2013-01-11 00:00:00.000000000 Z
 dependencies: []
 description: A libcsv-based CSV parser for Ruby
 email:
@@ -26,6 +26,7 @@ files:
 - Gemfile.lock
 - LICENSE
 - README.md
+- RELNOTES
 - Rakefile
 - bench.rb
 - ext/rcsv/csv.h