rcsv 0.0.6 → 0.0.8
- data/Gemfile.lock +1 -1
- data/README.md +38 -4
- data/RELNOTES +5 -0
- data/ext/rcsv/csv.h +5 -3
- data/ext/rcsv/libcsv.c +4 -2
- data/ext/rcsv/rcsv.c +83 -42
- data/lib/rcsv/version.rb +1 -1
- data/lib/rcsv.rb +18 -4
- data/test/test_rcsv_raw_parse.rb +117 -6
- metadata +3 -2
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -4,7 +4,7 @@
 
 Rcsv is a fast CSV parsing library for MRI Ruby. Tested on REE 1.8.7 and Ruby 1.9.3.
 
-Contrary to many other gems that implement their own parsers, Rcsv uses libcsv 3.0.
+Contrary to many other gems that implement their own parsers, Rcsv uses libcsv 3.0.3 (http://sourceforge.net/projects/libcsv/). As long as libcsv's API is stable, getting Rcsv to use newer libcsv version is as simple as updating two files (csv.h and libcsv.c).
 
 ## Benchmarks
 user system total real
@@ -48,6 +48,7 @@ Quickstart:
 
 
 Rcsv class exposes a class method *parse* that accepts a CSV string as its first parameter and options hash as its second parameter.
+If block is passed, Rcsv sequentially yields every parsed line to it and the #parse method itself returns nil.
 
 
 Options supported:
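The block contract added to the README here (rows are yielded one at a time and the call itself returns nil) can be pictured with a small pure-Ruby sketch. It does not call Rcsv; `emit_rows` is a hypothetical stand-in that receives rows that are already parsed, and it only illustrates the return-value convention.

```ruby
# Hypothetical stand-in for a parser whose rows are already split.
# With a block: yields each row in order and returns nil.
# Without a block: returns the accumulated array of rows.
def emit_rows(rows)
  return rows.map(&:dup) unless block_given?

  rows.each { |row| yield row }
  nil
end

rows = [%w[a b c], %w[1 2 3]]

seen = []
result = emit_rows(rows) { |row| seen << row }
p result # nil, because a block was given
p seen   # every row, in input order

p emit_rows(rows) # the whole array when no block is given
```

The point of the nil return is that a streaming caller never holds the full result array in memory.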
@@ -60,7 +61,19 @@ A single-character string that is used as a separator. Default is ",".
 
 A boolean flag. When enabled, allows to parse oddly quoted CSV data without exceptions being raised. Disabled by default.
 
-Anything that does not conform to http://www.
+Anything that does not conform to http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm should better be parsed with this option enabled.
+
+### :parse_empty_fields_as
+
+A Ruby symbol that specifies how empty CSV fields should be processed. Accepted values:
+
+* :nil_or_string (default) - If empty field is quoted, it is parsed as empty Ruby string. If empty field is not quoted, it is parsed as Nil.
+
+* :nil - Always parse as Nil.
+
+* :string - Always parse as empty string.
+
+This option doesn't affect defaults processing: all empty fields are replaced with default values if the latter are provided (via per-column :default).
 
 ### :offset_rows
 
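To make the three :parse_empty_fields_as modes concrete, here is a minimal pure-Ruby sketch of the decision they describe. It does not call Rcsv: `convert_empty_field` and its `quoted` argument are hypothetical names used only for illustration, and the sketch raises ArgumentError for an unknown mode where the extension itself raises Rcsv::ParseError.

```ruby
# Models how an empty CSV field maps to a Ruby value under each
# :parse_empty_fields_as mode. `quoted` tells whether the empty field
# was written as "" in the CSV text.
def convert_empty_field(quoted, mode = :nil_or_string)
  case mode
  when :nil_or_string then quoted ? "" : nil
  when :nil           then nil
  when :string        then ""
  else
    raise ArgumentError, "unsupported mode: #{mode.inspect}"
  end
end

# The row `,"",` holds an unquoted, a quoted and an unquoted empty field.
row = [false, true, false]
[:nil_or_string, :nil, :string].each do |mode|
  p row.map { |quoted| convert_empty_field(quoted, mode) }
end
```

Per-column :default values are outside this sketch; as the README text says, they replace empty fields regardless of the mode.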
@@ -73,7 +86,7 @@ If CSV has a header, :columns keys can be strings that are equal to column names
 
 :columns values are in turn hashes that provide parsing options:
 
-* :alias - Object of any type (though usually a Symbol) that is used
+* :alias - Object of any type (though usually a Symbol) that is used as a key that represents column name when :row_as_hash is set.
 * :type - A Ruby Symbol that specifies Ruby data type that CSV cell value should be converted into. Supported types: :int, :float, :string, :bool. :string is the default.
 * :default - Object of any type (though usually of the same type that is specified by :type option). If CSV doesn't have any value for a cell, this default value is used.
 * :match - A string. If set, makes Rcsv skip all the rows where any column doesn't match its :match value. Useful for filtering data.
@@ -96,6 +109,10 @@ When :row_as_hash is disabled, return value is represented as array of arrays.
 ### :only_listed_columns
 A boolean flag. If enabled, only parses columns that are listed in :columns. Disabled by default.
 
+### :buffer_size
+An integer. Default is 1MiB (1024 * 1024).
+Specifies a number of bytes that are read at once, thus allowing to read drectly from IO-like objects (files, sockets etc).
+
 
 ## Examples
 
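The chunked reading that :buffer_size controls can be pictured with a small pure-Ruby sketch. It assumes only that the source responds to `read(n)` the way Ruby IO objects do; `each_chunk` is a hypothetical helper and not part of Rcsv.

```ruby
require "stringio"

# Yields successive chunks of at most `buffer_size` bytes from an IO-like
# object, stopping when read returns nil or an empty string.
def each_chunk(io, buffer_size = 1024 * 1024)
  loop do
    chunk = io.read(buffer_size)
    break if chunk.nil? || chunk.empty?

    yield chunk
  end
end

chunks = []
each_chunk(StringIO.new("a,b,c\n1,2,3\n"), 5) { |chunk| chunks << chunk }
p chunks
```

Chunk boundaries do not have to line up with row boundaries, so whatever consumes the chunks has to carry its parsing state from one chunk to the next.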
@@ -132,12 +149,29 @@ The result would look like this:
 [ nil, 0, "Vacuum" ]
 ]
 
+And here is an example of passing a block:
+
+Rcsv.parse(some_csv) { |row|
+puts row.inspect
+}
+
+That would display contents of each row without needing to put the whole parsed result array to memory:
+
+["a", "b", "c", "d", "e", "f"]
+["1", "2", "3", "4", "5", "6"]
+
+
+This way it is possible to read from a File directly, with a 20MiB buffer and parse lines one by one:
+
+Rcsv.parse(File.open('/some/file.csv'), :buffer_size => 20 * 1024 * 1024) { |row|
+puts row.inspect
+}
+
 
 ## To do
 
 * More specs for boolean values
 * Specs for Ruby parse
-* Add custom Ruby callbacks (if block is passed)
 * Add CSV write support
 
 
data/RELNOTES
ADDED
data/ext/rcsv/csv.h
CHANGED
@@ -8,8 +8,8 @@ extern "C" {
 #endif
 
 #define CSV_MAJOR 3
-#define CSV_MINOR
-#define CSV_RELEASE
+#define CSV_MINOR 0
+#define CSV_RELEASE 3
 
 /* Error Codes */
 #define CSV_SUCCESS 0
@@ -25,7 +25,9 @@ extern "C" {
 #define CSV_STRICT_FINI 4 /* causes csv_fini to return CSV_EPARSE if last
 field is quoted and doesn't containg ending
 quote */
-#define CSV_APPEND_NULL 8 /* Ensure that all fields are null-
+#define CSV_APPEND_NULL 8 /* Ensure that all fields are null-terminated */
+#define CSV_EMPTY_IS_NULL 16 /* Pass null pointer to cb1 function when
+empty, unquoted fields are encountered */
 
 
 /* Character values */
data/ext/rcsv/libcsv.c
CHANGED
@@ -25,7 +25,7 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 
 #include "csv.h"
 
-#define VERSION "3.0.
+#define VERSION "3.0.3"
 
 #define ROW_NOT_BEGUN 0
 #define FIELD_NOT_BEGUN 1
@@ -50,7 +50,9 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 entry_pos -= spaces; \
 if (p->options & CSV_APPEND_NULL) \
 ((p)->entry_buf[entry_pos]) = '\0'; \
-if (cb1) \
+if (cb1 && (p->options & CSV_EMPTY_IS_NULL) && !quoted && entry_pos == 0) \
+cb1(NULL, entry_pos, data); \
+else if (cb1) \
 cb1(p->entry_buf, entry_pos, data); \
 pstate = FIELD_NOT_BEGUN; \
 entry_pos = quoted = spaces = 0; \
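The new branch in the field-submission macro is easier to follow as a truth table. The Ruby sketch below models only that decision, using the bit values from csv.h in this diff; `submitted_field` is a hypothetical name, and the real macro additionally checks that the callback pointer is non-null before calling it.

```ruby
# Option bit values as defined in csv.h in this diff.
CSV_APPEND_NULL   = 8
CSV_EMPTY_IS_NULL = 16

# Returns what the field callback would receive: nil for an empty,
# unquoted field when CSV_EMPTY_IS_NULL is set, otherwise the buffer.
def submitted_field(entry_buf, quoted, options)
  empty_is_null = (options & CSV_EMPTY_IS_NULL) != 0
  if empty_is_null && !quoted && entry_buf.empty?
    nil
  else
    entry_buf
  end
end

options = CSV_APPEND_NULL | CSV_EMPTY_IS_NULL
p submitted_field("", false, options)         # unquoted empty field
p submitted_field("", true, options)          # quoted empty field
p submitted_field("", false, CSV_APPEND_NULL) # flag not set
```

Only the first call yields nil; a quoted empty field, or any field parsed without the flag, still reaches the callback as a buffer.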
data/ext/rcsv/rcsv.c
CHANGED
@@ -12,6 +12,7 @@ static VALUE rcsv_parse_error; /* class Rcsv::ParseError << StandardError; end *
 struct rcsv_metadata {
 /* Derived from user-specified options */
 bool row_as_hash; /* Used to return array of hashes rather than array of arrays */
+bool empty_field_is_nil; /* Do we convert empty fields to nils? */
 size_t offset_rows; /* Number of rows to skip before parsing */
 
 char * row_conversions; /* A pointer to string/array of row conversions char specifiers */
@@ -30,6 +31,7 @@ struct rcsv_metadata {
 size_t current_col; /* Current column's index */
 size_t current_row; /* Current row's index */
 
+VALUE last_entry; /* A pointer to the last entry that's going to be appended to result */
 VALUE * result; /* A pointer to the parsed data */
 };
 
@@ -41,7 +43,6 @@ void end_of_field_callback(void * field, size_t field_size, void * data) {
 struct rcsv_metadata * meta = (struct rcsv_metadata *) data;
 char row_conversion = 0;
 VALUE parsed_field;
-VALUE last_entry = rb_ary_entry(*(meta->result), -1); /* result.last */
 
 /* No need to parse anything until the end of the line if skip_current_row is set */
 if (meta->skip_current_row) {
@@ -56,6 +57,7 @@ void end_of_field_callback(void * field, size_t field_size, void * data) {
 
 /* Filter by string row values listed in meta->only_rows */
 if ((meta->only_rows != NULL) &&
+(field_str != NULL) && /* TODO: What if we want to filter out NULLs? */
 (meta->current_col < meta->num_only_rows) &&
 (meta->only_rows[meta->current_col] != NULL) &&
 (strcmp(meta->only_rows[meta->current_col], field_str))) {
@@ -74,8 +76,12 @@ void end_of_field_callback(void * field, size_t field_size, void * data) {
 /* Assigning appropriate default value if applicable. */
 if (meta->current_col < meta->num_row_defaults) {
 parsed_field = meta->row_defaults[meta->current_col];
-} else { /*
-
+} else { /* It depends on empty_field_is_nil if we convert empty strings to nils */
+if (meta->empty_field_is_nil || field_str == NULL) {
+parsed_field = Qnil;
+} else {
+parsed_field = rb_str_new2("");
+}
 }
 } else {
 if (meta->current_col < meta->num_row_conversions) {
@@ -136,10 +142,10 @@ void end_of_field_callback(void * field, size_t field_size, void * data) {
 (int)meta->num_columns
 );
 } else {
-rb_hash_aset(last_entry, meta->column_names[meta->current_col], parsed_field);
+rb_hash_aset(meta->last_entry, meta->column_names[meta->current_col], parsed_field); /* last_entry[column_names[current_col]] = field */
 }
 } else { /* Parse into Array */
-rb_ary_push(last_entry, parsed_field); /*
+rb_ary_push(meta->last_entry, parsed_field); /* last_entry << field */
 }
 }
 
@@ -154,16 +160,22 @@ void end_of_line_callback(int last_char, void * data) {
 
 /* If filters didn't match, current row parsing is reverted */
 if (meta->skip_current_row) {
-
+/* Do we wanna GC? */
 meta->skip_current_row = false;
+} else {
+if (rb_block_given_p()) { /* STREAMING */
+rb_yield(meta->last_entry);
+} else {
+rb_ary_push(*(meta->result), meta->last_entry);
+}
 }
 
-/*
+/* Re-initialize last_entry unless EOF reached */
 if (last_char != -1) {
 if (meta->row_as_hash) {
-
+meta->last_entry = rb_hash_new(); /* {} */
 } else {
-
+meta->last_entry = rb_ary_new(); /* [] */
 }
 }
 
@@ -175,12 +187,19 @@ void end_of_line_callback(int last_char, void * data) {
 return;
 }
 
+void custom_end_of_line_callback(int last_char, void * data) {
+struct rcsv_metadata * meta = (struct rcsv_metadata *) data;
+
+if (!meta->skip_current_row) {
+}
+}
+
 /* C API */
 
 /* The main method that handles parsing */
 static VALUE rb_rcsv_raw_parse(int argc, VALUE * argv, VALUE self) {
 struct rcsv_metadata meta;
-VALUE
+VALUE csvio, csvstr, buffer_size, options, option;
 
 struct csv_parser cp;
 unsigned char csv_options = CSV_STRICT_FINI | CSV_APPEND_NULL;
@@ -191,6 +210,7 @@ static VALUE rb_rcsv_raw_parse(int argc, VALUE * argv, VALUE self) {
 
 /* Setting up some sane defaults */
 meta.row_as_hash = false;
+meta.empty_field_is_nil = false;
 meta.skip_current_row = false;
 meta.num_columns = 0;
 meta.current_col = 0;
@@ -205,22 +225,34 @@ static VALUE rb_rcsv_raw_parse(int argc, VALUE * argv, VALUE self) {
 meta.column_names = NULL;
 meta.result = (VALUE[]){rb_ary_new()}; /* [] */
 
-/*
-rb_scan_args(argc, argv, "11", &
-csv_string = StringValuePtr(str);
-csv_string_len = strlen(csv_string);
+/* csvio is required, options is optional (pun intended) */
+rb_scan_args(argc, argv, "11", &csvio, &options);
 
 /* options ||= nil */
 if (NIL_P(options)) {
 options = rb_hash_new();
 }
 
+buffer_size = rb_hash_aref(options, ID2SYM(rb_intern("buffer_size")));
+
 /* By default, parsing is strict */
 option = rb_hash_aref(options, ID2SYM(rb_intern("nostrict")));
 if (!option || (option == Qnil)) {
 csv_options |= CSV_STRICT;
 }
 
+/* By default, empty strings are treated as Nils and quoted empty strings are treated as empty Ruby strings */
+option = rb_hash_aref(options, ID2SYM(rb_intern("parse_empty_fields_as")));
+if ((option == Qnil) || (option == ID2SYM(rb_intern("nil_or_string")))) {
+csv_options |= CSV_EMPTY_IS_NULL;
+} else if (option == ID2SYM(rb_intern("nil"))) {
+meta.empty_field_is_nil = true;
+} else if (option == ID2SYM(rb_intern("string"))) {
+meta.empty_field_is_nil = false;
+} else {
+rb_raise(rcsv_parse_error, "The only valid options for :parse_empty_fields_as are :nil, :string and :nil_or_string, but %s was supplied.", RSTRING_PTR(rb_inspect(option)));
+}
+
 /* Try to initialize libcsv */
 if (csv_init(&cp, csv_options) == -1) {
 rb_raise(rcsv_parse_error, "Couldn't initialize libcsv");
@@ -283,12 +315,14 @@ static VALUE rb_rcsv_raw_parse(int argc, VALUE * argv, VALUE self) {
 meta.row_conversions = StringValuePtr(option);
 }
 
-
+/* Column names should be declared explicitly when parsing fields as Hashes */
 if (meta.row_as_hash) { /* Only matters for hash results */
 option = rb_hash_aref(options, ID2SYM(rb_intern("column_names")));
 if (option == Qnil) {
 rb_raise(rcsv_parse_error, ":row_as_hash requires :column_names to be set.");
 } else {
+meta.last_entry = rb_hash_new();
+
 meta.num_columns = (size_t)RARRAY_LEN(option);
 meta.column_names = (VALUE*)malloc(meta.num_columns * sizeof(VALUE*));
 
@@ -296,34 +330,37 @@ static VALUE rb_rcsv_raw_parse(int argc, VALUE * argv, VALUE self) {
 meta.column_names[i] = rb_ary_entry(option, i);
 }
 }
-}
-
-/* Initializing result with empty Array */
-if (meta.row_as_hash) {
-rb_ary_push(*(meta.result), rb_hash_new()); /* [{}] */
 } else {
-
+meta.last_entry = rb_ary_new();
 }
 
-
-
-
-
-
-
-
+while(true) {
+csvstr = rb_funcall(csvio, rb_intern("read"), 1, buffer_size);
+if ((csvstr == Qnil) || (RSTRING_LEN(csvstr) == 0)) { break; }
+
+csv_string = StringValuePtr(csvstr);
+csv_string_len = strlen(csv_string);
+
+/* Actual parsing and error handling */
+if (csv_string_len != csv_parse(&cp, csv_string, csv_string_len,
+&end_of_field_callback, &end_of_line_callback, &meta)) {
+error = csv_error(&cp);
+switch(error) {
+case CSV_EPARSE:
+rb_raise(rcsv_parse_error, "Error when parsing malformed data");
+break;
+case CSV_ENOMEM:
+rb_raise(rcsv_parse_error, "No memory");
+break;
+case CSV_ETOOBIG:
+rb_raise(rcsv_parse_error, "Field data is too large");
+break;
+case CSV_EINVALID:
+rb_raise(rcsv_parse_error, "%s", (const char *)csv_strerror(error));
 break;
-
-
-
-case CSV_ETOOBIG:
-rb_raise(rcsv_parse_error, "Field data is too large");
-break;
-case CSV_EINVALID:
-rb_raise(rcsv_parse_error, "%s", (const char *)csv_strerror(error));
-break;
-default:
-rb_raise(rcsv_parse_error, "Failed due to unknown reason");
+default:
+rb_raise(rcsv_parse_error, "Failed due to unknown reason");
+}
 }
 }
 
@@ -344,12 +381,16 @@ static VALUE rb_rcsv_raw_parse(int argc, VALUE * argv, VALUE self) {
 }
 
 /* Remove the last row if it's empty. That happens if CSV file ends with a newline. */
-if (RARRAY_LEN(
+if (RARRAY_LEN(*(meta.result)) && /* meta.result.size != 0 */
+RARRAY_LEN(rb_ary_entry(*(meta.result), -1)) == 0) {
 rb_ary_pop(*(meta.result));
 }
 
-
-
+if (rb_block_given_p()) {
+return Qnil; /* STREAMING */
+} else {
+return *(meta.result); /* Return accumulated result */
+}
 }
 
 
data/lib/rcsv/version.rb
CHANGED
data/lib/rcsv.rb
CHANGED
@@ -1,8 +1,10 @@
 require "rcsv/rcsv"
 require "rcsv/version"
 
+require "stringio"
+
 class Rcsv
-def self.parse(csv_data, options = {})
+def self.parse(csv_data, options = {}, &block)
 #options = {
 #:column_separator => "\t",
 #:only_listed_columns => true,
@@ -25,16 +27,27 @@ class Rcsv
 raw_options[:col_sep] = options[:column_separator] && options[:column_separator][0] || ','
 raw_options[:offset_rows] = options[:offset_rows] || 0
 raw_options[:nostrict] = options[:nostrict]
+raw_options[:parse_empty_fields_as] = options[:parse_empty_fields_as]
+raw_options[:buffer_size] = options[:buffer_size] || 1024 * 1024 # 1 MiB
+
+if csv_data.is_a?(String)
+csv_data = StringIO.new(csv_data)
+elsif !(csv_data.respond_to?(:lines) && csv_data.respond_to?(:read))
+inspected_csv_data = csv_data.inspect
+raise ParseError.new("Supplied CSV object #{inspected_csv_data[0..127]}#{inspected_csv_data.size > 128 ? '...' : ''} is neither String nor looks like IO object.")
+end
+
+initial_position = csv_data.pos
 
 case options[:header]
 when :use
-header = self.raw_parse(csv_data.lines.first, raw_options).first
+header = self.raw_parse(StringIO.new(csv_data.lines.first), raw_options).first
 raw_options[:offset_rows] += 1
 when :skip
 header = (0..(csv_data.lines.first.split(raw_options[:col_sep]).count)).to_a
 raw_options[:offset_rows] += 1
 when :none
-
+header = (0..(csv_data.lines.first.split(raw_options[:col_sep]).count)).to_a
 end
 
 raw_options[:row_as_hash] = options[:row_as_hash] # Setting after header parsing
@@ -86,6 +99,7 @@ class Rcsv
 raw_options[:row_conversions] = row_conversions
 end
 
-
+csv_data.pos = initial_position
+return self.raw_parse(csv_data, raw_options, &block)
 end
 end
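The input handling added to `Rcsv.parse` in this file follows a common pattern: wrap a String in a StringIO, accept anything IO-like, remember the starting position, peek at the first line for the header, then rewind before the real parse. The sketch below illustrates that pattern in plain Ruby under a few assumptions: `to_csv_io` and `peek_first_line` are hypothetical helper names, the sketch raises ArgumentError where the gem raises Rcsv::ParseError, and it uses `gets` instead of the `lines.first` call in the diff, which recent Ruby versions no longer provide on IO objects.

```ruby
require "stringio"

# Wraps a String in StringIO, passes IO-like objects through,
# and rejects everything else.
def to_csv_io(csv_data)
  return StringIO.new(csv_data) if csv_data.is_a?(String)
  return csv_data if csv_data.respond_to?(:read) && csv_data.respond_to?(:pos)

  raise ArgumentError, "expected a String or an IO-like object, got #{csv_data.class}"
end

# Reads the first line (for header inspection) and then restores
# the position the stream had before the peek.
def peek_first_line(io)
  initial_position = io.pos
  first_line = io.gets
  io.pos = initial_position
  first_line
end

io = to_csv_io("a,b\n1,2\n")
p peek_first_line(io) # the header line
p io.read             # the full content is still available afterwards
```

Restoring the position only works on seekable sources; a socket or pipe would need the header line to be buffered instead.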
data/test/test_rcsv_raw_parse.rb
CHANGED
@@ -3,7 +3,7 @@ require 'rcsv'
 
 class RcsvTest < Test::Unit::TestCase
 def setup
-@csv_data = File.
+@csv_data = File.open('test/test_rcsv.csv')
 end
 
 def test_rcsv
@@ -20,7 +20,7 @@ class RcsvTest < Test::Unit::TestCase
 end
 
 def test_rcsv_col_sep
-tsv_data = @csv_data.tr(",", "\t")
+tsv_data = StringIO.new(@csv_data.read.tr(",", "\t"))
 raw_parsed_tsv_data = Rcsv.raw_parse(tsv_data, :col_sep => "\t")
 
 assert_equal(raw_parsed_tsv_data[0][2], 'EDADEDADEDADEDADEDADEDAD')
@@ -33,8 +33,27 @@ class RcsvTest < Test::Unit::TestCase
 assert_equal(raw_parsed_tsv_data[888][13], "Dallas\t TX")
 end
 
+def test_buffer_size
+raw_parsed_csv_data = Rcsv.raw_parse(@csv_data, :buffer_size => 10)
+
+assert_equal(raw_parsed_csv_data[0][2], 'EDADEDADEDADEDADEDADEDAD')
+assert_equal(raw_parsed_csv_data[0][13], '$$$908080')
+assert_equal(raw_parsed_csv_data[0][14], '"')
+assert_equal(raw_parsed_csv_data[0][15], 'true/false')
+assert_equal(raw_parsed_csv_data[0][16], nil)
+assert_equal(raw_parsed_csv_data[9][2], nil)
+assert_equal(raw_parsed_csv_data[3][6], '""C81E-=; **ECCB; .. 89')
+assert_equal(raw_parsed_csv_data[888][13], 'Dallas, TX')
+end
+
+def test_single_item_csv
+raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new("Foo"))
+
+assert_equal(raw_parsed_csv_data, [["Foo"]])
+end
+
 def test_broken_data
-broken_data = @csv_data.sub(/"/, '')
+broken_data = StringIO.new(@csv_data.read.sub(/"/, ''))
 
 assert_raise(Rcsv::ParseError) do
 Rcsv.raw_parse(broken_data)
@@ -42,7 +61,7 @@ class RcsvTest < Test::Unit::TestCase
 end
 
 def test_broken_data_without_strict
-broken_data = @csv_data.sub(/"/, '')
+broken_data = StringIO.new(@csv_data.read.sub(/"/, ''))
 
 raw_parsed_csv_data = Rcsv.raw_parse(broken_data, :nostrict => true)
 assert_equal(["DSAdsfksjh", "iii ooo iii", "EDADEDADEDADEDADEDADEDAD", "111 333 555", "NMLKTF", "---==---", "//", "###", "0000000000", "Asdad bvd qwert", ";'''sd", "@@@", "OCTZ", "$$$908080", "\",true/false\nC85A5B9F,85259637,,96,6838,1983-06-14,\"\"\"C4CA-=; **1679; .. 79", "210,11", "908e", "1281-03-09", "7257.4654049904275", "20efe749-50fe-4b6a-a603-7f9cd1dc6c6d", "3", "New York, NY", "u", "2.228169203286535", "t"], raw_parsed_csv_data.first)
@@ -83,7 +102,7 @@ class RcsvTest < Test::Unit::TestCase
 end
 
 def test_row_conversions
-raw_parsed_csv_data = Rcsv.raw_parse(@csv_data.each_line.to_a[1..-1].join, # skipping string headers
+raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new(@csv_data.each_line.to_a[1..-1].join), # skipping string headers
 :row_conversions => 'sisiisssssfsissf')
 
 assert_equal(raw_parsed_csv_data[0][2], nil)
@@ -94,7 +113,7 @@ class RcsvTest < Test::Unit::TestCase
 end
 
 def test_row_conversions_with_column_exclusions
-raw_parsed_csv_data = Rcsv.raw_parse(@csv_data.each_line.to_a[1..-1].join, # skipping string headers
+raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new(@csv_data.each_line.to_a[1..-1].join), # skipping string headers
 :row_conversions => 's f issss fsis fb')
 
 assert_equal(raw_parsed_csv_data[0][1], nil)
@@ -153,4 +172,96 @@ class RcsvTest < Test::Unit::TestCase
 'booleator' => 't'
 }, raw_parsed_csv_data[1])
 end
+
+def test_array_block_streaming
+raw_parsed_csv_data = []
+
+result = Rcsv.raw_parse(@csv_data) { |row|
+raw_parsed_csv_data << row
+}
+
+assert_equal(nil, result)
+assert_equal(raw_parsed_csv_data[0][2], 'EDADEDADEDADEDADEDADEDAD')
+assert_equal(raw_parsed_csv_data[0][13], '$$$908080')
+assert_equal(raw_parsed_csv_data[0][14], '"')
+assert_equal(raw_parsed_csv_data[0][15], 'true/false')
+assert_equal(raw_parsed_csv_data[0][16], nil)
+assert_equal(raw_parsed_csv_data[9][2], nil)
+assert_equal(raw_parsed_csv_data[3][6], '""C81E-=; **ECCB; .. 89')
+assert_equal(raw_parsed_csv_data[888][13], 'Dallas, TX')
+end
+
+def test_hash_block_streaming
+raw_parsed_csv_data = []
+result = Rcsv.raw_parse(@csv_data, :row_as_hash => true, :column_names => [
+'DSAdsfksjh',
+'iii ooo iii',
+'EDADEDADEDADEDADEDADEDAD',
+'111 333 555',
+'NMLKTF',
+'---==---',
+'//',
+'###',
+'0000000000',
+'Asdad bvd qwert',
+";'''sd",
+'@@@',
+'OCTZ',
+'$$$908080',
+'"',
+'noname',
+'booleator'
+]) { |row|
+raw_parsed_csv_data << row
+}
+
+assert_equal(nil, result)
+assert_equal({
+'DSAdsfksjh' => 'C85A5B9F',
+'iii ooo iii' => '85259637',
+'EDADEDADEDADEDADEDADEDAD' => nil,
+'111 333 555' => '96',
+'NMLKTF' => '6838',
+'---==---' => '1983-06-14',
+'//' => '""C4CA-=; **1679; .. 79',
+'###' => '210,11',
+'0000000000' => '908e',
+'Asdad bvd qwert' => '1281-03-09',
+";'''sd" => '7257.4654049904275',
+'@@@' => '20efe749-50fe-4b6a-a603-7f9cd1dc6c6d',
+'OCTZ' => '3',
+'$$$908080' => "New York, NY",
+'"' => 'u',
+'noname' => '2.228169203286535',
+'booleator' => 't'
+}, raw_parsed_csv_data[1])
+end
+
+def test_nils_and_empty_strings_default
+raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new(",\"\",, ,,\n,, \"\", \"\" ,,"))
+
+assert_equal([nil, '', nil, nil, nil, nil], raw_parsed_csv_data[0])
+assert_equal([nil, nil, '', '', nil, nil], raw_parsed_csv_data[1])
+end
+
+def test_nils_and_empty_strings_nil
+raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new(",\"\",, ,,\n,, \"\", \"\" ,,"), :parse_empty_fields_as => :nil)
+
+assert_equal([nil, nil, nil, nil, nil, nil], raw_parsed_csv_data[0])
+assert_equal([nil, nil, nil, nil, nil, nil], raw_parsed_csv_data[1])
+end
+
+def test_nils_and_empty_strings_string
+raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new(",\"\",, ,,\n,, \"\", \"\" ,,"), :parse_empty_fields_as => :string)
+
+assert_equal(['', '', '', '', '', ''], raw_parsed_csv_data[0])
+assert_equal(['', '', '', '', '', ''], raw_parsed_csv_data[1])
+end
+
+def test_nils_and_empty_strings_nil_or_string
+raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new(",\"\",, ,,\n,, \"\", \"\" ,,"), :parse_empty_fields_as => :nil_or_string)
+
+assert_equal([nil, '', nil, nil, nil, nil], raw_parsed_csv_data[0])
+assert_equal([nil, nil, '', '', nil, nil], raw_parsed_csv_data[1])
+end
 end
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: rcsv
 version: !ruby/object:Gem::Version
-version: 0.0.
+version: 0.0.8
 prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2013-01-11 00:00:00.000000000 Z
 dependencies: []
 description: A libcsv-based CSV parser for Ruby
 email:
@@ -26,6 +26,7 @@ files:
 - Gemfile.lock
 - LICENSE
 - README.md
+- RELNOTES
 - Rakefile
 - bench.rb
 - ext/rcsv/csv.h