rcsv 0.0.6 → 0.0.8

data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
  remote: .
  specs:
- rcsv (0.0.6)
+ rcsv (0.0.8)
 
  GEM
  remote: https://rubygems.org/
data/README.md CHANGED
@@ -4,7 +4,7 @@
4
4
 
5
5
  Rcsv is a fast CSV parsing library for MRI Ruby. Tested on REE 1.8.7 and Ruby 1.9.3.
6
6
 
7
- Contrary to many other gems that implement their own parsers, Rcsv uses libcsv 3.0.2 (http://sourceforge.net/projects/libcsv/). As long as libcsv's API is stable, getting Rcsv to use newer libcsv version is as simple as updating two files (csv.h and libcsv.c).
7
+ Contrary to many other gems that implement their own parsers, Rcsv uses libcsv 3.0.3 (http://sourceforge.net/projects/libcsv/). As long as libcsv's API is stable, getting Rcsv to use newer libcsv version is as simple as updating two files (csv.h and libcsv.c).
8
8
 
9
9
  ## Benchmarks
10
10
  user system total real
@@ -48,6 +48,7 @@ Quickstart:
48
48
 
49
49
 
50
50
  Rcsv class exposes a class method *parse* that accepts a CSV string as its first parameter and options hash as its second parameter.
51
+ If block is passed, Rcsv sequentially yields every parsed line to it and the #parse method itself returns nil.
51
52
 
52
53
 
53
54
  Options supported:
@@ -60,7 +61,19 @@ A single-character string that is used as a separator. Default is ",".
60
61
 
61
62
  A boolean flag. When enabled, allows to parse oddly quoted CSV data without exceptions being raised. Disabled by default.
62
63
 
63
- Anything that does not conform to http://www.ietf.org/rfc/rfc4180.txt should better be parsed with this option enabled.
64
+ Anything that does not conform to http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm should better be parsed with this option enabled.
65
+
66
+ ### :parse_empty_fields_as
67
+
68
+ A Ruby symbol that specifies how empty CSV fields should be processed. Accepted values:
69
+
70
+ * :nil_or_string (default) - If empty field is quoted, it is parsed as empty Ruby string. If empty field is not quoted, it is parsed as Nil.
71
+
72
+ * :nil - Always parse as Nil.
73
+
74
+ * :string - Always parse as empty string.
75
+
76
+ This option doesn't affect defaults processing: all empty fields are replaced with default values if the latter are provided (via per-column :default).
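The three accepted values can be summarised as a small decision table. The following is a minimal sketch of the documented rules only; `convert_empty_field` is a hypothetical helper that is not part of Rcsv, and treating a `nil` default as "no default provided" is a simplifying assumption.

```ruby
# Hypothetical helper (not Rcsv's API): maps an empty CSV field to a Ruby
# value according to the documented :parse_empty_fields_as rules.
#   quoted  - true if the empty field was written as "" in the CSV
#   mode    - :nil_or_string (default), :nil or :string
#   default - per-column default; nil here means "no default provided"
def convert_empty_field(quoted, mode = :nil_or_string, default = nil)
  # A provided default replaces an empty field regardless of the mode.
  return default unless default.nil?

  case mode
  when :nil_or_string then quoted ? "" : nil
  when :nil           then nil
  when :string        then ""
  else
    raise ArgumentError, "unknown mode: #{mode.inspect}"
  end
end

# Fields of the row  ,"",  (unquoted empty, quoted empty, unquoted empty)
row = [false, true, false].map { |quoted| convert_empty_field(quoted) }
```

With the default mode, only the quoted field becomes an empty string; the other two become `nil`.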
64
77
 
65
78
  ### :offset_rows
66
79
 
@@ -73,7 +86,7 @@ If CSV has a header, :columns keys can be strings that are equal to column names
73
86
 
74
87
  :columns values are in turn hashes that provide parsing options:
75
88
 
76
- * :alias - Object of any type (though usually a Symbol) that is used to as a key that represents column name when :row_as_hash is set.
89
+ * :alias - Object of any type (though usually a Symbol) that is used as a key that represents column name when :row_as_hash is set.
77
90
  * :type - A Ruby Symbol that specifies Ruby data type that CSV cell value should be converted into. Supported types: :int, :float, :string, :bool. :string is the default.
78
91
  * :default - Object of any type (though usually of the same type that is specified by :type option). If CSV doesn't have any value for a cell, this default value is used.
79
92
  * :match - A string. If set, makes Rcsv skip all the rows where any column doesn't match its :match value. Useful for filtering data.
@@ -96,6 +109,10 @@ When :row_as_hash is disabled, return value is represented as array of arrays.
96
109
  ### :only_listed_columns
97
110
  A boolean flag. If enabled, only parses columns that are listed in :columns. Disabled by default.
98
111
 
112
+ ### :buffer_size
113
+ An integer. Default is 1MiB (1024 * 1024).
114
+ Specifies the number of bytes that are read at once, which makes it possible to read directly from IO-like objects (files, sockets etc).
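To illustrate what reading in `:buffer_size` chunks means, here is a minimal pure-Ruby sketch. `each_chunk` is a hypothetical helper, not part of Rcsv; it only assumes that the source responds to `read(length)`, as files, sockets and `StringIO` do.

```ruby
require "stringio"

# Hypothetical helper (not Rcsv's API): yields successive chunks of at most
# buffer_size bytes until the IO-like object is exhausted.
def each_chunk(io, buffer_size = 1024 * 1024)
  loop do
    chunk = io.read(buffer_size)
    break if chunk.nil? || chunk.empty? # end of input
    yield chunk
  end
end

source = "a,b,c\n1,2,3\n"
chunks = []
each_chunk(StringIO.new(source), 5) { |chunk| chunks << chunk }
```

Chunk boundaries do not have to line up with row boundaries, so whatever consumes the chunks must keep its parsing state from one chunk to the next.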
115
+
99
116
 
100
117
  ## Examples
101
118
 
@@ -132,12 +149,29 @@ The result would look like this:
  [ nil, 0, "Vacuum" ]
  ]
 
+ And here is an example of passing a block:
+
+ Rcsv.parse(some_csv) { |row|
+ puts row.inspect
+ }
+
+ That would display contents of each row without needing to put the whole parsed result array to memory:
+
+ ["a", "b", "c", "d", "e", "f"]
+ ["1", "2", "3", "4", "5", "6"]
+
+
+ This way it is possible to read from a File directly, with a 20MiB buffer and parse lines one by one:
+
+ Rcsv.parse(File.open('/some/file.csv'), :buffer_size => 20 * 1024 * 1024) { |row|
+ puts row.inspect
+ }
+
 
  ## To do
 
  * More specs for boolean values
  * Specs for Ruby parse
- * Add custom Ruby callbacks (if block is passed)
  * Add CSV write support
 
 
data/RELNOTES ADDED
@@ -0,0 +1,5 @@
+ Version 0.0.8
+ * libcsv upgraded to 3.0.3
+ * :parse_empty_fields_as option added
+ * README.md updated with a proper reference to CSV parsing guidelines
+ * RELNOTES added
data/ext/rcsv/csv.h CHANGED
@@ -8,8 +8,8 @@ extern "C" {
8
8
  #endif
9
9
 
10
10
  #define CSV_MAJOR 3
11
- #define CSV_MINOR 1
12
- #define CSV_RELEASE 0
11
+ #define CSV_MINOR 0
12
+ #define CSV_RELEASE 3
13
13
 
14
14
  /* Error Codes */
15
15
  #define CSV_SUCCESS 0
@@ -25,7 +25,9 @@ extern "C" {
25
25
  #define CSV_STRICT_FINI 4 /* causes csv_fini to return CSV_EPARSE if last
26
26
  field is quoted and doesn't contain an ending
27
27
  quote */
28
- #define CSV_APPEND_NULL 8 /* Ensure that all fields are null-ternimated */
28
+ #define CSV_APPEND_NULL 8 /* Ensure that all fields are null-terminated */
29
+ #define CSV_EMPTY_IS_NULL 16 /* Pass null pointer to cb1 function when
30
+ empty, unquoted fields are encountered */
29
31
 
30
32
 
31
33
  /* Character values */
data/ext/rcsv/libcsv.c CHANGED
@@ -25,7 +25,7 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
25
25
 
26
26
  #include "csv.h"
27
27
 
28
- #define VERSION "3.0.2"
28
+ #define VERSION "3.0.3"
29
29
 
30
30
  #define ROW_NOT_BEGUN 0
31
31
  #define FIELD_NOT_BEGUN 1
@@ -50,7 +50,9 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
50
50
  entry_pos -= spaces; \
51
51
  if (p->options & CSV_APPEND_NULL) \
52
52
  ((p)->entry_buf[entry_pos]) = '\0'; \
53
- if (cb1) \
53
+ if (cb1 && (p->options & CSV_EMPTY_IS_NULL) && !quoted && entry_pos == 0) \
54
+ cb1(NULL, entry_pos, data); \
55
+ else if (cb1) \
54
56
  cb1(p->entry_buf, entry_pos, data); \
55
57
  pstate = FIELD_NOT_BEGUN; \
56
58
  entry_pos = quoted = spaces = 0; \
data/ext/rcsv/rcsv.c CHANGED
@@ -12,6 +12,7 @@ static VALUE rcsv_parse_error; /* class Rcsv::ParseError << StandardError; end *
12
12
  struct rcsv_metadata {
13
13
  /* Derived from user-specified options */
14
14
  bool row_as_hash; /* Used to return array of hashes rather than array of arrays */
15
+ bool empty_field_is_nil; /* Do we convert empty fields to nils? */
15
16
  size_t offset_rows; /* Number of rows to skip before parsing */
16
17
 
17
18
  char * row_conversions; /* A pointer to string/array of row conversions char specifiers */
@@ -30,6 +31,7 @@ struct rcsv_metadata {
30
31
  size_t current_col; /* Current column's index */
31
32
  size_t current_row; /* Current row's index */
32
33
 
34
+ VALUE last_entry; /* A pointer to the last entry that's going to be appended to result */
33
35
  VALUE * result; /* A pointer to the parsed data */
34
36
  };
35
37
 
@@ -41,7 +43,6 @@ void end_of_field_callback(void * field, size_t field_size, void * data) {
41
43
  struct rcsv_metadata * meta = (struct rcsv_metadata *) data;
42
44
  char row_conversion = 0;
43
45
  VALUE parsed_field;
44
- VALUE last_entry = rb_ary_entry(*(meta->result), -1); /* result.last */
45
46
 
46
47
  /* No need to parse anything until the end of the line if skip_current_row is set */
47
48
  if (meta->skip_current_row) {
@@ -56,6 +57,7 @@ void end_of_field_callback(void * field, size_t field_size, void * data) {
56
57
 
57
58
  /* Filter by string row values listed in meta->only_rows */
58
59
  if ((meta->only_rows != NULL) &&
60
+ (field_str != NULL) && /* TODO: What if we want to filter out NULLs? */
59
61
  (meta->current_col < meta->num_only_rows) &&
60
62
  (meta->only_rows[meta->current_col] != NULL) &&
61
63
  (strcmp(meta->only_rows[meta->current_col], field_str))) {
@@ -74,8 +76,12 @@ void end_of_field_callback(void * field, size_t field_size, void * data) {
74
76
  /* Assigning appropriate default value if applicable. */
75
77
  if (meta->current_col < meta->num_row_defaults) {
76
78
  parsed_field = meta->row_defaults[meta->current_col];
77
- } else { /* By default, default is nil */
78
- parsed_field = Qnil;
79
+ } else { /* It depends on empty_field_is_nil if we convert empty strings to nils */
80
+ if (meta->empty_field_is_nil || field_str == NULL) {
81
+ parsed_field = Qnil;
82
+ } else {
83
+ parsed_field = rb_str_new2("");
84
+ }
79
85
  }
80
86
  } else {
81
87
  if (meta->current_col < meta->num_row_conversions) {
@@ -136,10 +142,10 @@ void end_of_field_callback(void * field, size_t field_size, void * data) {
136
142
  (int)meta->num_columns
137
143
  );
138
144
  } else {
139
- rb_hash_aset(last_entry, meta->column_names[meta->current_col], parsed_field);
145
+ rb_hash_aset(meta->last_entry, meta->column_names[meta->current_col], parsed_field); /* last_entry[column_names[current_col]] = field */
140
146
  }
141
147
  } else { /* Parse into Array */
142
- rb_ary_push(last_entry, parsed_field); /* result << field */
148
+ rb_ary_push(meta->last_entry, parsed_field); /* last_entry << field */
143
149
  }
144
150
  }
145
151
 
@@ -154,16 +160,22 @@ void end_of_line_callback(int last_char, void * data) {
154
160
 
155
161
  /* If filters didn't match, current row parsing is reverted */
156
162
  if (meta->skip_current_row) {
157
- rb_ary_pop(*(meta->result)); /* result.pop */
163
+ /* Do we wanna GC? */
158
164
  meta->skip_current_row = false;
165
+ } else {
166
+ if (rb_block_given_p()) { /* STREAMING */
167
+ rb_yield(meta->last_entry);
168
+ } else {
169
+ rb_ary_push(*(meta->result), meta->last_entry);
170
+ }
159
171
  }
160
172
 
161
- /* Add a new empty array/hash for the next line unless EOF reached */
173
+ /* Re-initialize last_entry unless EOF reached */
162
174
  if (last_char != -1) {
163
175
  if (meta->row_as_hash) {
164
- rb_ary_push(*(meta->result), rb_hash_new()); /* result << {} */
176
+ meta->last_entry = rb_hash_new(); /* {} */
165
177
  } else {
166
- rb_ary_push(*(meta->result), rb_ary_new()); /* result << [] */
178
+ meta->last_entry = rb_ary_new(); /* [] */
167
179
  }
168
180
  }
169
181
 
@@ -175,12 +187,19 @@ void end_of_line_callback(int last_char, void * data) {
175
187
  return;
176
188
  }
177
189
 
190
+ void custom_end_of_line_callback(int last_char, void * data) {
191
+ struct rcsv_metadata * meta = (struct rcsv_metadata *) data;
192
+
193
+ if (!meta->skip_current_row) {
194
+ }
195
+ }
196
+
178
197
  /* C API */
179
198
 
180
199
  /* The main method that handles parsing */
181
200
  static VALUE rb_rcsv_raw_parse(int argc, VALUE * argv, VALUE self) {
182
201
  struct rcsv_metadata meta;
183
- VALUE str, options, option;
202
+ VALUE csvio, csvstr, buffer_size, options, option;
184
203
 
185
204
  struct csv_parser cp;
186
205
  unsigned char csv_options = CSV_STRICT_FINI | CSV_APPEND_NULL;
@@ -191,6 +210,7 @@ static VALUE rb_rcsv_raw_parse(int argc, VALUE * argv, VALUE self) {
191
210
 
192
211
  /* Setting up some sane defaults */
193
212
  meta.row_as_hash = false;
213
+ meta.empty_field_is_nil = false;
194
214
  meta.skip_current_row = false;
195
215
  meta.num_columns = 0;
196
216
  meta.current_col = 0;
@@ -205,22 +225,34 @@ static VALUE rb_rcsv_raw_parse(int argc, VALUE * argv, VALUE self) {
205
225
  meta.column_names = NULL;
206
226
  meta.result = (VALUE[]){rb_ary_new()}; /* [] */
207
227
 
208
- /* str is required, options is optional (pun intended) */
209
- rb_scan_args(argc, argv, "11", &str, &options);
210
- csv_string = StringValuePtr(str);
211
- csv_string_len = strlen(csv_string);
228
+ /* csvio is required, options is optional (pun intended) */
229
+ rb_scan_args(argc, argv, "11", &csvio, &options);
212
230
 
213
231
  /* options ||= nil */
214
232
  if (NIL_P(options)) {
215
233
  options = rb_hash_new();
216
234
  }
217
235
 
236
+ buffer_size = rb_hash_aref(options, ID2SYM(rb_intern("buffer_size")));
237
+
218
238
  /* By default, parsing is strict */
219
239
  option = rb_hash_aref(options, ID2SYM(rb_intern("nostrict")));
220
240
  if (!option || (option == Qnil)) {
221
241
  csv_options |= CSV_STRICT;
222
242
  }
223
243
 
244
+ /* By default, empty strings are treated as Nils and quoted empty strings are treated as empty Ruby strings */
245
+ option = rb_hash_aref(options, ID2SYM(rb_intern("parse_empty_fields_as")));
246
+ if ((option == Qnil) || (option == ID2SYM(rb_intern("nil_or_string")))) {
247
+ csv_options |= CSV_EMPTY_IS_NULL;
248
+ } else if (option == ID2SYM(rb_intern("nil"))) {
249
+ meta.empty_field_is_nil = true;
250
+ } else if (option == ID2SYM(rb_intern("string"))) {
251
+ meta.empty_field_is_nil = false;
252
+ } else {
253
+ rb_raise(rcsv_parse_error, "The only valid options for :parse_empty_fields_as are :nil, :string and :nil_or_string, but %s was supplied.", RSTRING_PTR(rb_inspect(option)));
254
+ }
255
+
224
256
  /* Try to initialize libcsv */
225
257
  if (csv_init(&cp, csv_options) == -1) {
226
258
  rb_raise(rcsv_parse_error, "Couldn't initialize libcsv");
@@ -283,12 +315,14 @@ static VALUE rb_rcsv_raw_parse(int argc, VALUE * argv, VALUE self) {
283
315
  meta.row_conversions = StringValuePtr(option);
284
316
  }
285
317
 
286
- /* Column names should be declared explicitly when parsing fields as Hashes */
318
+ /* Column names should be declared explicitly when parsing fields as Hashes */
287
319
  if (meta.row_as_hash) { /* Only matters for hash results */
288
320
  option = rb_hash_aref(options, ID2SYM(rb_intern("column_names")));
289
321
  if (option == Qnil) {
290
322
  rb_raise(rcsv_parse_error, ":row_as_hash requires :column_names to be set.");
291
323
  } else {
324
+ meta.last_entry = rb_hash_new();
325
+
292
326
  meta.num_columns = (size_t)RARRAY_LEN(option);
293
327
  meta.column_names = (VALUE*)malloc(meta.num_columns * sizeof(VALUE*));
294
328
 
@@ -296,34 +330,37 @@ static VALUE rb_rcsv_raw_parse(int argc, VALUE * argv, VALUE self) {
296
330
  meta.column_names[i] = rb_ary_entry(option, i);
297
331
  }
298
332
  }
299
- }
300
-
301
- /* Initializing result with empty Array */
302
- if (meta.row_as_hash) {
303
- rb_ary_push(*(meta.result), rb_hash_new()); /* [{}] */
304
333
  } else {
305
- rb_ary_push(*(meta.result), rb_ary_new()); /* [[]] */
334
+ meta.last_entry = rb_ary_new();
306
335
  }
307
336
 
308
- /* Actual parsing and error handling */
309
- if (csv_string_len != csv_parse(&cp, csv_string, strlen(csv_string),
310
- &end_of_field_callback, &end_of_line_callback, &meta)) {
311
- error = csv_error(&cp);
312
- switch(error) {
313
- case CSV_EPARSE:
314
- rb_raise(rcsv_parse_error, "Error when parsing malformed data");
337
+ while(true) {
338
+ csvstr = rb_funcall(csvio, rb_intern("read"), 1, buffer_size);
339
+ if ((csvstr == Qnil) || (RSTRING_LEN(csvstr) == 0)) { break; }
340
+
341
+ csv_string = StringValuePtr(csvstr);
342
+ csv_string_len = strlen(csv_string);
343
+
344
+ /* Actual parsing and error handling */
345
+ if (csv_string_len != csv_parse(&cp, csv_string, csv_string_len,
346
+ &end_of_field_callback, &end_of_line_callback, &meta)) {
347
+ error = csv_error(&cp);
348
+ switch(error) {
349
+ case CSV_EPARSE:
350
+ rb_raise(rcsv_parse_error, "Error when parsing malformed data");
351
+ break;
352
+ case CSV_ENOMEM:
353
+ rb_raise(rcsv_parse_error, "No memory");
354
+ break;
355
+ case CSV_ETOOBIG:
356
+ rb_raise(rcsv_parse_error, "Field data is too large");
357
+ break;
358
+ case CSV_EINVALID:
359
+ rb_raise(rcsv_parse_error, "%s", (const char *)csv_strerror(error));
315
360
  break;
316
- case CSV_ENOMEM:
317
- rb_raise(rcsv_parse_error, "No memory");
318
- break;
319
- case CSV_ETOOBIG:
320
- rb_raise(rcsv_parse_error, "Field data is too large");
321
- break;
322
- case CSV_EINVALID:
323
- rb_raise(rcsv_parse_error, "%s", (const char *)csv_strerror(error));
324
- break;
325
- default:
326
- rb_raise(rcsv_parse_error, "Failed due to unknown reason");
361
+ default:
362
+ rb_raise(rcsv_parse_error, "Failed due to unknown reason");
363
+ }
327
364
  }
328
365
  }
329
366
 
@@ -344,12 +381,16 @@ static VALUE rb_rcsv_raw_parse(int argc, VALUE * argv, VALUE self) {
344
381
  }
345
382
 
346
383
  /* Remove the last row if it's empty. That happens if CSV file ends with a newline. */
347
- if (RARRAY_LEN(rb_ary_entry(*(meta.result), -1)) == 0) {
384
+ if (RARRAY_LEN(*(meta.result)) && /* meta.result.size != 0 */
385
+ RARRAY_LEN(rb_ary_entry(*(meta.result), -1)) == 0) {
348
386
  rb_ary_pop(*(meta.result));
349
387
  }
350
388
 
351
- /* An array of arrays of strings is returned. */
352
- return *(meta.result);
389
+ if (rb_block_given_p()) {
390
+ return Qnil; /* STREAMING */
391
+ } else {
392
+ return *(meta.result); /* Return accumulated result */
393
+ }
353
394
  }
354
395
 
355
396
 
data/lib/rcsv/version.rb CHANGED
@@ -1,3 +1,3 @@
  class Rcsv
- VERSION = "0.0.6"
+ VERSION = "0.0.8"
  end
data/lib/rcsv.rb CHANGED
@@ -1,8 +1,10 @@
1
1
  require "rcsv/rcsv"
2
2
  require "rcsv/version"
3
3
 
4
+ require "stringio"
5
+
4
6
  class Rcsv
5
- def self.parse(csv_data, options = {})
7
+ def self.parse(csv_data, options = {}, &block)
6
8
  #options = {
7
9
  #:column_separator => "\t",
8
10
  #:only_listed_columns => true,
@@ -25,16 +27,27 @@ class Rcsv
25
27
  raw_options[:col_sep] = options[:column_separator] && options[:column_separator][0] || ','
26
28
  raw_options[:offset_rows] = options[:offset_rows] || 0
27
29
  raw_options[:nostrict] = options[:nostrict]
30
+ raw_options[:parse_empty_fields_as] = options[:parse_empty_fields_as]
31
+ raw_options[:buffer_size] = options[:buffer_size] || 1024 * 1024 # 1 MiB
32
+
33
+ if csv_data.is_a?(String)
34
+ csv_data = StringIO.new(csv_data)
35
+ elsif !(csv_data.respond_to?(:lines) && csv_data.respond_to?(:read))
36
+ inspected_csv_data = csv_data.inspect
37
+ raise ParseError.new("Supplied CSV object #{inspected_csv_data[0..127]}#{inspected_csv_data.size > 128 ? '...' : ''} is neither String nor looks like IO object.")
38
+ end
39
+
40
+ initial_position = csv_data.pos
28
41
 
29
42
  case options[:header]
30
43
  when :use
31
- header = self.raw_parse(csv_data.lines.first, raw_options).first
44
+ header = self.raw_parse(StringIO.new(csv_data.lines.first), raw_options).first
32
45
  raw_options[:offset_rows] += 1
33
46
  when :skip
34
47
  header = (0..(csv_data.lines.first.split(raw_options[:col_sep]).count)).to_a
35
48
  raw_options[:offset_rows] += 1
36
49
  when :none
37
- header = (0..(csv_data.lines.first.split(raw_options[:col_sep]).count)).to_a
50
+ header = (0..(csv_data.lines.first.split(raw_options[:col_sep]).count)).to_a
38
51
  end
39
52
 
40
53
  raw_options[:row_as_hash] = options[:row_as_hash] # Setting after header parsing
@@ -86,6 +99,7 @@ class Rcsv
86
99
  raw_options[:row_conversions] = row_conversions
87
100
  end
88
101
 
89
- return self.raw_parse(csv_data, raw_options)
102
+ csv_data.pos = initial_position
103
+ return self.raw_parse(csv_data, raw_options, &block)
90
104
  end
91
105
  end
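The change above saves the stream position before the header is inspected and restores it before the full parse. The sketch below shows why that is needed, using only `StringIO` from the standard library. The variable names are illustrative, and `each_line.first` stands in for the `lines.first` call in the diff, since `lines` is no longer available on IO objects in recent Ruby versions.

```ruby
require "stringio"

io = StringIO.new("name,age\nAnn,30\n")

initial_position = io.pos          # remember where the data starts
header_line = io.each_line.first   # peeking at the header consumes input
position_after_header = io.pos     # the stream has moved past the header

io.pos = initial_position          # rewind so the parser sees every row
full_text = io.read
```

Without the rewind, a parser reading from `io` afterwards would start after the header line and `:offset_rows` would then skip one row too many.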
@@ -3,7 +3,7 @@ require 'rcsv'
3
3
 
4
4
  class RcsvTest < Test::Unit::TestCase
5
5
  def setup
6
- @csv_data = File.read('test/test_rcsv.csv')
6
+ @csv_data = File.open('test/test_rcsv.csv')
7
7
  end
8
8
 
9
9
  def test_rcsv
@@ -20,7 +20,7 @@ class RcsvTest < Test::Unit::TestCase
20
20
  end
21
21
 
22
22
  def test_rcsv_col_sep
23
- tsv_data = @csv_data.tr(",", "\t")
23
+ tsv_data = StringIO.new(@csv_data.read.tr(",", "\t"))
24
24
  raw_parsed_tsv_data = Rcsv.raw_parse(tsv_data, :col_sep => "\t")
25
25
 
26
26
  assert_equal(raw_parsed_tsv_data[0][2], 'EDADEDADEDADEDADEDADEDAD')
@@ -33,8 +33,27 @@ class RcsvTest < Test::Unit::TestCase
33
33
  assert_equal(raw_parsed_tsv_data[888][13], "Dallas\t TX")
34
34
  end
35
35
 
36
+ def test_buffer_size
37
+ raw_parsed_csv_data = Rcsv.raw_parse(@csv_data, :buffer_size => 10)
38
+
39
+ assert_equal(raw_parsed_csv_data[0][2], 'EDADEDADEDADEDADEDADEDAD')
40
+ assert_equal(raw_parsed_csv_data[0][13], '$$$908080')
41
+ assert_equal(raw_parsed_csv_data[0][14], '"')
42
+ assert_equal(raw_parsed_csv_data[0][15], 'true/false')
43
+ assert_equal(raw_parsed_csv_data[0][16], nil)
44
+ assert_equal(raw_parsed_csv_data[9][2], nil)
45
+ assert_equal(raw_parsed_csv_data[3][6], '""C81E-=; **ECCB; .. 89')
46
+ assert_equal(raw_parsed_csv_data[888][13], 'Dallas, TX')
47
+ end
48
+
49
+ def test_single_item_csv
50
+ raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new("Foo"))
51
+
52
+ assert_equal(raw_parsed_csv_data, [["Foo"]])
53
+ end
54
+
36
55
  def test_broken_data
37
- broken_data = @csv_data.sub(/"/, '')
56
+ broken_data = StringIO.new(@csv_data.read.sub(/"/, ''))
38
57
 
39
58
  assert_raise(Rcsv::ParseError) do
40
59
  Rcsv.raw_parse(broken_data)
@@ -42,7 +61,7 @@ class RcsvTest < Test::Unit::TestCase
42
61
  end
43
62
 
44
63
  def test_broken_data_without_strict
45
- broken_data = @csv_data.sub(/"/, '')
64
+ broken_data = StringIO.new(@csv_data.read.sub(/"/, ''))
46
65
 
47
66
  raw_parsed_csv_data = Rcsv.raw_parse(broken_data, :nostrict => true)
48
67
  assert_equal(["DSAdsfksjh", "iii ooo iii", "EDADEDADEDADEDADEDADEDAD", "111 333 555", "NMLKTF", "---==---", "//", "###", "0000000000", "Asdad bvd qwert", ";'''sd", "@@@", "OCTZ", "$$$908080", "\",true/false\nC85A5B9F,85259637,,96,6838,1983-06-14,\"\"\"C4CA-=; **1679; .. 79", "210,11", "908e", "1281-03-09", "7257.4654049904275", "20efe749-50fe-4b6a-a603-7f9cd1dc6c6d", "3", "New York, NY", "u", "2.228169203286535", "t"], raw_parsed_csv_data.first)
@@ -83,7 +102,7 @@ class RcsvTest < Test::Unit::TestCase
83
102
  end
84
103
 
85
104
  def test_row_conversions
86
- raw_parsed_csv_data = Rcsv.raw_parse(@csv_data.each_line.to_a[1..-1].join, # skipping string headers
105
+ raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new(@csv_data.each_line.to_a[1..-1].join), # skipping string headers
87
106
  :row_conversions => 'sisiisssssfsissf')
88
107
 
89
108
  assert_equal(raw_parsed_csv_data[0][2], nil)
@@ -94,7 +113,7 @@ class RcsvTest < Test::Unit::TestCase
94
113
  end
95
114
 
96
115
  def test_row_conversions_with_column_exclusions
97
- raw_parsed_csv_data = Rcsv.raw_parse(@csv_data.each_line.to_a[1..-1].join, # skipping string headers
116
+ raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new(@csv_data.each_line.to_a[1..-1].join), # skipping string headers
98
117
  :row_conversions => 's f issss fsis fb')
99
118
 
100
119
  assert_equal(raw_parsed_csv_data[0][1], nil)
@@ -153,4 +172,96 @@ class RcsvTest < Test::Unit::TestCase
153
172
  'booleator' => 't'
154
173
  }, raw_parsed_csv_data[1])
155
174
  end
175
+
176
+ def test_array_block_streaming
177
+ raw_parsed_csv_data = []
178
+
179
+ result = Rcsv.raw_parse(@csv_data) { |row|
180
+ raw_parsed_csv_data << row
181
+ }
182
+
183
+ assert_equal(nil, result)
184
+ assert_equal(raw_parsed_csv_data[0][2], 'EDADEDADEDADEDADEDADEDAD')
185
+ assert_equal(raw_parsed_csv_data[0][13], '$$$908080')
186
+ assert_equal(raw_parsed_csv_data[0][14], '"')
187
+ assert_equal(raw_parsed_csv_data[0][15], 'true/false')
188
+ assert_equal(raw_parsed_csv_data[0][16], nil)
189
+ assert_equal(raw_parsed_csv_data[9][2], nil)
190
+ assert_equal(raw_parsed_csv_data[3][6], '""C81E-=; **ECCB; .. 89')
191
+ assert_equal(raw_parsed_csv_data[888][13], 'Dallas, TX')
192
+ end
193
+
194
+ def test_hash_block_streaming
195
+ raw_parsed_csv_data = []
196
+ result = Rcsv.raw_parse(@csv_data, :row_as_hash => true, :column_names => [
197
+ 'DSAdsfksjh',
198
+ 'iii ooo iii',
199
+ 'EDADEDADEDADEDADEDADEDAD',
200
+ '111 333 555',
201
+ 'NMLKTF',
202
+ '---==---',
203
+ '//',
204
+ '###',
205
+ '0000000000',
206
+ 'Asdad bvd qwert',
207
+ ";'''sd",
208
+ '@@@',
209
+ 'OCTZ',
210
+ '$$$908080',
211
+ '"',
212
+ 'noname',
213
+ 'booleator'
214
+ ]) { |row|
215
+ raw_parsed_csv_data << row
216
+ }
217
+
218
+ assert_equal(nil, result)
219
+ assert_equal({
220
+ 'DSAdsfksjh' => 'C85A5B9F',
221
+ 'iii ooo iii' => '85259637',
222
+ 'EDADEDADEDADEDADEDADEDAD' => nil,
223
+ '111 333 555' => '96',
224
+ 'NMLKTF' => '6838',
225
+ '---==---' => '1983-06-14',
226
+ '//' => '""C4CA-=; **1679; .. 79',
227
+ '###' => '210,11',
228
+ '0000000000' => '908e',
229
+ 'Asdad bvd qwert' => '1281-03-09',
230
+ ";'''sd" => '7257.4654049904275',
231
+ '@@@' => '20efe749-50fe-4b6a-a603-7f9cd1dc6c6d',
232
+ 'OCTZ' => '3',
233
+ '$$$908080' => "New York, NY",
234
+ '"' => 'u',
235
+ 'noname' => '2.228169203286535',
236
+ 'booleator' => 't'
237
+ }, raw_parsed_csv_data[1])
238
+ end
239
+
240
+ def test_nils_and_empty_strings_default
241
+ raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new(",\"\",, ,,\n,, \"\", \"\" ,,"))
242
+
243
+ assert_equal([nil, '', nil, nil, nil, nil], raw_parsed_csv_data[0])
244
+ assert_equal([nil, nil, '', '', nil, nil], raw_parsed_csv_data[1])
245
+ end
246
+
247
+ def test_nils_and_empty_strings_nil
248
+ raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new(",\"\",, ,,\n,, \"\", \"\" ,,"), :parse_empty_fields_as => :nil)
249
+
250
+ assert_equal([nil, nil, nil, nil, nil, nil], raw_parsed_csv_data[0])
251
+ assert_equal([nil, nil, nil, nil, nil, nil], raw_parsed_csv_data[1])
252
+ end
253
+
254
+ def test_nils_and_empty_strings_string
255
+ raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new(",\"\",, ,,\n,, \"\", \"\" ,,"), :parse_empty_fields_as => :string)
256
+
257
+ assert_equal(['', '', '', '', '', ''], raw_parsed_csv_data[0])
258
+ assert_equal(['', '', '', '', '', ''], raw_parsed_csv_data[1])
259
+ end
260
+
261
+ def test_nils_and_empty_strings_nil_or_string
262
+ raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new(",\"\",, ,,\n,, \"\", \"\" ,,"), :parse_empty_fields_as => :nil_or_string)
263
+
264
+ assert_equal([nil, '', nil, nil, nil, nil], raw_parsed_csv_data[0])
265
+ assert_equal([nil, nil, '', '', nil, nil], raw_parsed_csv_data[1])
266
+ end
156
267
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: rcsv
  version: !ruby/object:Gem::Version
- version: 0.0.6
+ version: 0.0.8
  prerelease:
  platform: ruby
  authors:
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2012-11-16 00:00:00.000000000 Z
+ date: 2013-01-11 00:00:00.000000000 Z
  dependencies: []
  description: A libcsv-based CSV parser for Ruby
  email:
@@ -26,6 +26,7 @@ files:
  - Gemfile.lock
  - LICENSE
  - README.md
+ - RELNOTES
  - Rakefile
  - bench.rb
  - ext/rcsv/csv.h