rcsv 0.0.6 → 0.0.8

data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- rcsv (0.0.6)
4
+ rcsv (0.0.8)
5
5
 
6
6
  GEM
7
7
  remote: https://rubygems.org/
data/README.md CHANGED
@@ -4,7 +4,7 @@
4
4
 
5
5
  Rcsv is a fast CSV parsing library for MRI Ruby. Tested on REE 1.8.7 and Ruby 1.9.3.
6
6
 
7
- Contrary to many other gems that implement their own parsers, Rcsv uses libcsv 3.0.2 (http://sourceforge.net/projects/libcsv/). As long as libcsv's API is stable, getting Rcsv to use newer libcsv version is as simple as updating two files (csv.h and libcsv.c).
7
+ Contrary to many other gems that implement their own parsers, Rcsv uses libcsv 3.0.3 (http://sourceforge.net/projects/libcsv/). As long as libcsv's API is stable, getting Rcsv to use newer libcsv version is as simple as updating two files (csv.h and libcsv.c).
8
8
 
9
9
  ## Benchmarks
10
10
  user system total real
@@ -48,6 +48,7 @@ Quickstart:
48
48
 
49
49
 
50
50
  Rcsv class exposes a class method *parse* that accepts a CSV string as its first parameter and options hash as its second parameter.
51
+ If block is passed, Rcsv sequentially yields every parsed line to it and the #parse method itself returns nil.
51
52
 
52
53
 
53
54
  Options supported:
@@ -60,7 +61,19 @@ A single-character string that is used as a separator. Default is ",".
60
61
 
61
62
  A boolean flag. When enabled, allows to parse oddly quoted CSV data without exceptions being raised. Disabled by default.
62
63
 
63
- Anything that does not conform to http://www.ietf.org/rfc/rfc4180.txt should better be parsed with this option enabled.
64
+ Anything that does not conform to http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm should better be parsed with this option enabled.
65
+
66
+ ### :parse_empty_fields_as
67
+
68
+ A Ruby symbol that specifies how empty CSV fields should be processed. Accepted values:
69
+
70
+ * :nil_or_string (default) - If empty field is quoted, it is parsed as empty Ruby string. If empty field is not quoted, it is parsed as Nil.
71
+
72
+ * :nil - Always parse as Nil.
73
+
74
+ * :string - Always parse as empty string.
75
+
76
+ This option doesn't affect defaults processing: all empty fields are replaced with default values if the latter are provided (via per-column :default).
64
77
 
65
78
  ### :offset_rows
66
79
 
@@ -73,7 +86,7 @@ If CSV has a header, :columns keys can be strings that are equal to column names
73
86
 
74
87
  :columns values are in turn hashes that provide parsing options:
75
88
 
76
- * :alias - Object of any type (though usually a Symbol) that is used to as a key that represents column name when :row_as_hash is set.
89
+ * :alias - Object of any type (though usually a Symbol) that is used as a key that represents column name when :row_as_hash is set.
77
90
  * :type - A Ruby Symbol that specifies Ruby data type that CSV cell value should be converted into. Supported types: :int, :float, :string, :bool. :string is the default.
78
91
  * :default - Object of any type (though usually of the same type that is specified by :type option). If CSV doesn't have any value for a cell, this default value is used.
79
92
  * :match - A string. If set, makes Rcsv skip all the rows where any column doesn't match its :match value. Useful for filtering data.
@@ -96,6 +109,10 @@ When :row_as_hash is disabled, return value is represented as array of arrays.
96
109
  ### :only_listed_columns
97
110
  A boolean flag. If enabled, only parses columns that are listed in :columns. Disabled by default.
98
111
 
112
+ ### :buffer_size
113
+ An integer. Default is 1MiB (1024 * 1024).
114
+ Specifies a number of bytes that are read at once, thus allowing to read drectly from IO-like objects (files, sockets etc).
115
+
99
116
 
100
117
  ## Examples
101
118
 
@@ -132,12 +149,29 @@ The result would look like this:
132
149
  [ nil, 0, "Vacuum" ]
133
150
  ]
134
151
 
152
+ And here is an example of passing a block:
153
+
154
+ Rcsv.parse(some_csv) { |row|
155
+ puts row.inspect
156
+ }
157
+
158
+ That would display contents of each row without needing to put the whole parsed result array to memory:
159
+
160
+ ["a", "b", "c", "d", "e", "f"]
161
+ ["1", "2", "3", "4", "5", "6"]
162
+
163
+
164
+ This way it is possible to read from a File directly, with a 20MiB buffer and parse lines one by one:
165
+
166
+ Rcsv.parse(File.open('/some/file.csv'), :buffer_size => 20 * 1024 * 1024) { |row|
167
+ puts row.inspect
168
+ }
169
+
135
170
 
136
171
  ## To do
137
172
 
138
173
  * More specs for boolean values
139
174
  * Specs for Ruby parse
140
- * Add custom Ruby callbacks (if block is passed)
141
175
  * Add CSV write support
142
176
 
143
177
 
data/RELNOTES ADDED
@@ -0,0 +1,5 @@
1
+ Version 0.0.8
2
+ * libcsv upgraded to 3.0.3
3
+ * :parse_empty_fields_as option added
4
+ * README.md updated with a proper reference to CSV parsing guidelines
5
+ * RELNOTES added
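
The `:parse_empty_fields_as` option listed in these release notes has three modes, described in the README hunk above. The sketch below restates those rules for a single empty field. `empty_field_value` is a hypothetical helper written for illustration, not part of Rcsv, and it leaves out per-column `:default` handling, which the README says takes precedence.

```ruby
# Hypothetical helper that models the documented :parse_empty_fields_as modes.
# It is not part of Rcsv; it only restates the README rules for an empty field.
def empty_field_value(mode, quoted)
  case mode
  when nil, :nil_or_string then quoted ? "" : nil  # default: quoting decides
  when :nil                then nil                # always nil
  when :string             then ""                 # always empty string
  else
    raise ArgumentError, "unsupported mode: #{mode.inspect}"
  end
end

[:nil_or_string, :nil, :string].each do |mode|
  unquoted = empty_field_value(mode, false)
  quoted   = empty_field_value(mode, true)
  puts "#{mode}: unquoted=#{unquoted.inspect} quoted=#{quoted.inspect}"
end
```

Raising on an unknown mode mirrors the `rb_raise` branch that rcsv.c adds for unsupported values, although the real extension raises `Rcsv::ParseError` rather than `ArgumentError`.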
data/ext/rcsv/csv.h CHANGED
@@ -8,8 +8,8 @@ extern "C" {
8
8
  #endif
9
9
 
10
10
  #define CSV_MAJOR 3
11
- #define CSV_MINOR 1
12
- #define CSV_RELEASE 0
11
+ #define CSV_MINOR 0
12
+ #define CSV_RELEASE 3
13
13
 
14
14
  /* Error Codes */
15
15
  #define CSV_SUCCESS 0
@@ -25,7 +25,9 @@ extern "C" {
25
25
  #define CSV_STRICT_FINI 4 /* causes csv_fini to return CSV_EPARSE if last
26
26
  field is quoted and doesn't containg ending
27
27
  quote */
28
- #define CSV_APPEND_NULL 8 /* Ensure that all fields are null-ternimated */
28
+ #define CSV_APPEND_NULL 8 /* Ensure that all fields are null-terminated */
29
+ #define CSV_EMPTY_IS_NULL 16 /* Pass null pointer to cb1 function when
30
+ empty, unquoted fields are encountered */
29
31
 
30
32
 
31
33
  /* Character values */
data/ext/rcsv/libcsv.c CHANGED
@@ -25,7 +25,7 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
25
25
 
26
26
  #include "csv.h"
27
27
 
28
- #define VERSION "3.0.2"
28
+ #define VERSION "3.0.3"
29
29
 
30
30
  #define ROW_NOT_BEGUN 0
31
31
  #define FIELD_NOT_BEGUN 1
@@ -50,7 +50,9 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
50
50
  entry_pos -= spaces; \
51
51
  if (p->options & CSV_APPEND_NULL) \
52
52
  ((p)->entry_buf[entry_pos]) = '\0'; \
53
- if (cb1) \
53
+ if (cb1 && (p->options & CSV_EMPTY_IS_NULL) && !quoted && entry_pos == 0) \
54
+ cb1(NULL, entry_pos, data); \
55
+ else if (cb1) \
54
56
  cb1(p->entry_buf, entry_pos, data); \
55
57
  pstate = FIELD_NOT_BEGUN; \
56
58
  entry_pos = quoted = spaces = 0; \
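
The libcsv.c hunk above changes what the field callback receives when `CSV_EMPTY_IS_NULL` is set. The following is a rough Ruby model of that branch, assuming only the flag values shown in csv.h. `submit_field` and its parameters are hypothetical names: `buffer` stands in for `p->entry_buf` and `quoted` for the parser's quoted flag.

```ruby
# Flag values copied from csv.h in this diff.
CSV_APPEND_NULL   = 8
CSV_EMPTY_IS_NULL = 16

# Hypothetical Ruby model of the field-submission branch in libcsv.c.
def submit_field(options, buffer, quoted, &cb1)
  return unless cb1
  if (options & CSV_EMPTY_IS_NULL) != 0 && !quoted && buffer.empty?
    cb1.call(nil, 0)                   # empty, unquoted field: pass a null pointer
  else
    cb1.call(buffer, buffer.bytesize)  # every other field: pass the buffer as before
  end
end

seen = []
options = CSV_APPEND_NULL | CSV_EMPTY_IS_NULL
submit_field(options, "", false)         { |field, _size| seen << field }
submit_field(options, "", true)          { |field, _size| seen << field }
submit_field(CSV_APPEND_NULL, "", false) { |field, _size| seen << field }
p seen
```

Only the first call yields `nil`. A quoted empty field, or any field parsed without the flag, still arrives as an empty string, which is what lets rcsv.c tell the two cases apart.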
data/ext/rcsv/rcsv.c CHANGED
@@ -12,6 +12,7 @@ static VALUE rcsv_parse_error; /* class Rcsv::ParseError << StandardError; end *
12
12
  struct rcsv_metadata {
13
13
  /* Derived from user-specified options */
14
14
  bool row_as_hash; /* Used to return array of hashes rather than array of arrays */
15
+ bool empty_field_is_nil; /* Do we convert empty fields to nils? */
15
16
  size_t offset_rows; /* Number of rows to skip before parsing */
16
17
 
17
18
  char * row_conversions; /* A pointer to string/array of row conversions char specifiers */
@@ -30,6 +31,7 @@ struct rcsv_metadata {
30
31
  size_t current_col; /* Current column's index */
31
32
  size_t current_row; /* Current row's index */
32
33
 
34
+ VALUE last_entry; /* A pointer to the last entry that's going to be appended to result */
33
35
  VALUE * result; /* A pointer to the parsed data */
34
36
  };
35
37
 
@@ -41,7 +43,6 @@ void end_of_field_callback(void * field, size_t field_size, void * data) {
41
43
  struct rcsv_metadata * meta = (struct rcsv_metadata *) data;
42
44
  char row_conversion = 0;
43
45
  VALUE parsed_field;
44
- VALUE last_entry = rb_ary_entry(*(meta->result), -1); /* result.last */
45
46
 
46
47
  /* No need to parse anything until the end of the line if skip_current_row is set */
47
48
  if (meta->skip_current_row) {
@@ -56,6 +57,7 @@ void end_of_field_callback(void * field, size_t field_size, void * data) {
56
57
 
57
58
  /* Filter by string row values listed in meta->only_rows */
58
59
  if ((meta->only_rows != NULL) &&
60
+ (field_str != NULL) && /* TODO: What if we want to filter out NULLs? */
59
61
  (meta->current_col < meta->num_only_rows) &&
60
62
  (meta->only_rows[meta->current_col] != NULL) &&
61
63
  (strcmp(meta->only_rows[meta->current_col], field_str))) {
@@ -74,8 +76,12 @@ void end_of_field_callback(void * field, size_t field_size, void * data) {
74
76
  /* Assigning appropriate default value if applicable. */
75
77
  if (meta->current_col < meta->num_row_defaults) {
76
78
  parsed_field = meta->row_defaults[meta->current_col];
77
- } else { /* By default, default is nil */
78
- parsed_field = Qnil;
79
+ } else { /* It depends on empty_field_is_nil if we convert empty strings to nils */
80
+ if (meta->empty_field_is_nil || field_str == NULL) {
81
+ parsed_field = Qnil;
82
+ } else {
83
+ parsed_field = rb_str_new2("");
84
+ }
79
85
  }
80
86
  } else {
81
87
  if (meta->current_col < meta->num_row_conversions) {
@@ -136,10 +142,10 @@ void end_of_field_callback(void * field, size_t field_size, void * data) {
136
142
  (int)meta->num_columns
137
143
  );
138
144
  } else {
139
- rb_hash_aset(last_entry, meta->column_names[meta->current_col], parsed_field);
145
+ rb_hash_aset(meta->last_entry, meta->column_names[meta->current_col], parsed_field); /* last_entry[column_names[current_col]] = field */
140
146
  }
141
147
  } else { /* Parse into Array */
142
- rb_ary_push(last_entry, parsed_field); /* result << field */
148
+ rb_ary_push(meta->last_entry, parsed_field); /* last_entry << field */
143
149
  }
144
150
  }
145
151
 
@@ -154,16 +160,22 @@ void end_of_line_callback(int last_char, void * data) {
154
160
 
155
161
  /* If filters didn't match, current row parsing is reverted */
156
162
  if (meta->skip_current_row) {
157
- rb_ary_pop(*(meta->result)); /* result.pop */
163
+ /* Do we wanna GC? */
158
164
  meta->skip_current_row = false;
165
+ } else {
166
+ if (rb_block_given_p()) { /* STREAMING */
167
+ rb_yield(meta->last_entry);
168
+ } else {
169
+ rb_ary_push(*(meta->result), meta->last_entry);
170
+ }
159
171
  }
160
172
 
161
- /* Add a new empty array/hash for the next line unless EOF reached */
173
+ /* Re-initialize last_entry unless EOF reached */
162
174
  if (last_char != -1) {
163
175
  if (meta->row_as_hash) {
164
- rb_ary_push(*(meta->result), rb_hash_new()); /* result << {} */
176
+ meta->last_entry = rb_hash_new(); /* {} */
165
177
  } else {
166
- rb_ary_push(*(meta->result), rb_ary_new()); /* result << [] */
178
+ meta->last_entry = rb_ary_new(); /* [] */
167
179
  }
168
180
  }
169
181
 
@@ -175,12 +187,19 @@ void end_of_line_callback(int last_char, void * data) {
175
187
  return;
176
188
  }
177
189
 
190
+ void custom_end_of_line_callback(int last_char, void * data) {
191
+ struct rcsv_metadata * meta = (struct rcsv_metadata *) data;
192
+
193
+ if (!meta->skip_current_row) {
194
+ }
195
+ }
196
+
178
197
  /* C API */
179
198
 
180
199
  /* The main method that handles parsing */
181
200
  static VALUE rb_rcsv_raw_parse(int argc, VALUE * argv, VALUE self) {
182
201
  struct rcsv_metadata meta;
183
- VALUE str, options, option;
202
+ VALUE csvio, csvstr, buffer_size, options, option;
184
203
 
185
204
  struct csv_parser cp;
186
205
  unsigned char csv_options = CSV_STRICT_FINI | CSV_APPEND_NULL;
@@ -191,6 +210,7 @@ static VALUE rb_rcsv_raw_parse(int argc, VALUE * argv, VALUE self) {
191
210
 
192
211
  /* Setting up some sane defaults */
193
212
  meta.row_as_hash = false;
213
+ meta.empty_field_is_nil = false;
194
214
  meta.skip_current_row = false;
195
215
  meta.num_columns = 0;
196
216
  meta.current_col = 0;
@@ -205,22 +225,34 @@ static VALUE rb_rcsv_raw_parse(int argc, VALUE * argv, VALUE self) {
205
225
  meta.column_names = NULL;
206
226
  meta.result = (VALUE[]){rb_ary_new()}; /* [] */
207
227
 
208
- /* str is required, options is optional (pun intended) */
209
- rb_scan_args(argc, argv, "11", &str, &options);
210
- csv_string = StringValuePtr(str);
211
- csv_string_len = strlen(csv_string);
228
+ /* csvio is required, options is optional (pun intended) */
229
+ rb_scan_args(argc, argv, "11", &csvio, &options);
212
230
 
213
231
  /* options ||= nil */
214
232
  if (NIL_P(options)) {
215
233
  options = rb_hash_new();
216
234
  }
217
235
 
236
+ buffer_size = rb_hash_aref(options, ID2SYM(rb_intern("buffer_size")));
237
+
218
238
  /* By default, parsing is strict */
219
239
  option = rb_hash_aref(options, ID2SYM(rb_intern("nostrict")));
220
240
  if (!option || (option == Qnil)) {
221
241
  csv_options |= CSV_STRICT;
222
242
  }
223
243
 
244
+ /* By default, empty strings are treated as Nils and quoted empty strings are treated as empty Ruby strings */
245
+ option = rb_hash_aref(options, ID2SYM(rb_intern("parse_empty_fields_as")));
246
+ if ((option == Qnil) || (option == ID2SYM(rb_intern("nil_or_string")))) {
247
+ csv_options |= CSV_EMPTY_IS_NULL;
248
+ } else if (option == ID2SYM(rb_intern("nil"))) {
249
+ meta.empty_field_is_nil = true;
250
+ } else if (option == ID2SYM(rb_intern("string"))) {
251
+ meta.empty_field_is_nil = false;
252
+ } else {
253
+ rb_raise(rcsv_parse_error, "The only valid options for :parse_empty_fields_as are :nil, :string and :nil_or_string, but %s was supplied.", RSTRING_PTR(rb_inspect(option)));
254
+ }
255
+
224
256
  /* Try to initialize libcsv */
225
257
  if (csv_init(&cp, csv_options) == -1) {
226
258
  rb_raise(rcsv_parse_error, "Couldn't initialize libcsv");
@@ -283,12 +315,14 @@ static VALUE rb_rcsv_raw_parse(int argc, VALUE * argv, VALUE self) {
283
315
  meta.row_conversions = StringValuePtr(option);
284
316
  }
285
317
 
286
- /* Column names should be declared explicitly when parsing fields as Hashes */
318
+ /* Column names should be declared explicitly when parsing fields as Hashes */
287
319
  if (meta.row_as_hash) { /* Only matters for hash results */
288
320
  option = rb_hash_aref(options, ID2SYM(rb_intern("column_names")));
289
321
  if (option == Qnil) {
290
322
  rb_raise(rcsv_parse_error, ":row_as_hash requires :column_names to be set.");
291
323
  } else {
324
+ meta.last_entry = rb_hash_new();
325
+
292
326
  meta.num_columns = (size_t)RARRAY_LEN(option);
293
327
  meta.column_names = (VALUE*)malloc(meta.num_columns * sizeof(VALUE*));
294
328
 
@@ -296,34 +330,37 @@ static VALUE rb_rcsv_raw_parse(int argc, VALUE * argv, VALUE self) {
296
330
  meta.column_names[i] = rb_ary_entry(option, i);
297
331
  }
298
332
  }
299
- }
300
-
301
- /* Initializing result with empty Array */
302
- if (meta.row_as_hash) {
303
- rb_ary_push(*(meta.result), rb_hash_new()); /* [{}] */
304
333
  } else {
305
- rb_ary_push(*(meta.result), rb_ary_new()); /* [[]] */
334
+ meta.last_entry = rb_ary_new();
306
335
  }
307
336
 
308
- /* Actual parsing and error handling */
309
- if (csv_string_len != csv_parse(&cp, csv_string, strlen(csv_string),
310
- &end_of_field_callback, &end_of_line_callback, &meta)) {
311
- error = csv_error(&cp);
312
- switch(error) {
313
- case CSV_EPARSE:
314
- rb_raise(rcsv_parse_error, "Error when parsing malformed data");
337
+ while(true) {
338
+ csvstr = rb_funcall(csvio, rb_intern("read"), 1, buffer_size);
339
+ if ((csvstr == Qnil) || (RSTRING_LEN(csvstr) == 0)) { break; }
340
+
341
+ csv_string = StringValuePtr(csvstr);
342
+ csv_string_len = strlen(csv_string);
343
+
344
+ /* Actual parsing and error handling */
345
+ if (csv_string_len != csv_parse(&cp, csv_string, csv_string_len,
346
+ &end_of_field_callback, &end_of_line_callback, &meta)) {
347
+ error = csv_error(&cp);
348
+ switch(error) {
349
+ case CSV_EPARSE:
350
+ rb_raise(rcsv_parse_error, "Error when parsing malformed data");
351
+ break;
352
+ case CSV_ENOMEM:
353
+ rb_raise(rcsv_parse_error, "No memory");
354
+ break;
355
+ case CSV_ETOOBIG:
356
+ rb_raise(rcsv_parse_error, "Field data is too large");
357
+ break;
358
+ case CSV_EINVALID:
359
+ rb_raise(rcsv_parse_error, "%s", (const char *)csv_strerror(error));
315
360
  break;
316
- case CSV_ENOMEM:
317
- rb_raise(rcsv_parse_error, "No memory");
318
- break;
319
- case CSV_ETOOBIG:
320
- rb_raise(rcsv_parse_error, "Field data is too large");
321
- break;
322
- case CSV_EINVALID:
323
- rb_raise(rcsv_parse_error, "%s", (const char *)csv_strerror(error));
324
- break;
325
- default:
326
- rb_raise(rcsv_parse_error, "Failed due to unknown reason");
361
+ default:
362
+ rb_raise(rcsv_parse_error, "Failed due to unknown reason");
363
+ }
327
364
  }
328
365
  }
329
366
 
@@ -344,12 +381,16 @@ static VALUE rb_rcsv_raw_parse(int argc, VALUE * argv, VALUE self) {
344
381
  }
345
382
 
346
383
  /* Remove the last row if it's empty. That happens if CSV file ends with a newline. */
347
- if (RARRAY_LEN(rb_ary_entry(*(meta.result), -1)) == 0) {
384
+ if (RARRAY_LEN(*(meta.result)) && /* meta.result.size != 0 */
385
+ RARRAY_LEN(rb_ary_entry(*(meta.result), -1)) == 0) {
348
386
  rb_ary_pop(*(meta.result));
349
387
  }
350
388
 
351
- /* An array of arrays of strings is returned. */
352
- return *(meta.result);
389
+ if (rb_block_given_p()) {
390
+ return Qnil; /* STREAMING */
391
+ } else {
392
+ return *(meta.result); /* Return accumulated result */
393
+ }
353
394
  }
354
395
 
355
396
 
data/lib/rcsv/version.rb CHANGED
@@ -1,3 +1,3 @@
1
1
  class Rcsv
2
- VERSION = "0.0.6"
2
+ VERSION = "0.0.8"
3
3
  end
data/lib/rcsv.rb CHANGED
@@ -1,8 +1,10 @@
1
1
  require "rcsv/rcsv"
2
2
  require "rcsv/version"
3
3
 
4
+ require "stringio"
5
+
4
6
  class Rcsv
5
- def self.parse(csv_data, options = {})
7
+ def self.parse(csv_data, options = {}, &block)
6
8
  #options = {
7
9
  #:column_separator => "\t",
8
10
  #:only_listed_columns => true,
@@ -25,16 +27,27 @@ class Rcsv
25
27
  raw_options[:col_sep] = options[:column_separator] && options[:column_separator][0] || ','
26
28
  raw_options[:offset_rows] = options[:offset_rows] || 0
27
29
  raw_options[:nostrict] = options[:nostrict]
30
+ raw_options[:parse_empty_fields_as] = options[:parse_empty_fields_as]
31
+ raw_options[:buffer_size] = options[:buffer_size] || 1024 * 1024 # 1 MiB
32
+
33
+ if csv_data.is_a?(String)
34
+ csv_data = StringIO.new(csv_data)
35
+ elsif !(csv_data.respond_to?(:lines) && csv_data.respond_to?(:read))
36
+ inspected_csv_data = csv_data.inspect
37
+ raise ParseError.new("Supplied CSV object #{inspected_csv_data[0..127]}#{inspected_csv_data.size > 128 ? '...' : ''} is neither String nor looks like IO object.")
38
+ end
39
+
40
+ initial_position = csv_data.pos
28
41
 
29
42
  case options[:header]
30
43
  when :use
31
- header = self.raw_parse(csv_data.lines.first, raw_options).first
44
+ header = self.raw_parse(StringIO.new(csv_data.lines.first), raw_options).first
32
45
  raw_options[:offset_rows] += 1
33
46
  when :skip
34
47
  header = (0..(csv_data.lines.first.split(raw_options[:col_sep]).count)).to_a
35
48
  raw_options[:offset_rows] += 1
36
49
  when :none
37
- header = (0..(csv_data.lines.first.split(raw_options[:col_sep]).count)).to_a
50
+ header = (0..(csv_data.lines.first.split(raw_options[:col_sep]).count)).to_a
38
51
  end
39
52
 
40
53
  raw_options[:row_as_hash] = options[:row_as_hash] # Setting after header parsing
@@ -86,6 +99,7 @@ class Rcsv
86
99
  raw_options[:row_conversions] = row_conversions
87
100
  end
88
101
 
89
- return self.raw_parse(csv_data, raw_options)
102
+ csv_data.pos = initial_position
103
+ return self.raw_parse(csv_data, raw_options, &block)
90
104
  end
91
105
  end
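
The rcsv.rb changes above wrap String input in a `StringIO` and restore the read position after the header has been inspected. Here is a minimal sketch of that pattern using only the standard library. `normalize_csv_input` is a hypothetical name, and the sketch deliberately differs from the diff in two ways: it checks for `read` and `pos` rather than `lines` and `read`, and it peeks at the header with `gets` rather than `lines.first`.

```ruby
require "stringio"

# Hypothetical stand-in for the input handling added to Rcsv.parse.
def normalize_csv_input(csv_data)
  return StringIO.new(csv_data) if csv_data.is_a?(String)
  unless csv_data.respond_to?(:read) && csv_data.respond_to?(:pos)
    raise ArgumentError, "expected a String or an IO-like object"
  end
  csv_data
end

io = normalize_csv_input("a,b,c\n1,2,3\n")
initial_position = io.pos   # remember where parsing should start
header = io.gets            # peeking at the header moves the read position
io.pos = initial_position   # rewind so the full parse sees the header row again
body = io.read
```

Saving and restoring `pos` matters because the same IO object is read twice: once to derive column names and once for the actual parse.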
@@ -3,7 +3,7 @@ require 'rcsv'
3
3
 
4
4
  class RcsvTest < Test::Unit::TestCase
5
5
  def setup
6
- @csv_data = File.read('test/test_rcsv.csv')
6
+ @csv_data = File.open('test/test_rcsv.csv')
7
7
  end
8
8
 
9
9
  def test_rcsv
@@ -20,7 +20,7 @@ class RcsvTest < Test::Unit::TestCase
20
20
  end
21
21
 
22
22
  def test_rcsv_col_sep
23
- tsv_data = @csv_data.tr(",", "\t")
23
+ tsv_data = StringIO.new(@csv_data.read.tr(",", "\t"))
24
24
  raw_parsed_tsv_data = Rcsv.raw_parse(tsv_data, :col_sep => "\t")
25
25
 
26
26
  assert_equal(raw_parsed_tsv_data[0][2], 'EDADEDADEDADEDADEDADEDAD')
@@ -33,8 +33,27 @@ class RcsvTest < Test::Unit::TestCase
33
33
  assert_equal(raw_parsed_tsv_data[888][13], "Dallas\t TX")
34
34
  end
35
35
 
36
+ def test_buffer_size
37
+ raw_parsed_csv_data = Rcsv.raw_parse(@csv_data, :buffer_size => 10)
38
+
39
+ assert_equal(raw_parsed_csv_data[0][2], 'EDADEDADEDADEDADEDADEDAD')
40
+ assert_equal(raw_parsed_csv_data[0][13], '$$$908080')
41
+ assert_equal(raw_parsed_csv_data[0][14], '"')
42
+ assert_equal(raw_parsed_csv_data[0][15], 'true/false')
43
+ assert_equal(raw_parsed_csv_data[0][16], nil)
44
+ assert_equal(raw_parsed_csv_data[9][2], nil)
45
+ assert_equal(raw_parsed_csv_data[3][6], '""C81E-=; **ECCB; .. 89')
46
+ assert_equal(raw_parsed_csv_data[888][13], 'Dallas, TX')
47
+ end
48
+
49
+ def test_single_item_csv
50
+ raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new("Foo"))
51
+
52
+ assert_equal(raw_parsed_csv_data, [["Foo"]])
53
+ end
54
+
36
55
  def test_broken_data
37
- broken_data = @csv_data.sub(/"/, '')
56
+ broken_data = StringIO.new(@csv_data.read.sub(/"/, ''))
38
57
 
39
58
  assert_raise(Rcsv::ParseError) do
40
59
  Rcsv.raw_parse(broken_data)
@@ -42,7 +61,7 @@ class RcsvTest < Test::Unit::TestCase
42
61
  end
43
62
 
44
63
  def test_broken_data_without_strict
45
- broken_data = @csv_data.sub(/"/, '')
64
+ broken_data = StringIO.new(@csv_data.read.sub(/"/, ''))
46
65
 
47
66
  raw_parsed_csv_data = Rcsv.raw_parse(broken_data, :nostrict => true)
48
67
  assert_equal(["DSAdsfksjh", "iii ooo iii", "EDADEDADEDADEDADEDADEDAD", "111 333 555", "NMLKTF", "---==---", "//", "###", "0000000000", "Asdad bvd qwert", ";'''sd", "@@@", "OCTZ", "$$$908080", "\",true/false\nC85A5B9F,85259637,,96,6838,1983-06-14,\"\"\"C4CA-=; **1679; .. 79", "210,11", "908e", "1281-03-09", "7257.4654049904275", "20efe749-50fe-4b6a-a603-7f9cd1dc6c6d", "3", "New York, NY", "u", "2.228169203286535", "t"], raw_parsed_csv_data.first)
@@ -83,7 +102,7 @@ class RcsvTest < Test::Unit::TestCase
83
102
  end
84
103
 
85
104
  def test_row_conversions
86
- raw_parsed_csv_data = Rcsv.raw_parse(@csv_data.each_line.to_a[1..-1].join, # skipping string headers
105
+ raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new(@csv_data.each_line.to_a[1..-1].join), # skipping string headers
87
106
  :row_conversions => 'sisiisssssfsissf')
88
107
 
89
108
  assert_equal(raw_parsed_csv_data[0][2], nil)
@@ -94,7 +113,7 @@ class RcsvTest < Test::Unit::TestCase
94
113
  end
95
114
 
96
115
  def test_row_conversions_with_column_exclusions
97
- raw_parsed_csv_data = Rcsv.raw_parse(@csv_data.each_line.to_a[1..-1].join, # skipping string headers
116
+ raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new(@csv_data.each_line.to_a[1..-1].join), # skipping string headers
98
117
  :row_conversions => 's f issss fsis fb')
99
118
 
100
119
  assert_equal(raw_parsed_csv_data[0][1], nil)
@@ -153,4 +172,96 @@ class RcsvTest < Test::Unit::TestCase
153
172
  'booleator' => 't'
154
173
  }, raw_parsed_csv_data[1])
155
174
  end
175
+
176
+ def test_array_block_streaming
177
+ raw_parsed_csv_data = []
178
+
179
+ result = Rcsv.raw_parse(@csv_data) { |row|
180
+ raw_parsed_csv_data << row
181
+ }
182
+
183
+ assert_equal(nil, result)
184
+ assert_equal(raw_parsed_csv_data[0][2], 'EDADEDADEDADEDADEDADEDAD')
185
+ assert_equal(raw_parsed_csv_data[0][13], '$$$908080')
186
+ assert_equal(raw_parsed_csv_data[0][14], '"')
187
+ assert_equal(raw_parsed_csv_data[0][15], 'true/false')
188
+ assert_equal(raw_parsed_csv_data[0][16], nil)
189
+ assert_equal(raw_parsed_csv_data[9][2], nil)
190
+ assert_equal(raw_parsed_csv_data[3][6], '""C81E-=; **ECCB; .. 89')
191
+ assert_equal(raw_parsed_csv_data[888][13], 'Dallas, TX')
192
+ end
193
+
194
+ def test_hash_block_streaming
195
+ raw_parsed_csv_data = []
196
+ result = Rcsv.raw_parse(@csv_data, :row_as_hash => true, :column_names => [
197
+ 'DSAdsfksjh',
198
+ 'iii ooo iii',
199
+ 'EDADEDADEDADEDADEDADEDAD',
200
+ '111 333 555',
201
+ 'NMLKTF',
202
+ '---==---',
203
+ '//',
204
+ '###',
205
+ '0000000000',
206
+ 'Asdad bvd qwert',
207
+ ";'''sd",
208
+ '@@@',
209
+ 'OCTZ',
210
+ '$$$908080',
211
+ '"',
212
+ 'noname',
213
+ 'booleator'
214
+ ]) { |row|
215
+ raw_parsed_csv_data << row
216
+ }
217
+
218
+ assert_equal(nil, result)
219
+ assert_equal({
220
+ 'DSAdsfksjh' => 'C85A5B9F',
221
+ 'iii ooo iii' => '85259637',
222
+ 'EDADEDADEDADEDADEDADEDAD' => nil,
223
+ '111 333 555' => '96',
224
+ 'NMLKTF' => '6838',
225
+ '---==---' => '1983-06-14',
226
+ '//' => '""C4CA-=; **1679; .. 79',
227
+ '###' => '210,11',
228
+ '0000000000' => '908e',
229
+ 'Asdad bvd qwert' => '1281-03-09',
230
+ ";'''sd" => '7257.4654049904275',
231
+ '@@@' => '20efe749-50fe-4b6a-a603-7f9cd1dc6c6d',
232
+ 'OCTZ' => '3',
233
+ '$$$908080' => "New York, NY",
234
+ '"' => 'u',
235
+ 'noname' => '2.228169203286535',
236
+ 'booleator' => 't'
237
+ }, raw_parsed_csv_data[1])
238
+ end
239
+
240
+ def test_nils_and_empty_strings_default
241
+ raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new(",\"\",, ,,\n,, \"\", \"\" ,,"))
242
+
243
+ assert_equal([nil, '', nil, nil, nil, nil], raw_parsed_csv_data[0])
244
+ assert_equal([nil, nil, '', '', nil, nil], raw_parsed_csv_data[1])
245
+ end
246
+
247
+ def test_nils_and_empty_strings_nil
248
+ raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new(",\"\",, ,,\n,, \"\", \"\" ,,"), :parse_empty_fields_as => :nil)
249
+
250
+ assert_equal([nil, nil, nil, nil, nil, nil], raw_parsed_csv_data[0])
251
+ assert_equal([nil, nil, nil, nil, nil, nil], raw_parsed_csv_data[1])
252
+ end
253
+
254
+ def test_nils_and_empty_strings_string
255
+ raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new(",\"\",, ,,\n,, \"\", \"\" ,,"), :parse_empty_fields_as => :string)
256
+
257
+ assert_equal(['', '', '', '', '', ''], raw_parsed_csv_data[0])
258
+ assert_equal(['', '', '', '', '', ''], raw_parsed_csv_data[1])
259
+ end
260
+
261
+ def test_nils_and_empty_strings_nil_or_string
262
+ raw_parsed_csv_data = Rcsv.raw_parse(StringIO.new(",\"\",, ,,\n,, \"\", \"\" ,,"), :parse_empty_fields_as => :nil_or_string)
263
+
264
+ assert_equal([nil, '', nil, nil, nil, nil], raw_parsed_csv_data[0])
265
+ assert_equal([nil, nil, '', '', nil, nil], raw_parsed_csv_data[1])
266
+ end
156
267
  end
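
`test_buffer_size` above parses with a 10-byte buffer, so fields routinely straddle chunk boundaries, and the streaming tests check that a block receives each row while the call itself returns nil. The sketch below models that read loop in plain Ruby under a loud simplification: it splits on "," and "\n" and ignores quoting entirely, whereas the real extension hands each chunk to libcsv, which keeps parser state between calls. `each_row_in_chunks` is a hypothetical name.

```ruby
require "stringio"

# Hypothetical model of the read loop: pull buffer_size bytes at a time and
# hand complete rows to the block, or collect them when no block is given.
def each_row_in_chunks(io, buffer_size, &block)
  rows = []
  pending = String.new              # unfinished tail of the previous chunk
  emit = lambda do |line|
    row = line.split(",", -1)       # -1 keeps trailing empty fields
    block ? block.call(row) : rows << row
  end

  while (chunk = io.read(buffer_size)) && !chunk.empty?
    pending << chunk
    while (newline = pending.index("\n"))
      emit.call(pending.slice!(0..newline).chomp)
    end
  end
  emit.call(pending) unless pending.empty?  # last row without a trailing newline

  block ? nil : rows                # streaming returns nil, as in the diff
end

data = "a,b,c\n1,2,3\n4,5,6"
all_rows = each_row_in_chunks(StringIO.new(data), 4)
streamed = []
result = each_row_in_chunks(StringIO.new(data), 4) { |row| streamed << row }
```

With a block, memory use is bounded by the buffer plus one row rather than by the whole result array, which is the point of the streaming mode added in this release.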
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: rcsv
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.6
4
+ version: 0.0.8
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2012-11-16 00:00:00.000000000 Z
12
+ date: 2013-01-11 00:00:00.000000000 Z
13
13
  dependencies: []
14
14
  description: A libcsv-based CSV parser for Ruby
15
15
  email:
@@ -26,6 +26,7 @@ files:
26
26
  - Gemfile.lock
27
27
  - LICENSE
28
28
  - README.md
29
+ - RELNOTES
29
30
  - Rakefile
30
31
  - bench.rb
31
32
  - ext/rcsv/csv.h