fastcsv 0.0.2 → 0.0.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 129c6ed1d3b30a44456108f280a582ccdaac96e9
4
- data.tar.gz: 4d819f3bb6e637cb5fb3e130c378583202f8d3ee
3
+ metadata.gz: 1f1c56bdc3a7600dfb311eee7a98f07fe0d0e575
4
+ data.tar.gz: 2fae9ec519e877178a81dfb6f139c0ce94c974d2
5
5
  SHA512:
6
- metadata.gz: 8a960b458260e864346755a7b00afca9735e8851f70b9ebe3c4d95e1c5300c016fda9b1db5ff7c39cfcc288fdfa51ec038b5796eb12d01c8a7a7cb0d24ae1fe3
7
- data.tar.gz: 76612ddd0aedef55ca914a5de6b141d9d274c395ec3b9fcc28897e4ca2762ade22759e078446f6ef8ee673ff96954f328e9d23a645e7ba89fe0168ae75e6e7dc
6
+ metadata.gz: d9063634cca29ad95e961ee7bfa7269dc906c39a70a8317e35ea6ffc5991bd0caadf5b985040638d392bcbd49f1931d991261377c9a94c04d31b0147a8e9d721
7
+ data.tar.gz: 51314a4e948a996ad1546097ac7ad5c9d9d72eb6e8ef38a8874039920814445bee5de121225b81d5e2d461037b7fe23b1595c6be1f54dc93576aa7497681d91f
data/.travis.yml ADDED
@@ -0,0 +1,11 @@
1
+ language: ruby
2
+ rvm:
3
+ - 1.9.3
4
+ - 2.0.0
5
+ - 2.1.0
6
+ before_script:
7
+ - rake compile
8
+ script:
9
+ - rake
10
+ # The CSV tests in test/ are specific to Ruby 2.1.0.
11
+ - if [ "$TRAVIS_RUBY_VERSION" = '2.1.0' ]; then rspec test/runner.rb test/csv; fi
data/README.md CHANGED
@@ -1,12 +1,19 @@
1
1
  # FastCSV
2
2
 
3
3
  [![Gem Version](https://badge.fury.io/rb/fastcsv.svg)](http://badge.fury.io/rb/fastcsv)
4
+ [![Build Status](https://secure.travis-ci.org/opennorth/fastcsv.png)](http://travis-ci.org/opennorth/fastcsv)
4
5
  [![Dependency Status](https://gemnasium.com/opennorth/fastcsv.png)](https://gemnasium.com/opennorth/fastcsv)
6
+ [![Coverage Status](https://coveralls.io/repos/opennorth/fastcsv/badge.png?branch=master)](https://coveralls.io/r/opennorth/fastcsv)
7
+ [![Code Climate](https://codeclimate.com/github/opennorth/fastcsv.png)](https://codeclimate.com/github/opennorth/fastcsv)
5
8
 
6
9
  A fast [Ragel](http://www.colm.net/open-source/ragel/)-based CSV parser.
7
10
 
11
+ **Only reads CSVs using `"` as the quote character, `,` as the delimiter and `\r`, `\n` or `\r\n` as the line terminator.**
12
+
8
13
  ## Usage
9
14
 
15
+ `FastCSV.raw_parse` is implemented in C and is the fastest way to read CSVs with FastCSV.
16
+
10
17
  ```ruby
11
18
  require 'fastcsv'
12
19
 
@@ -33,6 +40,18 @@ FastCSV.raw_parse("\xF1\n", encoding: 'iso-8859-1:utf-8') do |row|
33
40
  end
34
41
  ```
35
42
 
43
+ FastCSV can be used as a drop-in replacement for [CSV](http://ruby-doc.org/stdlib-2.1.1/libdoc/csv/rdoc/CSV.html) (replace `CSV` with `FastCSV`) except:
44
+
45
+ * The `:quote_char` (`"`), `:col_sep` (`,`) and `:row_sep` (`:auto`) options are ignored. [#2](https://github.com/opennorth/fastcsv/issues/2)
46
+ * If FastCSV raises an error, you can't continue reading. [#3](https://github.com/opennorth/fastcsv/issues/3) Its error messages don't perfectly match those of CSV.
47
+
48
+ A few minor caveats:
49
+
50
+ * Use `FastCSV.parse_line(string, options)` instead of `string.parse_csv(options)`.
51
+ * If you were passing CSV an IO object on which you had wrapped `#gets` (for example, as described in [this article](http://graysoftinc.com/rubies-in-the-rough/decorators-verses-the-mix-in)), `#gets` will not be called.
52
+ * The `:field_size_limit` option is ignored. If you need to prevent DoS attacks – the [ostensible reason](http://ruby-doc.org/stdlib-2.1.1/libdoc/csv/rdoc/CSV.html#new-method) for this option – limit the size of the input, not the size of quoted fields.
53
+ * FastCSV doesn't support UTF-16 or UTF-32. See [UTF-8 Everywhere](http://utf8everywhere.org/).
54
+
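As a rough illustration of the drop-in usage described above, the sketch below picks `FastCSV` when the gem can be loaded and falls back to the standard library's `CSV` otherwise. The `parser` variable and the sample line are assumptions made for this example. Only `FastCSV.parse_line(string, options)` comes from the list above.

```ruby
require 'csv'

# Prefer FastCSV when the gem is installed; otherwise fall back to CSV.
# Both expose the same class-level interface, so callers need no changes.
parser = begin
  require 'fastcsv'
  FastCSV
rescue LoadError
  CSV
end

# Sample input that stays within FastCSV's limits:
# `"` as the quote character, `,` as the delimiter, `\n` as the terminator.
line = %("a, b",c\n)

# Use parse_line on the class rather than String#parse_csv,
# because String#parse_csv always goes through CSV.
row = parser.parse_line(line)
p row
```

Keeping the class in a variable also makes it easy to switch back to `CSV` for inputs that need `:col_sep`, `:row_sep` or `:quote_char`.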
36
55
  ## Development
37
56
 
38
57
  ragel -G2 ext/fastcsv/fastcsv.rl
@@ -40,10 +59,26 @@ end
40
59
  rake compile
41
60
  gem uninstall fastcsv
42
61
  rake install
62
+ rake
63
+ rspec test/runner.rb test/csv
64
+
65
+ ### Implementation
66
+
67
+ FastCSV implements its Ragel-based CSV parser in C at `FastCSV::Parser`.
68
+
69
+ FastCSV is a subclass of [CSV](http://ruby-doc.org/stdlib-2.1.1/libdoc/csv/rdoc/CSV.html). It overrides `#shift`, replacing the parsing code, in order to act as a drop-in replacement.
70
+
71
+ FastCSV's `raw_parse` requires a block to which it yields one row at a time. FastCSV uses [Fiber](http://www.ruby-doc.org/core-2.1.1/Fiber.html)s to pass control back to `#shift` while parsing.
72
+
73
+ CSV delegates IO methods to the IO object it's reading. IO methods that move the pointer within the file, like `rewind`, change the behavior of CSV's `#shift`. However, FastCSV's C code won't take notice. We therefore null the Fiber whenever the pointer is moved, so that `#shift` uses a new Fiber.
74
+
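The Fiber technique described above can be sketched in plain Ruby. This is not FastCSV's code: `each_row` is a hypothetical stand-in for a block-based parser such as `raw_parse`, and `PullReader` is an illustrative class name. The sketch only shows how a Fiber turns a parser that yields rows into a `#shift` that returns one row per call, and why the Fiber is discarded when the pointer moves.

```ruby
require 'stringio'

# Hypothetical stand-in for a block-based parser: it yields one row at a
# time and only returns once the input is exhausted. (Naive comma split,
# no quoting support; it is here only to drive the Fiber.)
def each_row(io)
  io.each_line { |line| yield line.chomp.split(',') }
end

class PullReader
  def initialize(io)
    @io = io
    @fiber = nil
  end

  # Returns the next row, or nil at the end of the input.
  def shift
    @fiber ||= Fiber.new do
      # Fiber.yield suspends the parser mid-iteration and hands the row
      # to whoever called #resume.
      each_row(@io) { |row| Fiber.yield(row) }
      nil # the block's final value signals end of input
    end
    row = @fiber.resume
    @fiber = nil if row.nil? # a finished Fiber can't be resumed again
    row
  end

  # Moving the pointer invalidates the suspended parse, so drop the Fiber.
  # The next #shift then starts a fresh parse from the new position.
  def rewind
    @io.rewind
    @fiber = nil
  end
end

reader = PullReader.new(StringIO.new("a,b\nc,d\n"))
first = reader.shift
reader.rewind
again = reader.shift
```

Without the reset in `rewind`, the suspended Fiber would carry on from wherever the parser had stopped, regardless of the new pointer position.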
75
+ CSV's `#shift` runs the regular expression in the `:skip_lines` option against a row's raw text. `FastCSV::Parser` implements a `row` method, which returns the most recently parsed row's raw text.
76
+
77
+ FastCSV is tested against the same tests as CSV. See [TESTS.md](https://github.com/opennorth/fastcsv/blob/master/TESTS.md) for details.
43
78
 
44
79
  ## Why?
45
80
 
46
- We evaluated [many CSV Ruby gems](https://github.com/jpmckinney/csv-benchmark#benchmark), and they were either too slow or had implementation errors. [rcsv](https://github.com/fiksu/rcsv) is fast and [libcsv](http://sourceforge.net/projects/libcsv/)-based, but it skips blank rows (Ruby's CSV module returns an empty array) and silently fails on input with an unclosed quote; nonetheless, it's an excellent alternative if you find errors in FastCSV! We looked for Ragel-based CSV parsers to copy, but they either had implementation errors or could not handle large inputs. [commas](https://github.com/aklt/commas/blob/master/csv.rl) looks good, but it performs a memory check on each character, which is overkill.
81
+ We evaluated [many CSV Ruby gems](https://github.com/jpmckinney/csv-benchmark#benchmark), and they were either too slow or had implementation errors. [rcsv](https://github.com/fiksu/rcsv) is fast and [libcsv](http://sourceforge.net/projects/libcsv/)-based, but it skips blank rows (Ruby's CSV module returns an empty array) and silently fails on input with an unclosed quote. [bamfcsv](https://github.com/jondistad/bamfcsv) is well implemented, but it's considerably slower on large files. We looked for Ragel-based CSV parsers to copy, but they either had implementation errors or could not handle large files. [commas](https://github.com/aklt/commas/blob/master/csv.rl) looks good, but it performs a memory check on each character, which is overkill.
47
82
 
48
83
  ## Bugs? Questions?
49
84
 
@@ -51,6 +86,6 @@ This project's main repository is on GitHub: [http://github.com/opennorth/fastcs
51
86
 
52
87
  ## Acknowledgements
53
88
 
54
- Started as a Ruby 2.1 fork of MoonWolf <moonwolf@moonwolf.com>'s CSVScan, found in [this commit](https://github.com/nickstenning/csvscan/commit/11ec30f71a27cc673bca09738ee8a63942f416f0.patch). CSVScan uses Ragel code from [HPricot](https://github.com/hpricot/hpricot/blob/master/ext/hpricot_scan/hpricot_scan.rl) from [this commit](https://github.com/hpricot/hpricot/blob/908a4ae64bc8b935c4415c47ca6aea6492c6ce0a/ext/hpricot_scan/hpricot_scan.rl).
89
+ Started as a Ruby 2.1 fork of MoonWolf <moonwolf@moonwolf.com>'s CSVScan, found in [this commit](https://github.com/nickstenning/csvscan/commit/11ec30f71a27cc673bca09738ee8a63942f416f0.patch). CSVScan uses Ragel code from [HPricot](https://github.com/hpricot/hpricot/blob/master/ext/hpricot_scan/hpricot_scan.rl) from [this commit](https://github.com/hpricot/hpricot/blob/908a4ae64bc8b935c4415c47ca6aea6492c6ce0a/ext/hpricot_scan/hpricot_scan.rl). Most of the Ruby (i.e. non-C, non-Ragel) methods are copied from [CSV](https://github.com/ruby/ruby/blob/ab337e61ecb5f42384ba7d710c36faf96a454e5c/lib/csv.rb).
55
90
 
56
91
  Copyright (c) 2014 Open North Inc., released under the MIT license
data/TESTS.md ADDED
@@ -0,0 +1,42 @@
1
+ Here are some notes on maintaining the `test/` directory.
2
+
3
+ 1. Download Ruby and [test CSV](http://ruby-doc.org/core-2.1.0/doc/contributing_rdoc.html#label-Running+tests).
4
+
5
+ git clone https://github.com/ruby/ruby.git
6
+ cd ruby
7
+ git checkout v2_1_2
8
+ gem uninstall minitest
9
+ gem install minitest --version 4.7.5
10
+ ruby test/runner.rb test/csv
11
+
12
+ 1. Copy the tests into the project. All the tests should pass.
13
+
14
+ cd PROJECT
15
+ mkdir test
16
+ cp path/to/ruby/test/runner.rb test
17
+ cp path/to/ruby/test/with_different_ofs.rb test
18
+ cp -r path/to/ruby/test/csv test/csv
19
+ ruby test/runner.rb test/csv
20
+
21
+ 1. Replace `\bCSV\b` with `FastCSV` in the copied tests, then run:
22
+
23
+ sed -i.bak '1s;^;require "fastcsv"\
24
+ ;' test/runner.rb
25
+
26
+ 1. In `test_interface.rb`, replace `\\t|;|(?<=\S)\|(?=\S)` with `,`. In `test_encodings.rb`, replace `(?<=[^\s{])\|(?=\S)` with `,` and replace `Encoding.list` with `Encoding.list.reject{|e| e.name[/\AUTF-\d\d/]}`. These changes are because `:col_sep`, `:row_sep` and `:quote_char` are ignored and because UTF-16 and UTF-32 aren't supported.
27
+
28
+ 1. Comment out these tests because `:col_sep`, `:row_sep` and `:quote_char` are ignored:
29
+
30
+ * `test_csv_parsing.rb`: the first part of `test_malformed_csv`
31
+ * `test_features.rb`: `test_col_sep`, `test_row_sep`, `test_quote_char`, `test_leading_empty_fields_with_multibyte_col_sep_bug_fix`
32
+ * `test_headers.rb`: `test_csv_header_string_inherits_separators`
33
+
34
+ 1. Comment out these tests in `test_csv_encoding.rb` because UTF-16 and UTF-32 aren't supported:
35
+
36
+ * `test_parses_utf16be_encoding`
37
+ * the second part of `test_open_allows_you_to_set_encodings`
38
+ * the second part of `test_foreach_allows_you_to_set_encodings`
39
+ * the second part of `test_read_allows_you_to_set_encodings`
40
+ * the second line of `encode_for_tests`
41
+
42
+ 1. Comment out `test_field_size_limit_controls_lookahead` in `test_csv_parsing.rb` because `:field_size_limit` isn't supported. FastCSV reads one more line than CSV in `test_malformed_csv`, but it's unclear whether that's worth mirroring.
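
Two of the substitutions in the steps above can be checked in Ruby before touching the test files. The sample source text is made up for this sketch. The `\bCSV\b` pattern and the `Encoding.list` filter are the ones given in the steps.

```ruby
# Made-up excerpt standing in for one of the copied test files.
source = <<~'SRC'
  class TestCSVParsing < Test::Unit::TestCase
    def test_parse
      assert_equal([["a"]], CSV.parse("a"))
      assert_kind_of(CSV::Row, CSV.parse_line("a", headers: true))
    end
  end
SRC

# \b only matches at a word boundary, so the standalone constant is
# rewritten while names that merely contain "CSV" (TestCSVParsing) are not.
ported = source.gsub(/\bCSV\b/, 'FastCSV')
puts ported

# The filter from the test_encodings.rb step: drop every encoding whose
# name starts with "UTF-" followed by two digits (the UTF-16 and UTF-32
# families), while keeping UTF-8.
supported = Encoding.list.reject { |e| e.name[/\AUTF-\d\d/] }
```

Running the pattern over a sample like this first helps confirm that class names and fixture names are left alone.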