fastcsv 0.0.2 → 0.0.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 129c6ed1d3b30a44456108f280a582ccdaac96e9
- data.tar.gz: 4d819f3bb6e637cb5fb3e130c378583202f8d3ee
+ metadata.gz: 1f1c56bdc3a7600dfb311eee7a98f07fe0d0e575
+ data.tar.gz: 2fae9ec519e877178a81dfb6f139c0ce94c974d2
  SHA512:
- metadata.gz: 8a960b458260e864346755a7b00afca9735e8851f70b9ebe3c4d95e1c5300c016fda9b1db5ff7c39cfcc288fdfa51ec038b5796eb12d01c8a7a7cb0d24ae1fe3
- data.tar.gz: 76612ddd0aedef55ca914a5de6b141d9d274c395ec3b9fcc28897e4ca2762ade22759e078446f6ef8ee673ff96954f328e9d23a645e7ba89fe0168ae75e6e7dc
+ metadata.gz: d9063634cca29ad95e961ee7bfa7269dc906c39a70a8317e35ea6ffc5991bd0caadf5b985040638d392bcbd49f1931d991261377c9a94c04d31b0147a8e9d721
+ data.tar.gz: 51314a4e948a996ad1546097ac7ad5c9d9d72eb6e8ef38a8874039920814445bee5de121225b81d5e2d461037b7fe23b1595c6be1f54dc93576aa7497681d91f
data/.travis.yml ADDED
@@ -0,0 +1,11 @@
+ language: ruby
+ rvm:
+ - 1.9.3
+ - 2.0.0
+ - 2.1.0
+ before_script:
+ - rake compile
+ script:
+ - rake
+ # The CSV tests in test/ are specific to Ruby 2.1.0.
+ - if [ $TRAVIS_RUBY_VERSION = '2.1.0' ]; then rspec test/runner.rb test/csv; fi
data/README.md CHANGED
@@ -1,12 +1,19 @@
  # FastCSV

  [![Gem Version](https://badge.fury.io/rb/fastcsv.svg)](http://badge.fury.io/rb/fastcsv)
+ [![Build Status](https://secure.travis-ci.org/opennorth/fastcsv.png)](http://travis-ci.org/opennorth/fastcsv)
  [![Dependency Status](https://gemnasium.com/opennorth/fastcsv.png)](https://gemnasium.com/opennorth/fastcsv)
+ [![Coverage Status](https://coveralls.io/repos/opennorth/fastcsv/badge.png?branch=master)](https://coveralls.io/r/opennorth/fastcsv)
+ [![Code Climate](https://codeclimate.com/github/opennorth/fastcsv.png)](https://codeclimate.com/github/opennorth/fastcsv)

  A fast [Ragel](http://www.colm.net/open-source/ragel/)-based CSV parser.

+ **Only reads CSVs using `"` as the quote character, `,` as the delimiter and `\r`, `\n` or `\r\n` as the line terminator.**
+
  ## Usage

+ `FastCSV.raw_parse` is implemented in C and is the fastest way to read CSVs with FastCSV.
+
  ```ruby
  require 'fastcsv'

@@ -33,6 +40,18 @@ FastCSV.raw_parse("\xF1\n", encoding: 'iso-8859-1:utf-8') do |row|
  end
  ```

+ FastCSV can be used as a drop-in replacement for [CSV](http://ruby-doc.org/stdlib-2.1.1/libdoc/csv/rdoc/CSV.html) (replace `CSV` with `FastCSV`) except:
+
+ * The `:quote_char` (`"`), `:col_sep` (`,`) and `:row_sep` (`:auto`) options are ignored. [#2](https://github.com/opennorth/fastcsv/issues/2)
+ * If FastCSV raises an error, you can't continue reading. [#3](https://github.com/opennorth/fastcsv/issues/3) Its error messages don't perfectly match those of CSV.
+
+ A few minor caveats:
+
+ * Use `FastCSV.parse_line(string, options)` instead of `string.parse_csv(options)`.
+ * If you were passing CSV an IO object on which you had wrapped `#gets` (for example, as described in [this article](http://graysoftinc.com/rubies-in-the-rough/decorators-verses-the-mix-in)), `#gets` will not be called.
+ * The `:field_size_limit` option is ignored. If you need to prevent DoS attacks – the [ostensible reason](http://ruby-doc.org/stdlib-2.1.1/libdoc/csv/rdoc/CSV.html#new-method) for this option – limit the size of the input, not the size of quoted fields.
+ * FastCSV doesn't support UTF-16 or UTF-32. See [UTF-8 Everywhere](http://utf8everywhere.org/).
+
  ## Development

  ragel -G2 ext/fastcsv/fastcsv.rl
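The advice in the `:field_size_limit` caveat above (limit the size of the input rather than the size of quoted fields) could be sketched as follows. This is a minimal illustration and not part of FastCSV: `read_capped`, `InputTooLarge` and the byte limits are hypothetical names and values, and the capped string would then be handed to whichever parser is in use.

```ruby
require 'stringio'

# Hypothetical error class for this sketch; not part of FastCSV or CSV.
class InputTooLarge < StandardError; end

# Reads at most max_bytes from io and returns the data as a String.
# Raises InputTooLarge if the input holds more than max_bytes.
def read_capped(io, max_bytes)
  # Ask for one byte more than allowed so oversized input can be detected
  # without reading the rest of it. IO#read returns nil at end of input.
  data = io.read(max_bytes + 1) || ''
  raise InputTooLarge, "input exceeds #{max_bytes} bytes" if data.bytesize > max_bytes
  data
end

# A small input passes through unchanged.
small = read_capped(StringIO.new("a,b\n1,2\n"), 1024)
puts small.bytesize

# An oversized input is rejected before any parsing happens.
begin
  read_capped(StringIO.new('x' * 2048), 1024)
rescue InputTooLarge => e
  puts e.message
end
```

Capping the whole input bounds memory use regardless of how the quoting inside it is arranged, which is the concern the option was meant to address.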
@@ -40,10 +59,26 @@ end
  rake compile
  gem uninstall fastcsv
  rake install
+ rake
+ rspec test/runner.rb test/csv
+
+ ### Implementation
+
+ FastCSV implements its Ragel-based CSV parser in C at `FastCSV::Parser`.
+
+ FastCSV is a subclass of [CSV](http://ruby-doc.org/stdlib-2.1.1/libdoc/csv/rdoc/CSV.html). It overrides `#shift`, replacing the parsing code, in order to act as a drop-in replacement.
+
+ FastCSV's `raw_parse` requires a block to which it yields one row at a time. FastCSV uses [Fiber](http://www.ruby-doc.org/core-2.1.1/Fiber.html)s to pass control back to `#shift` while parsing.
+
+ CSV delegates IO methods to the IO object it's reading. IO methods that move the pointer within the file, like `rewind`, change the behavior of CSV's `#shift`. However, FastCSV's C code won't take notice. We therefore null the Fiber whenever the pointer is moved, so that `#shift` uses a new Fiber.
+
+ CSV's `#shift` runs the regular expression in the `:skip_lines` option against a row's raw text. `FastCSV::Parser` implements a `row` method, which returns the most recently parsed row's raw text.
+
+ FastCSV is tested against the same tests as CSV. See [TESTS.md](https://github.com/opennorth/fastcsv/blob/master/TESTS.md) for details.

  ## Why?

- We evaluated [many CSV Ruby gems](https://github.com/jpmckinney/csv-benchmark#benchmark), and they were either too slow or had implementation errors. [rcsv](https://github.com/fiksu/rcsv) is fast and [libcsv](http://sourceforge.net/projects/libcsv/)-based, but it skips blank rows (Ruby's CSV module returns an empty array) and silently fails on input with an unclosed quote; nonetheless, it's an excellent alternative if you find errors in FastCSV! We looked for Ragel-based CSV parsers to copy, but they either had implementation errors or could not handle large inputs. [commas](https://github.com/aklt/commas/blob/master/csv.rl) looks good, but it performs a memory check on each character, which is overkill.
+ We evaluated [many CSV Ruby gems](https://github.com/jpmckinney/csv-benchmark#benchmark), and they were either too slow or had implementation errors. [rcsv](https://github.com/fiksu/rcsv) is fast and [libcsv](http://sourceforge.net/projects/libcsv/)-based, but it skips blank rows (Ruby's CSV module returns an empty array) and silently fails on input with an unclosed quote. [bamfcsv](https://github.com/jondistad/bamfcsv) is well implemented, but it's considerably slower on large files. We looked for Ragel-based CSV parsers to copy, but they either had implementation errors or could not handle large files. [commas](https://github.com/aklt/commas/blob/master/csv.rl) looks good, but it performs a memory check on each character, which is overkill.

  ## Bugs? Questions?

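The Fiber technique described in the Implementation notes above can be illustrated in plain Ruby. The sketch below is not FastCSV's code: `each_row` is a hypothetical stand-in for a block-yielding parser such as `raw_parse` (it naively splits on newlines and commas and ignores quoting), and `PullReader` is an invented class name. It shows how a Fiber turns a push-style parser into a pull-style `#shift`, and why discarding the Fiber on `rewind` makes the next `#shift` start over.

```ruby
require 'stringio'

# Hypothetical stand-in for a block-yielding parser like raw_parse.
# Naive: splits on newlines and commas and does not handle quoting.
def each_row(io)
  io.each_line { |line| yield line.chomp.split(',') }
end

# Invented for this sketch: wraps the push-style parser in a Fiber so
# callers can pull one row at a time with #shift.
class PullReader
  def initialize(io)
    @io = io
    @fiber = nil
  end

  # Returns the next row, or nil once the input is exhausted.
  def shift
    @fiber ||= Fiber.new do
      each_row(@io) { |row| Fiber.yield(row) }
      nil # the Fiber's final value signals end of input
    end
    @fiber.alive? ? @fiber.resume : nil
  end

  # Moving the IO pointer invalidates the parser's position, so drop the
  # Fiber; the next #shift creates a new one.
  def rewind
    @io.rewind
    @fiber = nil
  end
end

reader = PullReader.new(StringIO.new("a,b\n1,2\n"))
p reader.shift # first row
reader.rewind  # pointer moved, Fiber discarded
p reader.shift # first row again, from a fresh Fiber
```

The parser itself never needs to know it is being paused: each `Fiber.yield` suspends it inside its own loop, and `resume` continues from exactly that point.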
 
@@ -51,6 +86,6 @@ This project's main repository is on GitHub: [http://github.com/opennorth/fastcs

  ## Acknowledgements

- Started as a Ruby 2.1 fork of MoonWolf <moonwolf@moonwolf.com>'s CSVScan, found in [this commit](https://github.com/nickstenning/csvscan/commit/11ec30f71a27cc673bca09738ee8a63942f416f0.patch). CSVScan uses Ragel code from [HPricot](https://github.com/hpricot/hpricot/blob/master/ext/hpricot_scan/hpricot_scan.rl) from [this commit](https://github.com/hpricot/hpricot/blob/908a4ae64bc8b935c4415c47ca6aea6492c6ce0a/ext/hpricot_scan/hpricot_scan.rl).
+ Started as a Ruby 2.1 fork of MoonWolf <moonwolf@moonwolf.com>'s CSVScan, found in [this commit](https://github.com/nickstenning/csvscan/commit/11ec30f71a27cc673bca09738ee8a63942f416f0.patch). CSVScan uses Ragel code from [HPricot](https://github.com/hpricot/hpricot/blob/master/ext/hpricot_scan/hpricot_scan.rl) from [this commit](https://github.com/hpricot/hpricot/blob/908a4ae64bc8b935c4415c47ca6aea6492c6ce0a/ext/hpricot_scan/hpricot_scan.rl). Most of the Ruby (i.e. non-C, non-Ragel) methods are copied from [CSV](https://github.com/ruby/ruby/blob/ab337e61ecb5f42384ba7d710c36faf96a454e5c/lib/csv.rb).

  Copyright (c) 2014 Open North Inc., released under the MIT license
data/TESTS.md ADDED
@@ -0,0 +1,42 @@
+ Here are some notes on maintaining the `test/` directory.
+
+ 1. Download Ruby and [test CSV](http://ruby-doc.org/core-2.1.0/doc/contributing_rdoc.html#label-Running+tests).
+
+ git clone https://github.com/ruby/ruby.git
+ cd ruby
+ git checkout v2_1_2
+ gem uninstall minitest
+ gem install minitest --version 4.7.5
+ ruby test/runner.rb test/csv
+
+ 1. Copy the tests into the project. All the tests should pass.
+
+ cd PROJECT
+ mkdir test
+ cp path/to/ruby/test/runner.rb test
+ cp path/to/ruby/test/with_different_ofs.rb test
+ cp -r path/to/ruby/test/csv test/csv
+ ruby test/runner.rb test/csv
+
+ 1. Replace `\bCSV\b` with `FastCSV` in the tests, then run:
+
+ sed -i.bak '1s;^;require "fastcsv"\
+ ;' test/runner.rb
+
+ 1. In `test_interface.rb`, replace `\\t|;|(?<=\S)\|(?=\S)` with `,`. In `test_encodings.rb`, replace `(?<=[^\s{])\|(?=\S)` with `,` and replace `Encoding.list` with `Encoding.list.reject{|e| e.name[/\AUTF-\d\d/]}`. These changes are needed because `:col_sep`, `:row_sep` and `:quote_char` are ignored and because UTF-16 and UTF-32 aren't supported.
+
+ 1. Comment out these tests because `:col_sep`, `:row_sep` and `:quote_char` are ignored:
+
+ * `test_csv_parsing.rb`: the first part of `test_malformed_csv`
+ * `test_features.rb`: `test_col_sep`, `test_row_sep`, `test_quote_char`, `test_leading_empty_fields_with_multibyte_col_sep_bug_fix`
+ * `test_headers.rb`: `test_csv_header_string_inherits_separators`
+
+ 1. Comment out these tests in `test_csv_encoding.rb` because UTF-16 and UTF-32 aren't supported:
+
+ * `test_parses_utf16be_encoding`
+ * the second part of `test_open_allows_you_to_set_encodings`
+ * the second part of `test_foreach_allows_you_to_set_encodings`
+ * the second part of `test_read_allows_you_to_set_encodings`
+ * the second line of `encode_for_tests`
+
+ 1. Comment out `test_field_size_limit_controls_lookahead` in `test_csv_parsing.rb` because `:field_size_limit` is not supported. FastCSV reads one more line than CSV in `test_malformed_csv`, but it's unclear whether that's worth mirroring.
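The renaming step and the encoding filter from the notes above can be tried out in isolation. This is a minimal sketch, assuming the substitution is done with Ruby's `gsub` (the notes don't say which tool was used); `sample` is an invented line standing in for the copied test files.

```ruby
# Invented sample line; the real input would be the copied test files.
sample = 'rows = CSV.parse(data) # FastCSV stays as is'

# \b only matches between a word character and a non-word character, so the
# "CSV" inside "FastCSV" is left alone and only the bare "CSV" is renamed.
renamed = sample.gsub(/\bCSV\b/, 'FastCSV')
puts renamed

# The filter from the notes: drop every encoding whose name starts with
# "UTF-" followed by two digits (UTF-16, UTF-32 and their BE/LE variants).
supported = Encoding.list.reject { |e| e.name[/\AUTF-\d\d/] }
puts supported.include?(Encoding::UTF_8)
puts supported.include?(Encoding::UTF_16LE)
```

Because of the word boundaries, running the rename a second time changes nothing, so it is safe to re-apply after pulling in updated tests.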