tsv 0.0.1
- checksums.yaml +7 -0
- data/.gitignore +17 -0
- data/.rspec +1 -0
- data/.travis.yml +13 -0
- data/Gemfile +9 -0
- data/LICENSE.txt +22 -0
- data/README.md +74 -0
- data/Rakefile +6 -0
- data/lib/tsv.rb +20 -0
- data/lib/tsv/cyclist.rb +71 -0
- data/lib/tsv/row.rb +54 -0
- data/lib/tsv/version.rb +3 -0
- data/spec/fixtures/broken.tsv +3 -0
- data/spec/fixtures/empty.tsv +0 -0
- data/spec/fixtures/example.tsv +4 -0
- data/spec/lib/tsv/file_cyclist_spec.rb +34 -0
- data/spec/lib/tsv/row_spec.rb +168 -0
- data/spec/lib/tsv/string_cyclist_spec.rb +13 -0
- data/spec/lib/tsv_spec.rb +59 -0
- data/spec/spec_helper.rb +21 -0
- data/spec/support/cyclist_generic.rb +109 -0
- data/spec/tsv_integration_spec.rb +95 -0
- data/tsv.gemspec +20 -0
- metadata +79 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 9872712bd4a1b57465a813c036ed829be77673b0
+  data.tar.gz: e392e8d086e7d4277f2bbb868cb93a26e6ab07d2
+SHA512:
+  metadata.gz: b4c7d0043ab7b5ae3a25769e1dca356c4ec672acef4310895c889ac991e3f764137176eb2490373c8028f9ce361e3f7600c8d9ab193fe6b9dbe0fd112b334e9b
+  data.tar.gz: 1a9e12c59e28ebf235591b73f294b2bc02f2118307a6d71d83ada443e6188736d46bef9036e81690b55eed986a5c0d272919c185e48135450ecd9e561eedc3e5
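The checksums above are hex digests of the two archives inside the `.gem` package. As a minimal sketch of how such a digest could be checked by hand, the snippet below uses Ruby's standard `digest` library. The helper name `sha512_matches?` and the stand-in payload are assumptions made for illustration; they are not part of the gem or of RubyGems.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: compare a file's SHA512 hex digest with an expected
# value such as the ones listed in checksums.yaml.
def sha512_matches?(path, expected_hex)
  Digest::SHA512.file(path).hexdigest == expected_hex
end

# Stand-in data so the sketch runs without the real archives.
file = Tempfile.new("data.tar.gz")
file.write("example payload")
file.flush

expected = Digest::SHA512.hexdigest("example payload")
puts sha512_matches?(file.path, expected)   # digest of the same bytes
puts sha512_matches?(file.path, "0" * 128)  # deliberately wrong digest

file.close!
```

For the real package, the path would point at an extracted `data.tar.gz` and the expected value would be copied from the `SHA512` section.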
data/.gitignore
ADDED
data/.rspec
ADDED
@@ -0,0 +1 @@
+--color
data/.travis.yml
ADDED
data/Gemfile
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,22 @@
+Copyright (c) 2014 Moron Activity
+
+MIT License
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,74 @@
+# Tsv
+[![Build Status](https://travis-ci.org/mimimi/ruby-tsv.svg?branch=master)](https://travis-ci.org/mimimi/ruby-tsv)
+
+A simple TSV parser, developed with the aim of parsing a ~200 GB TSV dump. As such, no mode of operation other than enumerable is considered sane. Feel free to use `#to_a` on your supercomputer :)
+
+Does not (yet) provide a TSV writing mechanism. Pull requests are welcome :)
+
+## Installation
+
+Add this line to your application's Gemfile:
+
+    gem 'tsv'
+
+And then execute:
+
+    $ bundle
+
+Or install it yourself as:
+
+    $ gem install tsv
+
+## Usage
+
+### High level interfaces
+
+#### TSV::parse
+
+`TSV.parse` accepts TSV content as a single string and returns a lazy enumerator that yields `TSV::Row` objects on demand.
+
+#### TSV::parse_file
+
+`TSV.parse_file` accepts a path to a TSV file and returns a lazy enumerator that yields `TSV::Row` objects on demand.
+`TSV.parse_file` is also aliased as `[]`, allowing the `TSV[filename]` syntax.
+
+#### TSV::Row
+
+By default, `TSV::Row` behaves like an Array of strings derived from the TSV row. However, this similarity is limited to Enumerable methods. If a real array is needed, `#to_a` will behave as expected.
+Additionally, `TSV::Row` contains header data, accessible via the `#header` reader.
+
+If hash-like behaviour is required, a field can be accessed with its header string as the key. Alternatively, `#with_header` and `#to_h` return a hash representation of the row.
+
+### Examples
+
+Getting the first line from a TSV file without headers:
+```ruby
+TSV.parse_file("tsv.tsv").without_header.first
+```
+
+Mapping name fields from a file:
+```ruby
+TSV["tsv.tsv"].map do |row|
+  row['name']
+end
+```
+
+Mapping last and first row elements:
+```ruby
+TSV["tsv.tsv"].map do |row|
+  [row[-1], row[0]]
+end
+```
+
+### Nuances
+
+Range accessor is not implemented in the initial version because the authors had no need for it.
+In addition, accessing the tenth element in a row of five is considered an exception from the TSV standpoint, and a range accessor would have to reflect that. Such behaviour, were it implemented, would break expectations. Still, if the need arises, pull or feature requests with accompanying reasoning (or even without it) are more than welcome.
+
+## Contributing
+
+1. Fork it
+2. Create your feature branch (`git checkout -b my-new-feature`)
+3. Commit your changes (`git commit -am 'Add some feature'`)
+4. Push to the branch (`git push origin my-new-feature`)
+5. Create new Pull Request
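The README's preference for an enumerable-only interface can be illustrated with a small standalone sketch. It does not use the gem: `tsv_rows` is a hypothetical helper built on the standard `Enumerator`, and a `StringIO` stands in for a large file. The point is only that asking for the first row does not read the rest of the input.

```ruby
require "stringio"

# Hypothetical helper (not the gem's API): yield one array of fields per line,
# reading from the IO only as rows are requested.
def tsv_rows(io)
  Enumerator.new do |yielder|
    io.each_line { |line| yielder << line.chomp.split("\t") }
  end
end

io = StringIO.new("first\tsecond\n0\t1\none\ttwo\n")

p tsv_rows(io).first  # only the first line is consumed
p io.read             # the remaining lines are still unread
```

Calling `#to_a` on such an enumerator would materialise every row at once, which is what the README warns against for very large dumps.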
data/Rakefile
ADDED
data/lib/tsv.rb
ADDED
@@ -0,0 +1,20 @@
+require "tsv/version"
+require "tsv/row"
+require "tsv/cyclist"
+
+module TSV
+  extend self
+
+  def parse(content, opts = {}, &block)
+    TSV::StringCyclist.new(content, opts, &block)
+  end
+
+  def parse_file(filename, opts = {}, &block)
+    TSV::FileCyclist.new(filename, opts, &block)
+  end
+
+  alias :[] :parse_file
+
+  class ReadOnly < StandardError
+  end
+end
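The `extend self` plus `alias :[]` combination in this file is what makes both `TSV.parse_file(path)` and `TSV[path]` callable on the module itself. A minimal standalone sketch of the same pattern follows; `ExtendSelfDemo` and its return value are hypothetical stand-ins, not the gem's code.

```ruby
# Hypothetical module mirroring the pattern used in lib/tsv.rb.
module ExtendSelfDemo
  # Makes the module's instance methods callable on the module itself.
  extend self

  def parse_file(filename, opts = {})
    [:parsed, filename, opts]
  end

  # The alias is an ordinary instance method, so `extend self` exposes it too.
  alias :[] :parse_file
end

p ExtendSelfDemo.parse_file("tsv.tsv")
p ExtendSelfDemo["tsv.tsv"]
```

Because the alias copies the method as it exists at that point, it has to appear after the `parse_file` definition, as it does in the original file.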
data/lib/tsv/cyclist.rb
ADDED
@@ -0,0 +1,71 @@
+module TSV
+  class Cyclist
+    extend Forwardable
+
+    def_delegators :enumerator, *Enumerator.instance_methods(false)
+    def_delegators :enumerator, *Enumerable.instance_methods(false)
+
+    attr_accessor :source, :header
+
+    def initialize(source, params = {}, &block)
+      self.header = params.fetch(:header, true)
+      self.source = source.to_s
+      self.enumerator.each(&block) if block_given?
+    end
+
+    def with_header
+      self.class.new(self.source, header: true)
+    end
+
+    def without_header
+      self.class.new(self.source, header: false)
+    end
+
+    def enumerator
+      @enumerator ||= ::Enumerator.new do |y|
+        lines = data_enumerator
+
+        first_line = generate_row_from begin
+          lines.next
+        rescue StopIteration => ex
+          ''
+        end
+
+        local_header = if self.header
+          first_line
+        else
+          lines.rewind
+          generate_default_header_from first_line
+        end
+
+        loop do
+          y << TSV::Row.new(generate_row_from(lines.next).freeze, local_header.freeze)
+        end
+      end
+    end
+
+    protected
+
+    def generate_row_from(str)
+      str.to_s.chomp.split("\t")
+    end
+
+    def generate_default_header_from(example_line)
+      (0...example_line.length).to_a.map(&:to_s)
+    end
+  end
+
+  class FileCyclist < Cyclist
+    alias :filepath :source
+
+    def data_enumerator
+      File.new(self.source).each_line
+    end
+  end
+
+  class StringCyclist < Cyclist
+    def data_enumerator
+      source.each_line
+    end
+  end
+end
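One detail of `generate_row_from` worth keeping in mind is how `String#split` treats trailing empty fields. The standalone sketch below uses a hypothetical helper, `split_row`, that mirrors the `chomp.split("\t")` call with an optional limit. It suggests that a line ending in an empty column would produce fewer fields than its header, which in this version would presumably surface as `TSV::Row::InputError`.

```ruby
# Hypothetical helper mirroring `str.to_s.chomp.split("\t")`.
# A limit of 0 behaves like the default; a negative limit keeps trailing empties.
def split_row(line, limit = 0)
  line.to_s.chomp.split("\t", limit)
end

line = "one\t\tthree\t\n"

p split_row(line)      # trailing empty field is dropped
p split_row(line, -1)  # trailing empty field is kept

# Default header for a header-less file, built the same way as in the gem:
p (0...split_row(line, -1).length).map(&:to_s)
```

Whether passing a negative limit is the right behaviour for this parser is a design decision for the authors; the sketch only shows the difference.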
data/lib/tsv/row.rb
ADDED
@@ -0,0 +1,54 @@
+module TSV
+  class Row
+    extend Forwardable
+
+    def_delegators :data, *Enumerable.instance_methods(false)
+
+    attr_reader :header, :data
+
+    def []=(key, value)
+      raise TSV::ReadOnly.new('TSV data is read only. Export data to modify it.')
+    end
+
+    def [](key)
+      if key.is_a? ::String
+        raise UnknownKey unless header.include?(key)
+
+        data[header.index(key)]
+      elsif key.is_a? ::Numeric
+        raise UnknownKey if data[key].nil?
+
+        data[key]
+      else
+        raise InvalidKey.new
+      end
+    end
+
+    def initialize(data, header)
+      @data = data
+      @header = header
+
+      raise InputError if @data.length != @header.length
+    end
+
+    def with_header
+      Hash[header.zip(data)]
+    end
+    alias :to_h :with_header
+
+    def ==(other)
+      other.is_a?(self.class) and
+        header == other.header and
+        data == other.data
+    end
+
+    class InvalidKey < StandardError
+    end
+
+    class UnknownKey < StandardError
+    end
+
+    class InputError < StandardError
+    end
+  end
+end
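String-key lookup and `#with_header` are built from different primitives (`header.index` versus `Hash[header.zip(data)]`), so they can disagree when a header name repeats. The sketch below reproduces just those two expressions in hypothetical helper functions, outside the gem, to show the difference.

```ruby
# Hypothetical helpers that mirror the two expressions used in TSV::Row.
def value_by_key(header, data, key)
  data[header.index(key)]   # index returns the first matching position
end

def row_as_hash(header, data)
  Hash[header.zip(data)]    # later pairs overwrite earlier ones
end

header = %w[id name name]
data   = %w[7 first second]

p value_by_key(header, data, "name")  # value from the first "name" column
p row_as_hash(header, data)           # "name" holds the last column's value
```

Files with unique column names are unaffected; the sketch is only a reminder that duplicate headers are not rejected by the code shown above.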
data/lib/tsv/version.rb
ADDED
File without changes
data/spec/lib/tsv/file_cyclist_spec.rb
ADDED
@@ -0,0 +1,34 @@
+require File.join(File.dirname(__FILE__), '..', '..', 'spec_helper.rb')
+
+describe TSV::FileCyclist do
+  let(:tsv_path) { File.join(File.dirname(__FILE__), '..', '..', 'fixtures', filename) }
+  let(:source) { tsv_path }
+  let(:filename) { 'example.tsv' }
+
+  let(:header) { true }
+  let(:parameters) { { header: header } }
+
+  subject(:cyclist) { TSV::FileCyclist.new(source, parameters) }
+
+  it_behaves_like "Cyclist"
+
+  describe "accessing unavailable files" do
+    subject { lambda { TSV::FileCyclist.new(tsv_path).to_a } }
+
+    context "when file is not found" do
+      let(:tsv_path) { "AManThatWasntThere.tsv" }
+
+      it "returns FileNotFoundException" do
+        expect(subject).to raise_error(Errno::ENOENT)
+      end
+    end
+
+    context "when filename is nil" do
+      let(:tsv_path) { nil }
+
+      it "returns FileNameInvalidException" do
+        expect(subject).to raise_error(Errno::ENOENT)
+      end
+    end
+  end
+end
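Both error cases in this spec expect `Errno::ENOENT`, including the `nil` filename. A likely explanation is that `Cyclist#initialize` stores `source.to_s`, so `nil` becomes an empty path before `File.new` is called. The standalone sketch below checks that reasoning with a hypothetical helper, `open_error_for`; the result for the empty path is what a typical Linux or macOS system reports and may differ elsewhere.

```ruby
# Hypothetical helper: return the error class raised when opening a path
# the same way FileCyclist does (after converting the source with #to_s).
def open_error_for(path)
  File.new(path.to_s).close
  nil
rescue SystemCallError => e
  e.class
end

p open_error_for("AManThatWasntThere.tsv")
p open_error_for(nil)  # nil.to_s is "", which is not an existing file
```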
data/spec/lib/tsv/row_spec.rb
ADDED
@@ -0,0 +1,168 @@
+require File.join(File.dirname(__FILE__), '..', '..', 'spec_helper.rb')
+
+describe TSV::Row do
+  describe "::new" do
+    it "sets header and data from params" do
+      t = TSV::Row.new(['data'], ['header'])
+
+      expect(t.header).to eq(['header'])
+      expect(t.data).to eq(['data'])
+    end
+
+    context "when header and data length do not match" do
+      it "raises TSV::Row::InputError" do
+        expect { TSV::Row.new(['data'], ['header', 'footer']) }.to raise_error(TSV::Row::InputError)
+        expect { TSV::Row.new(['data', 'not data'], ['header']) }.to raise_error(TSV::Row::InputError)
+      end
+    end
+  end
+
+  let(:header) { ['first', 'second', 'third'] }
+  let(:data) { ['one', 'two', 'three'] }
+
+  subject(:row) { TSV::Row.new(data, header) }
+
+  describe "#[]" do
+    describe "array interface compatibility" do
+      context "when provided with element number" do
+        it "returns requested element" do
+          expect(subject[1]).to eq "two"
+        end
+      end
+
+      context "when provided with negative offset" do
+        it "returns requested element" do
+          expect(subject[-1]).to eq "three"
+        end
+      end
+
+      context "when provided with header name" do
+        it "returns requested element" do
+          expect(subject['third']).to eq "three"
+        end
+      end
+
+      context "when provided with nil or symbol" do
+        it "raises TSV::Row::InvalidKey" do
+          expect { subject[nil] }.to raise_error(TSV::Row::InvalidKey)
+          expect { subject[:something] }.to raise_error(TSV::Row::InvalidKey)
+        end
+      end
+
+      context "when provided with unknown numeric key" do
+        let(:cases) { [-(data.length + 1), data.length, 500, -500]}
+
+        it "raises TSV::Row::UnknownKey" do
+          cases.each do |item|
+            expect { subject[item] }.to raise_error(TSV::Row::UnknownKey)
+          end
+        end
+      end
+
+      context "when provided with unknown string key" do
+        it "raises TSV::Row::UnknownKey" do
+          expect { subject['something'] }.to raise_error(TSV::Row::UnknownKey)
+        end
+      end
+    end
+  end
+
+  describe "#[]=" do
+    it "raises TSV::ReadOnly exception" do
+      expect { subject['a'] = 123 }.to raise_error(TSV::ReadOnly, 'TSV data is read only. Export data to modify it.')
+    end
+  end
+
+  describe "accessors" do
+    describe "header" do
+      it "does not have setter" do
+        expect(subject).to_not respond_to(:"header=")
+      end
+
+      it "has getter" do
+        expect(subject.header).to eq ['first', 'second', 'third']
+      end
+    end
+
+    describe "data" do
+      it "does not have setter" do
+        expect(subject).to_not respond_to(:"data=")
+      end
+
+      it "has getter" do
+        expect(subject.data).to eq ['one', 'two', 'three']
+      end
+    end
+  end
+
+  describe "iterators" do
+    describe "Enumerable #methods (except #to_h, which we have a better implementation for)" do
+      (Enumerable.instance_methods(false) - [:to_h]).each do |name|
+        it "delegates #{name} to data array" do
+          expect(subject.data).to receive(name)
+          subject.send(name)
+        end
+      end
+    end
+
+    describe "#with_header" do
+      subject { row.with_header }
+
+      it "gathers header and data into hash" do
+        expect(subject).to eq({
+          "first" => "one",
+          "second" => "two",
+          "third" => "three"
+        })
+      end
+    end
+
+    describe "#to_h" do
+      subject { row.to_h }
+
+      it "gathers header and data into hash" do
+        expect(subject).to eq({
+          "first" => "one",
+          "second" => "two",
+          "third" => "three"
+        })
+      end
+    end
+  end
+
+  describe "#==" do
+    let(:other_header) { header }
+    let(:other_data) { data }
+
+    let(:other_row) { TSV::Row.new(other_data, other_header) }
+    subject { row == other_row }
+
+    context "when compared to TSV::Row" do
+      context "when both objects' data and header are equal" do
+        it { should be true }
+      end
+
+      context "when data attributes are not equal" do
+        let(:other_data) { data.reverse }
+        it { should be false }
+      end
+
+      context "when header attributes are not equal" do
+        let(:other_header) { header.reverse }
+        it { should be false }
+      end
+
+      context "when both objects' data and header are not equal" do
+        let(:other_data) { data.reverse }
+        let(:other_header) { header.reverse }
+        it { should be false }
+      end
+    end
+
+    context "when compared to something else than TSV::Row" do
+      let(:other_row) { data }
+
+      it { should be false }
+    end
+  end
+end
data/spec/lib/tsv/string_cyclist_spec.rb
ADDED
@@ -0,0 +1,13 @@
+require File.join(File.dirname(__FILE__), '..', '..', 'spec_helper.rb')
+
+describe TSV::StringCyclist do
+  let(:source) { IO.read(File.join(File.dirname(__FILE__), '..', '..', 'fixtures', filename)) }
+  let(:filename) { 'example.tsv' }
+
+  let(:header) { true }
+  let(:parameters) { { header: header } }
+
+  subject(:cyclist) { TSV::StringCyclist.new(source, parameters) }
+
+  it_behaves_like "Cyclist"
+end
data/spec/lib/tsv_spec.rb
ADDED
@@ -0,0 +1,59 @@
+require File.join(File.dirname(__FILE__), '..', 'spec_helper.rb')
+
+describe TSV do
+  let(:filename) { 'example.tsv' }
+
+  describe "#parse" do
+    let(:header) { nil }
+    let(:content) { IO.read(File.join(File.dirname(__FILE__), '..', 'fixtures', filename)) }
+    let(:parameters) { { header: header } }
+
+    subject { TSV.parse(content, parameters) }
+
+    it "returns String Cyclist initialized with given data" do
+      expect(subject).to be_a TSV::StringCyclist
+      expect(subject.source).to eq(content)
+    end
+
+    context "when block is given" do
+      it "passes block to Cyclist" do
+        data = []
+
+        TSV.parse(content) do |i|
+          data.push i
+        end
+
+        headers = %w{first second third}
+        expect(data).to eq [ TSV::Row.new( ['0', '1', '2'], headers ),
+                             TSV::Row.new( ['one', 'two', 'three'], headers ),
+                             TSV::Row.new( ['weird data', 's@mthin#', 'else'], headers ) ]
+      end
+    end
+  end
+
+  describe "#parse_file" do
+    let(:tsv_path) { File.join(File.dirname(__FILE__), '..', 'fixtures', filename) }
+
+    subject { TSV.parse_file tsv_path }
+
+    it "returns Cyclist object initialized with given filepath" do
+      expect(subject).to be_a TSV::FileCyclist
+      expect(subject.filepath).to eq tsv_path
+    end
+
+    context "when block is given" do
+      it "passes block to Cyclist" do
+        data = []
+
+        TSV.parse_file(tsv_path) do |i|
+          data.push i
+        end
+
+        headers = %w{first second third}
+        expect(data).to eq [ TSV::Row.new( ['0', '1', '2'], headers ),
+                             TSV::Row.new( ['one', 'two', 'three'], headers ),
+                             TSV::Row.new( ['weird data', 's@mthin#', 'else'], headers ) ]
+      end
+    end
+  end
+end
data/spec/spec_helper.rb
ADDED
@@ -0,0 +1,21 @@
+require 'rubygems'
+require 'bundler/setup'
+
+require 'pry'
+require 'rspec'
+
+require 'tsv'
+
+require "codeclimate-test-reporter"
+CodeClimate::TestReporter.start
+
+# Disabling old rspec should syntax
+RSpec.configure do |config|
+  config.expect_with :rspec do |c|
+    c.syntax = :expect
+  end
+
+  config.raise_errors_for_deprecations!
+end
+
+Dir[File.expand_path(File.join(File.dirname(__FILE__),'support','**','*.rb'))].each {|f| require f}
data/spec/support/cyclist_generic.rb
ADDED
@@ -0,0 +1,109 @@
+shared_examples_for "Cyclist" do
+  describe "::new" do
+    it "initializes header to true by default" do
+      expect(subject.header).to be true
+    end
+
+    it "initializes source to given value" do
+      expect(subject.source).to eq(source)
+    end
+
+    context "when block is given" do
+      it "passes block to enumerator through each" do
+        data = []
+
+        described_class.new(source) do |v|
+          data << v
+        end
+
+        headers = %w{first second third}
+        expect(data).to eq [ TSV::Row.new( ['0', '1', '2'], headers ),
+                             TSV::Row.new( ['one', 'two', 'three'], headers ),
+                             TSV::Row.new( ['weird data', 's@mthin#', 'else'], headers ) ]
+      end
+    end
+  end
+
+  describe "#enumerator" do
+    it { expect(cyclist.enumerator).to be_a_kind_of(Enumerator) }
+    subject { cyclist.enumerator.to_a }
+
+    context "string is empty" do
+      let(:filename) { 'empty.tsv' }
+
+      it { should be_empty }
+    end
+
+    context "string is incorrect" do
+      let(:filename) { 'broken.tsv' }
+
+      it "should raise exception" do
+        expect { subject }.to raise_error(TSV::Row::InputError)
+      end
+    end
+
+    context "string is correct" do
+      context "when requested without header" do
+        let(:header) { false }
+        let(:auto_header) { %w{0 1 2} }
+
+        it "returns its content as array of arrays" do
+          expect(subject).to eq [ TSV::Row.new( ['first', 'second', 'third'], auto_header ),
+                                  TSV::Row.new( ['0', '1', '2'], auto_header ),
+                                  TSV::Row.new( ['one', 'two', 'three'], auto_header ),
+                                  TSV::Row.new( ['weird data', 's@mthin#', 'else'], auto_header ) ]
+        end
+
+        it "freezes data and header for TSV::Row" do
+          subject.each do |i|
+            expect(i.data).to be_frozen
+            expect(i.header).to be_frozen
+          end
+        end
+      end
+
+      context "when requested with header" do
+        let(:header) { true }
+
+        it "returns its content as array of hashes" do
+          headers = %w{first second third}
+          expect(subject).to eq [ TSV::Row.new( ['0', '1', '2'], headers ),
+                                  TSV::Row.new( ['one', 'two', 'three'], headers ),
+                                  TSV::Row.new( ['weird data', 's@mthin#', 'else'], headers ) ]
+        end
+
+        it "freezes data and header for TSV::Row" do
+          subject.each do |i|
+            expect(i.data).to be_frozen
+            expect(i.header).to be_frozen
+          end
+        end
+      end
+    end
+  end
+
+  describe "#with_header" do
+    subject { cyclist.with_header }
+
+    it "returns a Cyclist with header option set to true" do
+      expect(subject.header).to be true
+    end
+  end
+
+  describe "#without_header" do
+    subject { cyclist.without_header }
+
+    it "returns a Cyclist with header option set to false" do
+      expect(subject.header).to be false
+    end
+  end
+
+  describe "enumerator interfaces" do
+    ( Enumerable.instance_methods(false) + Enumerator.instance_methods(false) ).each do |name|
+      it "delegates #{name} to enumerator" do
+        expect(cyclist.enumerator).to receive(name)
+        cyclist.send(name)
+      end
+    end
+  end
+end
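The "enumerator interfaces" examples above exercise the bulk delegation that `Cyclist` and `Row` set up with `Forwardable`. A minimal standalone sketch of that technique follows; `DelegatingList` is a hypothetical class, not part of the gem. Note that the sketch requires `forwardable` explicitly, which the library files shown in this diff do not do themselves (in the specs it is presumably loaded indirectly through Bundler or another dependency).

```ruby
require "forwardable"

# Hypothetical wrapper that forwards every Enumerable method to an inner array,
# in the same way TSV::Row forwards to its data.
class DelegatingList
  extend Forwardable

  def_delegators :items, *Enumerable.instance_methods(false)

  attr_reader :items

  def initialize(items)
    @items = items
  end
end

list = DelegatingList.new(%w[one two three])

p list.map(&:upcase)
p list.first
p list.is_a?(Enumerable)  # delegation does not include the Enumerable module
```

Delegating instead of including `Enumerable` keeps the wrapper's interface limited to what the inner object already implements, at the cost of not passing `is_a?(Enumerable)` checks.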
data/spec/tsv_integration_spec.rb
ADDED
@@ -0,0 +1,95 @@
+require File.join(File.dirname(__FILE__), 'spec_helper.rb')
+
+describe TSV do
+  let(:header) { nil }
+  let(:tsv_path) { File.join(File.dirname(__FILE__), 'fixtures', filename) }
+  let(:parameters) { { header: header } }
+
+  describe "reading file" do
+    subject { TSV.parse_file(tsv_path, parameters).to_a }
+
+    context "when file is empty" do
+      let(:filename) { 'empty.tsv' }
+
+      context "when requested with header" do
+        let(:header) { true }
+
+        it { expect(subject).to be_empty }
+      end
+
+      context "when requested without header" do
+        let(:header) { false }
+
+        it { expect(subject).to be_empty }
+      end
+    end
+
+    context "when file is invalid" do
+      subject { lambda { TSV.parse_file(tsv_path, parameters).to_a } }
+      let(:filename) { 'broken.tsv' }
+
+      it "when file is broken" do
+        expect(subject).to raise_error TSV::Row::InputError
+      end
+    end
+
+    context "when file is valid" do
+      let(:filename) { 'example.tsv' }
+
+      context "when no block is passed" do
+        let(:parameters) { Hash.new }
+
+        it "returns its content as array of hashes" do
+          headers = %w{first second third}
+          expect(subject).to eq [ TSV::Row.new( ['0', '1', '2'], headers ),
+                                  TSV::Row.new( ['one', 'two', 'three'], headers ),
+                                  TSV::Row.new( ['weird data', 's@mthin#', 'else'], headers ) ]
+        end
+      end
+    end
+  end
+
+  describe "reading from string" do
+    subject { TSV.parse(IO.read(tsv_path), parameters).to_a }
+
+    context "when string is empty" do
+      let(:filename) { 'empty.tsv' }
+
+      context "when requested with header" do
+        let(:header) { true }
+
+        it { expect(subject).to be_empty }
+      end
+
+      context "when requested without header" do
+        let(:header) { false }
+
+        it { expect(subject).to be_empty }
+      end
+    end
+
+    context "when string is invalid" do
+      subject { lambda { TSV.parse(IO.read(tsv_path), parameters).to_a } }
+      let(:filename) { 'broken.tsv' }
+
+      it "when file is broken" do
+        expect(subject).to raise_error TSV::Row::InputError
+      end
+    end
+
+    context "when string is valid" do
+      let(:filename) { 'example.tsv' }
+
+      context "when no block is passed" do
+        let(:parameters) { Hash.new }
+
+        it "returns its content as array of hashes" do
+          headers = %w{first second third}
+          expect(subject).to eq [ TSV::Row.new( ['0', '1', '2'], headers ),
+                                  TSV::Row.new( ['one', 'two', 'three'], headers ),
+                                  TSV::Row.new( ['weird data', 's@mthin#', 'else'], headers ) ]
+        end
+      end
+    end
+  end
+end
data/tsv.gemspec
ADDED
@@ -0,0 +1,20 @@
+# coding: utf-8
+lib = File.expand_path('../lib', __FILE__)
+$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+require 'tsv/version'
+
+Gem::Specification.new do |spec|
+  spec.name = "tsv"
+  spec.version = TSV::VERSION
+  spec.authors = ["Dmytro Soltys", "Alexander Rozumiy"]
+  spec.email = ["soap@slotos.net", "brain-geek@yandex.ua"]
+  spec.description = %q{Streamed TSV parser}
+  spec.summary = %q{Provides a simple parser for standard compliant and not so (missing header line) TSV files}
+  spec.homepage = ""
+  spec.license = "MIT"
+
+  spec.files = `git ls-files`.split($/)
+  spec.executables = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
+  spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
+  spec.require_paths = ["lib"]
+end
metadata
ADDED
@@ -0,0 +1,79 @@
+--- !ruby/object:Gem::Specification
+name: tsv
+version: !ruby/object:Gem::Version
+  version: 0.0.1
+platform: ruby
+authors:
+- Dmytro Soltys
+- Alexander Rozumiy
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2014-07-10 00:00:00.000000000 Z
+dependencies: []
+description: Streamed TSV parser
+email:
+- soap@slotos.net
+- brain-geek@yandex.ua
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- .gitignore
+- .rspec
+- .travis.yml
+- Gemfile
+- LICENSE.txt
+- README.md
+- Rakefile
+- lib/tsv.rb
+- lib/tsv/cyclist.rb
+- lib/tsv/row.rb
+- lib/tsv/version.rb
+- spec/fixtures/broken.tsv
+- spec/fixtures/empty.tsv
+- spec/fixtures/example.tsv
+- spec/lib/tsv/file_cyclist_spec.rb
+- spec/lib/tsv/row_spec.rb
+- spec/lib/tsv/string_cyclist_spec.rb
+- spec/lib/tsv_spec.rb
+- spec/spec_helper.rb
+- spec/support/cyclist_generic.rb
+- spec/tsv_integration_spec.rb
+- tsv.gemspec
+homepage: ''
+licenses:
+- MIT
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 2.0.6
+signing_key:
+specification_version: 4
+summary: Provides a simple parser for standard compliant and not so (missing header
+  line) TSV files
+test_files:
+- spec/fixtures/broken.tsv
+- spec/fixtures/empty.tsv
+- spec/fixtures/example.tsv
+- spec/lib/tsv/file_cyclist_spec.rb
+- spec/lib/tsv/row_spec.rb
+- spec/lib/tsv/string_cyclist_spec.rb
+- spec/lib/tsv_spec.rb
+- spec/spec_helper.rb
+- spec/support/cyclist_generic.rb
+- spec/tsv_integration_spec.rb