taxonifi 0.5.0 → 0.5.1
- checksums.yaml +4 -4
- data/.travis.yml +1 -2
- data/README.md +13 -27
- data/lib/taxonifi/splitter/parser.rb +1 -1
- data/lib/taxonifi/splitter/tokens.rb +4 -4
- data/lib/taxonifi/version.rb +1 -1
- data/test/test_parser.rb +32 -13
- data/test/test_splitter_tokens.rb +1 -3
- data/travis/before_install.sh +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 1ba6e02176e1a58a103de5dfd244b3851fe72965f2421e97660096524bf70592
+data.tar.gz: 26f00acd0c9d3457f6b0668c86035c721970e215d70c5ae78b6822ccce93dfb3
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 7fd1d847c8a383d2a98a11c251f6d384bff3ab6f2de042264704d43ebc76bd9f4db9cf07209a73983ef78fdca3c6ac5945f2a93ca3c1815aaabc23d6f143fcad
+data.tar.gz: 06f77cc15c87c548d9807df95ce650ee6e14c3cf17ab57db9eace8d6c657862c6ebcf62f23106db4667db61390cab6ba8bf333e5f5861e1f553c5e7cab40ab2d
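The digests recorded here belong to the `metadata.gz` and `data.tar.gz` members of the released .gem file, which is a plain tar archive. As a minimal sketch, assuming the archive has already been unpacked, a file can be compared against a recorded SHA256 value as follows. The helper name `sha256_matches?` and the temporary demo file are illustrative only and are not part of taxonifi or RubyGems.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: true when the file's SHA256 hex digest equals the
# expected value (for example, one copied from checksums.yaml).
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.strip.downcase
end

# Demo on a throwaway file holding the string "abc", whose SHA256 digest
# is a well-known test vector.
Tempfile.create("checksum-demo") do |f|
  f.write("abc")
  f.flush
  ok = sha256_matches?(
    f.path,
    "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
  )
  puts "digest matches: #{ok}"
end
```

For the real gem, `path` would point at the unpacked `data.tar.gz` and `expected_hex` would be the SHA256 value on the matching `+` line.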
data/.travis.yml
CHANGED
data/README.md
CHANGED
@@ -1,29 +1,23 @@
 
 [![Build Status](https://travis-ci.org/SpeciesFileGroup/taxonifi.svg?branch=master)](https://travis-ci.org/SpeciesFileGroup/taxonifi)
-[![Dependency Status][7]][8]
 
 
-
-
+
+# taxonifi
 There will always be "legacy" taxonomic data that needs shuffling around. The taxonifi gem is a suite of general purpose tools that act as a middle layer for data-conversion purposes (e.g. migrating legacy taxonomic databases). Its first application was to convert DwC-style data downloaded from EoL into a Species File. The code is well documented in unit tests, poke around to see if it might be useful. In particular, if you've considered building a collection of regular expressions particular to biodiversity data look at the Tokens code and related tests.
 
 Overall, the goal is to provide well documented (and unit-tested) coded that is broadly useful, and vanilla enough to encourage other to fork and hack on their own.
 
-Source
-------
+# Source
 Source is available at https://github.com/SpeciesFile/taxonifi . The rdoc API is also viewable at http://taxonifi.speciesfile.org , (though those docs may lag behind commits to github).
 
-What's next?
-------------
-
+# What's next?
 Before you jump on board you should also check out similar code from the Global Names team at https://github.com/GlobalNamesArchitecture. Future integration and merging of shared functionality is planned.
 
 Taxonifi is presently coded for convience, not speed (though it's not necessarily slow). It assumes that conversion processes are typically one-offs that can afford to run over a longer period of time (read minutes rather than seconds). Reading, and fully parsing into objects, around 25k rows of nomenclature (class to species, inc. author year, = ~45k names) in to memory as Taxonifi objects benchmarks at around 2 minutes.
 
-Getting started
-
-taxonifi is coded for Ruby 1.9.3, it has not been tested on earlier versions (though it will certainly not work with 1.8.7).
-Using Ruby Version Manager (RVM, https://rvm.io/ ) is highly recommend. You can test your version of Ruby by doinging "ruby -v" in your terminal.
+# Getting started
+taxonifi is coded for Ruby 2.6.5, 0.4.0 works on 1.9.4.
 
 To install:
 
@@ -110,8 +104,7 @@ Parent/child style nomenclature is also parseable.
 
 There are *lots* more examples of code use in the test suite.
 
-Export/conversion
------------------
+# Export/conversion
 
 The following is an example that translates a DwC style input format as exported by EOL into tables importable to SpeciesFile. The input file is has id, parent, child, vernacular, synonym columns. Data are exported by default to a the users home folder in a taxonifi directory. The export creates 6 tables that can be imported into Species File directly.
 
@@ -144,8 +137,7 @@ csv = CSV.read('input/my_data.tab', {
 col_sep: "\t" } )
 ```
 
-Code organization
------------------
+# Code organization
 
 ```
 test # unit tests, quite a few of them
@@ -158,8 +150,7 @@ lib/model # Taxonifi objects
 lib/splitter # a parser/lexer/token suite for breaking down data
 ```
 
-Contributing to taxonifi
-------------------------
+# Contributing to taxonifi
 
 (this is generic)
 
@@ -172,22 +163,17 @@ Contributing to taxonifi
 * All pull requests should test clean.
 * Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or is otherwise necessary, that is fine, but please isolate to its own commit so I can cherry-pick around it.
 
-About
------
+# About
 
 taxonifi is coded by Matt Yoder in consultation with the Species File Group at University of Illinois.
 
-Copyright
----------
+# Copyright
 
-Copyright (c) 2012 Illinois Natural History Survey. See LICENSE.txt for
+Copyright (c) 2012-2020 Illinois Natural History Survey. See LICENSE.txt for
 further details.
 
-
-
 [1]: https://secure.travis-ci.org/SpeciesFileGroup/taxonifi.png?branch=master
 [2]: https://travis-ci.org/SpeciesFileGroup/taxonifi.svg?branch=master
-
-[8]: https://gemnasium.com/SpeciesFileGroup/taxonifi?branch=master
+
 
 
data/lib/taxonifi/splitter/parser.rb
CHANGED
data/lib/taxonifi/splitter/tokens.rb
CHANGED
@@ -39,7 +39,7 @@ module Taxonifi::Splitter::Tokens
 attr_reader :authors, :year, :parens
 # This is going to hit just everything, should only be used
 # in one off when you know you have that string.
-@regexp = Regexp.new(/\A\s*(\(?[^\+\d)]+(\d
+@regexp = Regexp.new(/\A\s*(\(?[^\+\d)]+(\d{4})?\)?)\s*/i)
 
 def initialize(str)
 str.strip!
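To see what the new pattern accepts, here is a small standalone sketch that applies the regular expression from the `+` line to the author strings used in the tests added in this release. The constant `AUTHOR_YEAR` and the helper `match_author_year` are hypothetical names chosen for illustration. They do not exist in taxonifi, where the pattern is held in `@regexp` on the token class.

```ruby
# Pattern copied verbatim from the "+" line of the hunk.
AUTHOR_YEAR = /\A\s*(\(?[^\+\d)]+(\d{4})?\)?)\s*/i

# Hypothetical helper: returns [matched_text, year_or_nil], or nil when the
# string does not look like an author/year at all.
def match_author_year(str)
  m = AUTHOR_YEAR.match(str)
  return nil unless m

  # Group 2 is the optional four-digit year; it is nil when absent.
  [m[1], m[2] ? m[2].to_i : nil]
end

p match_author_year("Jones, 1920")    # => ["Jones, 1920", 1920]
p match_author_year("(Jones, 1920)")  # => ["(Jones, 1920)", 1920]
p match_author_year("(Jones)")        # => ["(Jones)", nil]
```

Because the year group is optional, an author with no year still matches and simply yields `nil` for the year.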
@@ -52,9 +52,9 @@ module Taxonifi::Splitter::Tokens
 @parens = false
 end
 # check for year
-if w =~ /(\d
-@year = $1.to_i
-w.gsub!(/\d\
+if w =~ /(\d{4})\Z/
+@year = $1 ? $1.to_i : nil
+w.gsub!(/\d{4}\Z/, "")
 w.strip!
 end
 w.gsub!(/,\s*\Z/, '')
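The year handling in the `+` lines can be tried in isolation. The following is a minimal sketch, not the gem's code: `split_trailing_year` is a hypothetical helper that reproduces only the trailing-year check and the comma cleanup shown in this hunk, and it leaves out the parenthesis handling that precedes them in the real method.

```ruby
# Hypothetical helper mirroring the "+" lines: pull a trailing four-digit
# year off an author string and return [authors, year_or_nil].
def split_trailing_year(str)
  w = str.strip
  year = nil

  # A year is recognised only when exactly four digits end the string.
  if w =~ /(\d{4})\Z/
    year = $1.to_i
    w = w.sub(/\d{4}\Z/, "").strip
  end

  # Drop the comma that separated the authors from the year.
  w = w.sub(/,\s*\Z/, "")
  [w, year]
end

p split_trailing_year("Smith, 1912")  # => ["Smith", 1912]
p split_trailing_year("Jones")        # => ["Jones", nil]
```

When no year is present the sketch returns `nil` for it, which is the behaviour the new `parses_to_nil_year` test in test_parser.rb checks for.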
data/lib/taxonifi/version.rb
CHANGED
data/test/test_parser.rb
CHANGED
@@ -12,7 +12,7 @@ class Test_TaxonifiSplitterParser < Test::Unit::TestCase
 assert_equal "Smith", builder.names.last.author
 assert_equal 1912 , builder.names.last.year
 assert_equal false, builder.names.last.parens
-assert_equal "Foo stuff Smith, 1912", builder.display_name
+assert_equal "Foo stuff Smith, 1912", builder.display_name
 end
 
 def test_that_parse_species_name_parses_subspecies
@@ -25,7 +25,7 @@ class Test_TaxonifiSplitterParser < Test::Unit::TestCase
 assert_equal "Smith", builder.names.last.author
 assert_equal 1912 , builder.names.last.year
 assert_equal false, builder.names.last.parens
-assert_equal "Foo stuff things Smith, 1912", builder.display_name
+assert_equal "Foo stuff things Smith, 1912", builder.display_name
 end
 
 def test_that_parse_species_name_parses_subgenera
@@ -34,15 +34,15 @@ class Test_TaxonifiSplitterParser < Test::Unit::TestCase
 Taxonifi::Splitter::Parser.new(lexer, builder).parse_species_name
 assert_equal "Foo", builder.genus.name
 assert_equal "Bar", builder.subgenus.name
-assert_equal builder.genus, builder.subgenus.parent
+assert_equal builder.genus, builder.subgenus.parent
 assert_equal "stuff", builder.species.name
-assert_equal builder.subgenus, builder.species.parent
+assert_equal builder.subgenus, builder.species.parent
 assert_equal "things", builder.subspecies.name
-assert_equal builder.species, builder.subspecies.parent
+assert_equal builder.species, builder.subspecies.parent
 assert_equal "Smith", builder.names.last.author
 assert_equal 1912, builder.names.last.year
 assert_equal true, builder.names.last.parens
-assert_equal "Foo (Bar) stuff things (Smith, 1912)", builder.display_name
+assert_equal "Foo (Bar) stuff things (Smith, 1912)", builder.display_name
 end
 
 def test_that_parse_species_name_parses_variety_following_subspecies
@@ -56,7 +56,7 @@ class Test_TaxonifiSplitterParser < Test::Unit::TestCase
 assert_equal "Smith", builder.names.last.author
 assert_equal 1912 , builder.names.last.year
 assert_equal false, builder.names.last.parens
-assert_equal "Foo stuff things var. blorf Smith, 1912", builder.display_name
+assert_equal "Foo stuff things var. blorf Smith, 1912", builder.display_name
 end
 
 
@@ -71,10 +71,10 @@ class Test_TaxonifiSplitterParser < Test::Unit::TestCase
 assert_equal "Smith", builder.names.last.author
 assert_equal 1912 , builder.names.last.year
 assert_equal false, builder.names.last.parens
-assert_equal "Foo stuff var. blorf Smith, 1912", builder.display_name
+assert_equal "Foo stuff var. blorf Smith, 1912", builder.display_name
 end
 
-
+
 def test_that_parse_species_name_parses_variety_following_species_without_author_year
 lexer = Taxonifi::Splitter::Lexer.new("Foo stuff v. blorf", :species_name)
 builder = Taxonifi::Model::SpeciesName.new
@@ -84,10 +84,9 @@ class Test_TaxonifiSplitterParser < Test::Unit::TestCase
 assert_equal nil, builder.subspecies
 assert_equal "blorf", builder.variety.name
 assert_equal nil, builder.names.last.parens # not set
-assert_equal "Foo stuff var. blorf", builder.display_name
+assert_equal "Foo stuff var. blorf", builder.display_name
 end
 
-
 def test_that_parse_species_name_parses_variety_following_species_without_author_year_II
 lexer = Taxonifi::Splitter::Lexer.new("Calyptonotus rolandri var. opacus", :species_name)
 builder = Taxonifi::Model::SpeciesName.new
@@ -95,12 +94,32 @@ class Test_TaxonifiSplitterParser < Test::Unit::TestCase
 assert_equal "Calyptonotus", builder.genus.name
 assert_equal "rolandri", builder.species.name
 assert_equal nil, builder.subspecies
+assert_equal nil, builder.names.last.year
 assert_equal "opacus", builder.variety.name
 assert_equal nil, builder.names.last.parens # not set
-assert_equal "Calyptonotus rolandri var. opacus", builder.display_name
+assert_equal "Calyptonotus rolandri var. opacus", builder.display_name
+end
+
+def test_that_parse_family_name_parses_to_nil_year
+lexer = Taxonifi::Splitter::Lexer.new("Bus aus (Jones)", :species_name)
+builder = Taxonifi::Model::SpeciesName.new
+Taxonifi::Splitter::Parser.new(lexer, builder).parse_species_name
+assert_equal nil, builder.names.last.year
 end
 
+def test_that_parse_family_name_parses_to_year_2
+lexer = Taxonifi::Splitter::Lexer.new("Bus aus Jones, 1920", :species_name)
+builder = Taxonifi::Model::SpeciesName.new
+Taxonifi::Splitter::Parser.new(lexer, builder).parse_species_name
+assert_equal 1920, builder.names.last.year
+end
 
+def test_that_parse_family_name_parses_to_year_3
+lexer = Taxonifi::Splitter::Lexer.new("Bus aus (Jones, 1920)", :species_name)
+builder = Taxonifi::Model::SpeciesName.new
+Taxonifi::Splitter::Parser.new(lexer, builder).parse_species_name
+assert_equal 1920, builder.names.last.year
+end
 
-end
+end
 
data/test/test_splitter_tokens.rb
CHANGED
@@ -88,7 +88,7 @@ class Test_TaxonifiSplitterTokens < Test::Unit::TestCase
 lexer = Taxonifi::Splitter::Lexer.new(s)
 assert t = lexer.pop(Taxonifi::Splitter::Tokens::AuthorYear)
 assert_equal a.strip, t.authors
-assert_equal (y.size > 0 ? y.strip.to_i : nil), t.year
+assert_equal (y.size > 0 ? y.strip.to_i : nil), t.year # bad test
 assert_equal p, t.parens
 s = nil
 end
@@ -425,8 +425,6 @@ class Test_TaxonifiSplitterTokens < Test::Unit::TestCase
 assert_equal "33", t.pg_end
 assert_equal "ix 14, 19", t.remainder
 
-
 end
-
 end
 
data/travis/before_install.sh
CHANGED
@@ -1,2 +1,2 @@
 #!/bin/sh
-gem install bundler -v=1.
+gem install bundler -v=1.17.3