gnfinder 0.2.2 → 0.9.0
- checksums.yaml +4 -4
- data/.rubocop.yml +1 -1
- data/README.md +63 -31
- data/gnfinder.gemspec +1 -1
- data/lib/gnfinder/client.rb +19 -5
- data/lib/gnfinder/version.rb +2 -1
- data/lib/protob_pb.rb +71 -60
- data/lib/protob_services_pb.rb +2 -1
- metadata +4 -5
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: dffcaa8904b1b9a47761b66aff325c86615aedec1600b9abe13194035c4aa061
|
4
|
+
data.tar.gz: f33a382799901181d3092f2c05585318accd1e5baebb6bb053d6ac7bc015e7d6
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 7a0358fac8b9f05532be3f5434c07a8fd5bd242d35545f45faa4798d8cc0eb540b21a7e11c292c624cbc6608c633106812fcc1ab219bcd2ce9a72b5e9d17fa66
|
7
|
+
data.tar.gz: 5dfc5b032530c7fea01672bdf3a0a5ce595832110b665a680ca575f1916cd17e84b33991da9462f6c35927513cf6cd71e7df3dd87bd65027480129489e793f1a
|
data/.rubocop.yml
CHANGED
data/README.md
CHANGED
@@ -4,6 +4,18 @@ Ruby gem to access functionality of [gnfinder] project written in Go. This gem
|
|
4
4
|
allows to perform fast and accurate scientific name finding in UTF-8 encoded
|
5
5
|
plain texts for Ruby-based projects.
|
6
6
|
|
7
|
+
- [gnfinder](#gnfinder)
|
8
|
+
- [Requirements](#requirements)
|
9
|
+
- [Installation](#installation)
|
10
|
+
- [Usage](#usage)
|
11
|
+
- [Finding names in a text using default settings](#finding-names-in-a-text-using-default-settings)
|
12
|
+
- [Optionally disable Bayes search](#optionally-disable-bayes-search)
|
13
|
+
- [Set a language for the text](#set-a-language-for-the-text)
|
14
|
+
- [Set automatic detection of text's language](#set-automatic-detection-of-texts-language)
|
15
|
+
- [Set verification option](#set-verification-option)
|
16
|
+
- [Set preferred data-sources list](#set-preferred-data-sources-list)
|
17
|
+
- [Combination of parameters.](#combination-of-parameters)
|
18
|
+
- [Development](#development)
|
7
19
|
|
8
20
|
## Requirements
|
9
21
|
|
@@ -28,7 +40,7 @@ the original Go-lang [gnfinder] README file.
|
|
28
40
|
First you need to create an instance of a `gnfinder` client
|
29
41
|
|
30
42
|
```ruby
|
31
|
-
|
43
|
+
require 'gnfinder'
|
32
44
|
|
33
45
|
gf = Gnfinder::Client.new
|
34
46
|
```
|
@@ -36,60 +48,78 @@ gf = Gnfinder::Client.new
|
|
36
48
|
By default the client will try to connect to `localhost:8778`. If you
|
37
49
|
have another location for the server use:
|
38
50
|
|
39
|
-
|
40
|
-
|
41
|
-
|
51
|
+
|
52
|
+
|
53
|
+
```ruby
|
54
|
+
require 'gnfinder'
|
55
|
+
|
56
|
+
# you can use global public gnfinder server
|
57
|
+
# located at finder-rpc.globalnames.org
|
58
|
+
gf = Gnfinder::Client.new('finder-rpc.globalnames.org', 80)
|
59
|
+
|
60
|
+
# localhost, different port
|
61
|
+
gf = Gnfinder::Client.new('0.0.0.0', 8000)
|
42
62
|
```
|
43
63
|
|
44
64
|
### Finding names in a text using default settings
|
45
65
|
|
66
|
+
You can find the format of the returned result in the [proto file] or in the [tests]
|
67
|
+
|
46
68
|
```ruby
|
47
69
|
txt = File.read('utf8-text-with-names.txt')
|
48
70
|
|
49
|
-
|
50
|
-
puts names[0].value
|
51
|
-
puts names[0].odds
|
71
|
+
res = gf.find_names(txt)
|
72
|
+
puts res.names[0].value
|
73
|
+
puts res.names[0].odds
|
52
74
|
```
|
53
75
|
|
54
76
|
Returned result will have the following methods for each name:
|
55
77
|
|
56
|
-
* value:
|
78
|
+
* value: name-string cleaned up for verification.
|
57
79
|
* verbatim: name-string as it was found in the text.
|
58
80
|
* odds: Bayes' odds value. For example odds 0.1 would mean that according to
|
59
81
|
the algorithm there is 1 chance out of 10 that the name-string is
|
60
82
|
a scientific name. This field will be empty if Bayes algorithms did not run.
|
61
83
|
|
62
|
-
###
|
84
|
+
### Optionally disable Bayes search
|
63
85
|
|
64
|
-
|
65
|
-
|
66
|
-
|
67
|
-
override this default setting by running:
|
86
|
+
Some languages that are close to Latin (Italian, French, Portuguese) would
|
87
|
+
generate too many false positives. To decrease the amount of false positives you
|
88
|
+
can disable the Bayes algorithm by running:
|
68
89
|
|
69
90
|
```ruby
|
70
|
-
names = gf.find_names(txt,
|
91
|
+
names = gf.find_names(txt, no_bayes: true).names
|
71
92
|
```
|
72
93
|
|
73
94
|
### Set a language for the text
|
74
95
|
|
75
|
-
|
76
|
-
|
77
|
-
large citations or list of references in a different language. It is possible
|
78
|
-
to set a language for a text by hand. For supported languages
|
79
|
-
(English and German) it will enable Bayes algorithm. For other languages
|
80
|
-
this setting will be ignored.
|
96
|
+
It is possible to supply the prevalent language to set a language for a text
|
97
|
+
by hand. That might help Bayes algorithms work better.
|
81
98
|
|
82
99
|
List of supported languages will increase with time.
|
83
100
|
|
84
101
|
```ruby
|
85
|
-
|
86
|
-
|
87
|
-
|
88
|
-
|
89
|
-
|
90
|
-
|
91
|
-
|
102
|
+
res = gf.find_names(txt, language: 'eng')
|
103
|
+
puts res.language
|
104
|
+
res = gf.find_names(txt, language: 'deu')
|
105
|
+
puts res.language
|
106
|
+
|
107
|
+
# Setting is ignored if language string is not known by gnfinder.
|
108
|
+
# Only 3-character ISO 639-2 codes are supported
|
109
|
+
res = gf.find_names(txt, language: 'rus')
|
110
|
+
puts res.language
|
92
111
|
```
|
112
|
+
### Set automatic detection of text's language
|
113
|
+
|
114
|
+
To enable automatic detection of prevalent language of a text use:
|
115
|
+
|
116
|
+
res = gf.find_names(txt, detect_language: true)
|
117
|
+
puts res.language
|
118
|
+
puts res.detect_language
|
119
|
+
puts res.language_detected
|
120
|
+
|
121
|
+
If detected language is not yet supported by Bayes algorithm, default
|
122
|
+
language (English) will be used.
|
93
123
|
|
94
124
|
### Set verification option
|
95
125
|
|
@@ -113,7 +143,7 @@ return the following information:
|
|
113
143
|
* path: the classification path of a matched name (if available)
|
114
144
|
|
115
145
|
```ruby
|
116
|
-
|
146
|
+
res = gf.find_names(txt, verification: true)
|
117
147
|
```
|
118
148
|
|
119
149
|
### Set preferred data-sources list
|
@@ -124,7 +154,7 @@ data-source (data-sources). There is a parameter that takes IDs from the
|
|
124
154
|
results will be returned back.
|
125
155
|
|
126
156
|
```ruby
|
127
|
-
|
157
|
+
res = gf.find_names(txt, verification: true, sources: [1, 4, 179])
|
128
158
|
```
|
129
159
|
### Combination of parameters.
|
130
160
|
|
@@ -134,11 +164,11 @@ a particular context. It is silently ignored.
|
|
134
164
|
```ruby
|
135
165
|
# Runs Bayes' algorithms using English training set, runs verification and
|
136
166
|
# returns matched results for 3 data-sources if they are available.
|
137
|
-
|
167
|
+
res = gf.find_names(txt, language: 'eng', verification: true,
|
138
168
|
sources: [1, 4, 179])
|
139
169
|
|
140
170
|
# Ignores `sources:` settings, because `verification` is not set to `true`
|
141
|
-
|
171
|
+
res = gf.find_names(txt, language: 'eng', sources: [1, 4, 179])
|
142
172
|
```
|
143
173
|
|
144
174
|
## Development
|
@@ -183,3 +213,5 @@ bundle exec rspec
|
|
183
213
|
[Go]: https://golang.org/doc/install
|
184
214
|
[client]: https://github.com/GlobalNamesArchitecture/gnfinder/blob/master/lib/gnfinder/client.rb
|
185
215
|
[data-source list]: http://index.globalnames.org/datasource
|
216
|
+
[proto file]: https://github.com/GlobalNamesArchitecture/gnfinder/blob/master/lib/protob_pb.rb
|
217
|
+
[tests]: https://github.com/GlobalNamesArchitecture/gnfinder/blob/master/spec/lib/client_spec.rb
|
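As an illustration of working with the `odds` field described in the README changes above, here is a minimal sketch that keeps only name-strings whose odds reach a cutoff. It runs without a gnfinder server: `FoundName` is a hypothetical stand-in for the objects in `res.names`, with field names taken from the `protob.NameString` message in this diff (`name`, `verbatim`, `odds`). The `likely_names` helper, the sample data and the cutoff value are assumptions for illustration, not part of the gem.

```ruby
# Hypothetical stand-in for the objects returned in `res.names`;
# the real objects come from the gRPC response.
FoundName = Struct.new(:name, :verbatim, :odds)

# Keep names whose Bayes odds reach the cutoff, highest odds first.
# `min_odds` is an illustrative cutoff, not a gnfinder default.
def likely_names(names, min_odds: 1.0)
  names.select { |n| n.odds >= min_odds }
       .sort_by { |n| -n.odds }
end

sample = [
  FoundName.new('Pardosa moesta', 'Pardosa moesta', 250.0),
  FoundName.new('Bubo bubo', 'Bubo bubo', 1200.5),
  FoundName.new('Nota bene', 'Nota bene', 0.1)
]

likely_names(sample).each { |n| puts "#{n.name}: #{n.odds}" }
```

With a live server the same helper could presumably be applied to `res.names`, provided the Bayes algorithm ran and `odds` is populated.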
data/gnfinder.gemspec
CHANGED
@@ -20,7 +20,7 @@ Gem::Specification.new do |gem|
|
|
20
20
|
.reject { |f| f.match(%r{^(test|spec|features)/}) }
|
21
21
|
|
22
22
|
gem.require_paths = ['lib']
|
23
|
-
gem.required_ruby_version = '~> 2.
|
23
|
+
gem.required_ruby_version = '~> 2.6'
|
24
24
|
gem.add_development_dependency 'bundler', '~> 2.0'
|
25
25
|
gem.add_development_dependency 'byebug', '~> 10.0'
|
26
26
|
gem.add_development_dependency 'grpc', '~> 1.15'
|
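The pessimistic constraint `'~> 2.6'` in the gemspec change above allows Ruby 2.6 and later 2.x releases but not 3.0. A short sketch using `Gem::Requirement` from RubyGems shows how such a constraint is evaluated; the list of Ruby versions is only sample input.

```ruby
require 'rubygems' # Gem::Requirement and Gem::Version ship with Ruby

requirement = Gem::Requirement.new('~> 2.6')

# Sample versions chosen to sit on both sides of the allowed range.
%w[2.5.9 2.6.0 2.7.8 3.0.0].each do |ruby_version|
  ok = requirement.satisfied_by?(Gem::Version.new(ruby_version))
  puts "#{ruby_version}: #{ok}"
end
# prints:
# 2.5.9: false
# 2.6.0: true
# 2.7.8: true
# 3.0.0: false
```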
data/lib/gnfinder/client.rb
CHANGED
@@ -1,31 +1,45 @@
|
|
1
1
|
# frozen_string_literal: true
|
2
2
|
|
3
3
|
module Gnfinder
|
4
|
+
GNFINDER_MIN_VERSION = 'v0.9.0'
|
5
|
+
|
4
6
|
# Gnfinder::Client connects to gnfinder server
|
5
7
|
class Client
|
6
8
|
def initialize(host = '0.0.0.0', port = '8778')
|
7
9
|
@stub = Protob::GNFinder::Stub.new("#{host}:#{port}",
|
8
10
|
:this_channel_is_insecure)
|
11
|
+
return if gnfinder_version.version >= GNFINDER_MIN_VERSION
|
12
|
+
|
13
|
+
raise 'gRPC server of gnfinder should be at least ' \
|
14
|
+
"#{GNFINDER_MIN_VERSION}.\n Download latest version from " \
|
15
|
+
'https://github.com/gnames/gnfinder/releases/latest.'
|
16
|
+
end
|
17
|
+
|
18
|
+
def gnfinder_version
|
19
|
+
@stub.ver(Protob::Void.new)
|
9
20
|
end
|
10
21
|
|
11
22
|
def ping
|
12
23
|
@stub.ping(Protob::Void.new).value
|
13
24
|
end
|
14
25
|
|
15
|
-
# rubocop:disable
|
26
|
+
# rubocop:disable all
|
16
27
|
def find_names(text, opts = {})
|
17
28
|
raise 'Text cannot be empty' if text.to_s.strip == ''
|
18
29
|
|
19
30
|
params = { text: text }
|
20
|
-
params[:
|
31
|
+
params[:no_bayes] = true if opts[:no_bayes]
|
21
32
|
params[:language] = opts[:language] if opts[:language].to_s.strip != ''
|
22
|
-
|
33
|
+
if opts[:detect_language]
|
34
|
+
params[:detect_language] = opts[:detect_language]
|
35
|
+
end
|
36
|
+
params[:verification] = true if opts[:verification]
|
23
37
|
if opts[:sources] && !opts[:sources].empty?
|
24
38
|
params[:sources] = opts[:sources]
|
25
39
|
end
|
26
40
|
|
27
|
-
@stub.find_names(Protob::Params.new(params))
|
41
|
+
@stub.find_names(Protob::Params.new(params))
|
28
42
|
end
|
29
|
-
# rubocop:enable
|
43
|
+
# rubocop:enable all
|
30
44
|
end
|
31
45
|
end
|
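The version guard added to `initialize` compares version strings with `>=`, which Ruby evaluates character by character. The sketch below assumes server versions look like `v0.9.0` and shows how `Gem::Version` from RubyGems compares the same strings numerically. The `version_at_least?` helper and the `MIN_VERSION` constant are hypothetical and not part of the gem.

```ruby
require 'rubygems' # Gem::Version ships with Ruby's bundled RubyGems

MIN_VERSION = 'v0.9.0'

# Hypothetical helper: strips a leading 'v' and compares numerically.
def version_at_least?(current, minimum = MIN_VERSION)
  normalize = ->(v) { Gem::Version.new(v.to_s.sub(/\Av/, '')) }
  normalize.call(current) >= normalize.call(minimum)
end

puts 'v0.10.0' >= MIN_VERSION      # false: string comparison, '1' sorts before '9'
puts version_at_least?('v0.10.0')  # true: 10 > 9 when compared as numbers
puts version_at_least?('v0.8.5')   # false
```

For the `v0.9.x` series the two comparisons agree; they would diverge once a minor or patch number reaches two digits.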
data/lib/gnfinder/version.rb
CHANGED
data/lib/protob_pb.rb
CHANGED
@@ -4,72 +4,83 @@
|
|
4
4
|
require 'google/protobuf'
|
5
5
|
|
6
6
|
Google::Protobuf::DescriptorPool.generated_pool.build do
|
7
|
-
|
8
|
-
|
9
|
-
|
10
|
-
|
11
|
-
|
12
|
-
|
13
|
-
|
14
|
-
|
15
|
-
|
16
|
-
|
17
|
-
|
18
|
-
|
19
|
-
|
20
|
-
|
21
|
-
|
22
|
-
|
23
|
-
|
24
|
-
|
25
|
-
|
26
|
-
|
27
|
-
|
28
|
-
|
29
|
-
|
30
|
-
|
31
|
-
|
32
|
-
|
33
|
-
|
34
|
-
|
35
|
-
|
36
|
-
|
37
|
-
|
38
|
-
|
39
|
-
|
40
|
-
|
41
|
-
|
42
|
-
|
43
|
-
|
44
|
-
|
45
|
-
|
46
|
-
|
47
|
-
|
48
|
-
|
49
|
-
|
50
|
-
|
51
|
-
|
52
|
-
|
53
|
-
|
54
|
-
|
55
|
-
|
56
|
-
|
57
|
-
|
58
|
-
|
59
|
-
|
60
|
-
|
61
|
-
|
62
|
-
|
63
|
-
|
64
|
-
|
7
|
+
add_file("protob.proto", :syntax => :proto3) do
|
8
|
+
add_message "protob.Void" do
|
9
|
+
end
|
10
|
+
add_message "protob.Pong" do
|
11
|
+
optional :value, :string, 1
|
12
|
+
end
|
13
|
+
add_message "protob.Version" do
|
14
|
+
optional :version, :string, 1
|
15
|
+
optional :build, :string, 2
|
16
|
+
end
|
17
|
+
add_message "protob.Params" do
|
18
|
+
optional :text, :string, 1
|
19
|
+
optional :no_bayes, :bool, 3
|
20
|
+
optional :language, :string, 4
|
21
|
+
optional :detect_language, :bool, 5
|
22
|
+
optional :verification, :bool, 6
|
23
|
+
repeated :sources, :int32, 7
|
24
|
+
end
|
25
|
+
add_message "protob.Output" do
|
26
|
+
optional :date, :string, 1
|
27
|
+
optional :finder_version, :string, 2
|
28
|
+
optional :language, :string, 3
|
29
|
+
optional :language_detected, :string, 4
|
30
|
+
optional :detect_language, :bool, 5
|
31
|
+
optional :total_tokens, :int32, 6
|
32
|
+
optional :total_candidates, :int32, 7
|
33
|
+
optional :total_names, :int32, 8
|
34
|
+
repeated :names, :message, 9, "protob.NameString"
|
35
|
+
end
|
36
|
+
add_message "protob.NameString" do
|
37
|
+
optional :type, :string, 1
|
38
|
+
optional :verbatim, :string, 2
|
39
|
+
optional :name, :string, 3
|
40
|
+
optional :odds, :float, 4
|
41
|
+
optional :offset_start, :int32, 5
|
42
|
+
optional :offset_end, :int32, 6
|
43
|
+
optional :verification, :message, 7, "protob.Verification"
|
44
|
+
end
|
45
|
+
add_message "protob.Verification" do
|
46
|
+
optional :best_result, :message, 1, "protob.ResultData"
|
47
|
+
repeated :preferred_results, :message, 2, "protob.ResultData"
|
48
|
+
optional :data_sources_num, :int32, 3
|
49
|
+
optional :data_source_quality, :string, 4
|
50
|
+
optional :retries, :int32, 5
|
51
|
+
optional :error, :string, 6
|
52
|
+
end
|
53
|
+
add_message "protob.ResultData" do
|
54
|
+
optional :data_source_id, :int32, 1
|
55
|
+
optional :data_source_title, :string, 2
|
56
|
+
optional :taxon_id, :string, 3
|
57
|
+
optional :matched_name, :string, 4
|
58
|
+
optional :matched_canonical, :string, 5
|
59
|
+
optional :current_name, :string, 6
|
60
|
+
optional :synonym, :bool, 7
|
61
|
+
optional :classification_path, :string, 8
|
62
|
+
optional :classification_rank, :string, 9
|
63
|
+
optional :classification_ids, :string, 10
|
64
|
+
optional :edit_distance, :int32, 11
|
65
|
+
optional :stem_edit_distance, :int32, 12
|
66
|
+
optional :match_type, :enum, 13, "protob.MatchType"
|
67
|
+
end
|
68
|
+
add_enum "protob.MatchType" do
|
69
|
+
value :NONE, 0
|
70
|
+
value :EXACT, 1
|
71
|
+
value :FUZZY, 2
|
72
|
+
value :PARTIAL_EXACT, 3
|
73
|
+
value :PARTIAL_FUZZY, 4
|
74
|
+
end
|
65
75
|
end
|
66
76
|
end
|
67
77
|
|
68
78
|
module Protob
|
69
|
-
Pong = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Pong").msgclass
|
70
79
|
Void = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Void").msgclass
|
80
|
+
Pong = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Pong").msgclass
|
81
|
+
Version = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Version").msgclass
|
71
82
|
Params = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Params").msgclass
|
72
|
-
|
83
|
+
Output = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Output").msgclass
|
73
84
|
NameString = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.NameString").msgclass
|
74
85
|
Verification = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Verification").msgclass
|
75
86
|
ResultData = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.ResultData").msgclass
|
data/lib/protob_services_pb.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: gnfinder
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.
|
4
|
+
version: 0.9.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Dmitry Mozzherin
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2019-
|
11
|
+
date: 2019-10-15 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: bundler
|
@@ -142,15 +142,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
|
|
142
142
|
requirements:
|
143
143
|
- - "~>"
|
144
144
|
- !ruby/object:Gem::Version
|
145
|
-
version: '2.
|
145
|
+
version: '2.6'
|
146
146
|
required_rubygems_version: !ruby/object:Gem::Requirement
|
147
147
|
requirements:
|
148
148
|
- - ">="
|
149
149
|
- !ruby/object:Gem::Version
|
150
150
|
version: '0'
|
151
151
|
requirements: []
|
152
|
-
|
153
|
-
rubygems_version: 2.7.6
|
152
|
+
rubygems_version: 3.0.3
|
154
153
|
signing_key:
|
155
154
|
specification_version: 4
|
156
155
|
summary: Scientific names finder
|