gnfinder 0.2.2 → 0.9.0
- checksums.yaml +4 -4
- data/.rubocop.yml +1 -1
- data/README.md +63 -31
- data/gnfinder.gemspec +1 -1
- data/lib/gnfinder/client.rb +19 -5
- data/lib/gnfinder/version.rb +2 -1
- data/lib/protob_pb.rb +71 -60
- data/lib/protob_services_pb.rb +2 -1
- metadata +4 -5
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: dffcaa8904b1b9a47761b66aff325c86615aedec1600b9abe13194035c4aa061
+data.tar.gz: f33a382799901181d3092f2c05585318accd1e5baebb6bb053d6ac7bc015e7d6
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 7a0358fac8b9f05532be3f5434c07a8fd5bd242d35545f45faa4798d8cc0eb540b21a7e11c292c624cbc6608c633106812fcc1ab219bcd2ce9a72b5e9d17fa66
+data.tar.gz: 5dfc5b032530c7fea01672bdf3a0a5ce595832110b665a680ca575f1916cd17e84b33991da9462f6c35927513cf6cd71e7df3dd87bd65027480129489e793f1a
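The checksums above are the SHA256 and SHA512 digests of the `metadata.gz` and `data.tar.gz` members inside the `.gem` archive. Below is a minimal sketch of one way to re-check them locally. It assumes you have downloaded a copy of the gem, for example `gnfinder-0.9.0.gem`. The file name and the helper names `gem_members` and `checksum_mismatches` are illustrative; they are not part of gnfinder or RubyGems. The sketch relies only on Ruby's `digest`, `yaml` and `zlib` libraries plus RubyGems' `Gem::Package::TarReader`.

```ruby
require 'digest'
require 'rubygems/package'
require 'yaml'
require 'zlib'

# Digest classes for the two sections found in a gem's checksums.yaml.
DIGESTS = { 'SHA256' => Digest::SHA256, 'SHA512' => Digest::SHA512 }.freeze

# Reads every member of a .gem archive (a plain tar) into a name => bytes hash.
def gem_members(gem_path)
  members = {}
  File.open(gem_path, 'rb') do |io|
    Gem::Package::TarReader.new(io) do |tar|
      tar.each { |entry| members[entry.full_name] = entry.read }
    end
  end
  members
end

# Returns [algorithm, member] pairs that are missing or whose digest differs
# from the value recorded in checksums.yaml.
def checksum_mismatches(members, checksums_yaml)
  expected = YAML.safe_load(checksums_yaml) || {}
  mismatches = []
  DIGESTS.each do |algo, klass|
    (expected[algo] || {}).each do |name, digest|
      data = members[name]
      mismatches << [algo, name] if data.nil? || klass.hexdigest(data) != digest
    end
  end
  mismatches
end

# Example: ruby verify_gem.rb gnfinder-0.9.0.gem
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  members = gem_members(ARGV[0])
  checksums = Zlib.gunzip(members.fetch('checksums.yaml.gz'))
  p checksum_mismatches(members, checksums)
end
```

If every digest matches, the script should print `[]`. Any pair it lists names the algorithm and the archive member that differ.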
data/.rubocop.yml
CHANGED
data/README.md
CHANGED
@@ -4,6 +4,18 @@ Ruby gem to access functionality of [gnfinder] project written in Go. This gem
 allows to perform fast and accurate scientific name finding in UTF-8 encoded
 plain texts for Ruby-based projects.
 
+- [gnfinder](#gnfinder)
+- [Requirements](#requirements)
+- [Installation](#installation)
+- [Usage](#usage)
+- [Finding names in a text using default settings](#finding-names-in-a-text-using-default-settings)
+- [Optionally disable Bayes search](#optionally-disable-bayes-search)
+- [Set a language for the text](#set-a-language-for-the-text)
+- [Set automatic detection of text's language](#set-automatic-detection-of-texts-language)
+- [Set verification option](#set-verification-option)
+- [Set preferred data-sources list](#set-preferred-data-sources-list)
+- [Combination of parameters.](#combination-of-parameters)
+- [Development](#development)
 
 ## Requirements
 
@@ -28,7 +40,7 @@ the original Go-lang [gnfinder] README file.
 First you need to create a instance of a `gnfinder` client
 
 ```ruby
-
+require 'gnfinder'
 
 gf = Gnfinder::Client.new
 ```
@@ -36,60 +48,78 @@ gf = Gnfinder::Client.new
 By default the client will try to connect to `localhost:8778`. If you
 have another location for the server use:
 
-
-
-
+
+
+```ruby
+require 'gnfinder'
+
+# you can use global public gnfinder server
+# located at finder-rpc.globalnames.org
+gf = Gnfinder::Client.new(host = 'finder-rpc.globalnames.org', port = 80)
+
+# localhost, different port
+gf = Gnfinder::Client.new(host = '0.0.0.0', port = 8000)
 ```
 
 ### Finding names in a text using default settings
 
+You can find format of returning result in [proto file] or in [tests]
+
 ```ruby
 txt = File.read('utf8-text-with-names.txt')
 
-
-puts names[0].value
-puts names[0].odds
+res = gf.find_names(txt)
+puts res.names[0].value
+puts res.names[0].odds
 ```
 
 Returned result will have the following methods for each name:
 
-* value:
+* value: name-string cleaned up for verification.
 * verbatim: name-string as it was found in the text.
 * odds: Bayes' odds value. For example odds 0.1 would mean that according to
 the algorithm there is 1 chance out of 10 that the name-string is
 a scientific name. This field will be empty if Bayes algorithms did not run.
 
-###
+### Optionally disable Bayes search
 
-
-
-
-override this default setting by running:
+Some languages that are close to Latin (Italian, French, Portugese) would
+generate too many false positives. To decrease amount of false positives you
+can disable Bayes algorithm by running:
 
 ```ruby
-names = gf.find_names(txt,
+names = gf.find_names(txt, no_bayes: true).names
 ```
 
 ### Set a language for the text
 
-
-
-large citations or list of references in a different language. It is possible
-to set a language for a text by hand. For supported languages
-(English and German) it will enable Bayes algorithm. For other languages
-this setting will be ignored.
+It is possible to supply the prevalent language to set a language for a text
+by hand. That might Bayes algorithms work better
 
 List of supported languages will increase with time.
 
 ```ruby
-
-
-
-
-
-
-
+res = gf.find_names(txt, language: 'eng')
+puts res.language
+res = gf.find_names(txt, language: 'deu')
+puts res.language
+
+# Setting is ignored if language string is not known by gnfinder.
+# Only 3-character notations iso-639-2 code are supported
+res = gf.find_names(txt, language: 'rus')
+puts res.language
 ```
+## Set automatic detection of text's language
+
+To enable automatic detection of prevalent language of a text use:
+
+res = gf.find_names(txt, detect_language: true)
+puts res.language
+puts res.detect_language
+puts res.language_detected
+
+If detected language is not yet supported by Bayes algorithm, default
+language (English) will be used.
 
 ### Set verification option
 
@@ -113,7 +143,7 @@ return the following information:
 * path: the classification path of a matched name (if available)
 
 ```ruby
-
+res = gf.find_names(txt, verification: true)
 ```
 
 ### Set preferred data-sources list
@@ -124,7 +154,7 @@ data-source (data-sources). There is a parameter that takes IDs from the
 results will be returned back.
 
 ```ruby
-
+res = gf.find_names(txt, verification: true, sources: [1, 4, 179])
 ```
 ### Combination of parameters.
 
@@ -134,11 +164,11 @@ a particular context. It is silently ignored.
 ```ruby
 # Runs Bayes' algorithms using English training set, runs verification and
 # returns matched results for 3 data-sources if they are available.
-
+res = gf.find_names(txt, language: eng, verification: true,
 sources: [1, 4, 179])
 
 # Ignores `sources:` settings, because `with_verification` is not set to `true`
-
+res = gf.find_names(txt, language: eng, sources: [1, 4, 179])
 ```
 
 ## Development
@@ -183,3 +213,5 @@ bundle exec rspec
 [Go]: https://golang.org/doc/install
 [client]: https://github.com/GlobalNamesArchitecture/gnfinder/blob/master/lib/gnfinder/client.rb
 [data-source list]: http://index.globalnames.org/datasource
+[proto file]: https://github.com/GlobalNamesArchitecture/gnfinder/blob/master/lib/protob_pb.rb
+[tests]: https://github.com/GlobalNamesArchitecture/gnfinder/blob/master/spec/lib/client_spec.rb
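The README describes `odds` as Bayes' odds. Strictly speaking, odds `o` correspond to a probability of `o / (1 + o)`. Odds of 0.1 are therefore closer to 1 chance in 11 than to 1 in 10.

The sketch below converts odds to probabilities and filters names by a threshold. `NameHit` is a hypothetical stand-in for the `protob.NameString` messages in `res.names`. In the 0.9.0 proto definitions later in this diff, the cleaned-up name-string field is called `name`, not `value`. Proto3 scalar fields default to zero, so an "empty" odds value would arrive as `0.0` and map to a probability of 0. The 0.5 threshold is an arbitrary illustrative default.

```ruby
# Hypothetical stand-in for protob.NameString, limited to the fields used here.
NameHit = Struct.new(:name, :odds)

# Odds o = p / (1 - p), hence p = o / (1 + o).
def odds_to_probability(odds)
  raise ArgumentError, 'odds must be non-negative' if odds.negative?

  odds / (1.0 + odds)
end

# Keeps the names whose probability reaches min_probability.
def likely_names(names, min_probability: 0.5)
  names.select { |n| odds_to_probability(n.odds) >= min_probability }.map(&:name)
end

hits = [NameHit.new('Homo sapiens', 9.0), NameHit.new('Plantago major', 0.1)]
p likely_names(hits)
```

Real result messages expose `name` and `odds` accessors, so `likely_names(res.names)` should also work on an actual `find_names` result.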
data/gnfinder.gemspec
CHANGED
@@ -20,7 +20,7 @@ Gem::Specification.new do |gem|
 .reject { |f| f.match(%r{^(test|spec|features)/}) }
 
 gem.require_paths = ['lib']
-gem.required_ruby_version = '~> 2.
+gem.required_ruby_version = '~> 2.6'
 gem.add_development_dependency 'bundler', '~> 2.0'
 gem.add_development_dependency 'byebug', '~> 10.0'
 gem.add_development_dependency 'grpc', '~> 1.15'
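With RubyGems' pessimistic operator, `'~> 2.6'` means "at least 2.6 but below 3.0". This release would therefore not install on Ruby 3.x. A quick way to test a given Ruby version against the same requirement uses `Gem::Requirement`; the helper name `supported_ruby?` is illustrative.

```ruby
require 'rubygems'

# The requirement declared in gnfinder 0.9.0's gemspec.
REQUIRED_RUBY = Gem::Requirement.new('~> 2.6')

# True when version_string satisfies '~> 2.6', i.e. >= 2.6 and < 3.0.
def supported_ruby?(version_string)
  REQUIRED_RUBY.satisfied_by?(Gem::Version.new(version_string))
end
```

Calling `supported_ruby?(RUBY_VERSION)` reports whether the running interpreter qualifies.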
data/lib/gnfinder/client.rb
CHANGED
@@ -1,31 +1,45 @@
 # frozen_string_literal: true
 
 module Gnfinder
+GNFINDER_MIN_VERSION = 'v0.9.0'
+
 # Gnfinder::Client connects to gnfinder server
 class Client
 def initialize(host = '0.0.0.0', port = '8778')
 @stub = Protob::GNFinder::Stub.new("#{host}:#{port}",
 :this_channel_is_insecure)
+return if gnfinder_version.version >= GNFINDER_MIN_VERSION
+
+raise 'gRPC server of gnfinder should be at least ' \
+' #{GNFINDER_MIN_VERSION}.\n Download latest version from ' \
+'https://github.com/gnames/gnfinder/releases/latest.'
+end
+
+def gnfinder_version
+@stub.ver(Protob::Void.new)
 end
 
 def ping
 @stub.ping(Protob::Void.new).value
 end
 
-# rubocop:disable
+# rubocop:disable all
 def find_names(text, opts = {})
 raise 'Text cannot be empty' if text.to_s.strip == ''
 
 params = { text: text }
-params[:
+params[:no_bayes] = true if opts[:no_bayes]
 params[:language] = opts[:language] if opts[:language].to_s.strip != ''
-
+if opts[:detect_language]
+params[:detect_language] = opts[:detect_language]
+end
+params[:verification] = true if opts[:verification]
 if opts[:sources] && !opts[:sources].empty?
 params[:sources] = opts[:sources]
 end
 
-@stub.find_names(Protob::Params.new(params))
+@stub.find_names(Protob::Params.new(params))
 end
-# rubocop:enable
+# rubocop:enable all
 end
 end
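Two details of the new version check in `initialize` are worth flagging:

- `gnfinder_version.version >= GNFINDER_MIN_VERSION` compares strings character by character. As a result, `'v0.10.0' >= 'v0.9.0'` is `false`, and a newer server would be rejected.
- The middle segment of the error message is single-quoted, so `#{GNFINDER_MIN_VERSION}` and `\n` are printed literally instead of being interpolated.

Below is a minimal sketch of a numeric comparison using `Gem::Version`. It assumes the server reports plain `vX.Y.Z` strings. The helper names `version_at_least?` and `check_server_version!` are hypothetical.

```ruby
require 'rubygems' # provides Gem::Version; loaded by default in most setups

# Same value as the constant added to client.rb in 0.9.0.
GNFINDER_MIN_VERSION = 'v0.9.0'

# Compares "vX.Y.Z" strings numerically rather than character by character.
# With plain String comparison, 'v0.10.0' >= 'v0.9.0' evaluates to false.
def version_at_least?(actual, minimum)
  Gem::Version.new(actual.delete_prefix('v')) >=
    Gem::Version.new(minimum.delete_prefix('v'))
end

# Hypothetical replacement for the guard in Client#initialize.
def check_server_version!(actual)
  return if version_at_least?(actual, GNFINDER_MIN_VERSION)

  # Double quotes so that #{...} and \n are interpolated.
  raise "gRPC server of gnfinder should be at least #{GNFINDER_MIN_VERSION}.\n" \
        'Download latest version from https://github.com/gnames/gnfinder/releases/latest.'
end
```

Inside the client, the guard could then read `return if version_at_least?(gnfinder_version.version, GNFINDER_MIN_VERSION)`.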
data/lib/gnfinder/version.rb
CHANGED
data/lib/protob_pb.rb
CHANGED
@@ -4,72 +4,83 @@
 require 'google/protobuf'
 
 Google::Protobuf::DescriptorPool.generated_pool.build do
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+add_file("protob.proto", :syntax => :proto3) do
+add_message "protob.Void" do
+end
+add_message "protob.Pong" do
+optional :value, :string, 1
+end
+add_message "protob.Version" do
+optional :version, :string, 1
+optional :build, :string, 2
+end
+add_message "protob.Params" do
+optional :text, :string, 1
+optional :no_bayes, :bool, 3
+optional :language, :string, 4
+optional :detect_language, :bool, 5
+optional :verification, :bool, 6
+repeated :sources, :int32, 7
+end
+add_message "protob.Output" do
+optional :date, :string, 1
+optional :finder_version, :string, 2
+optional :language, :string, 3
+optional :language_detected, :string, 4
+optional :detect_language, :bool, 5
+optional :total_tokens, :int32, 6
+optional :total_candidates, :int32, 7
+optional :total_names, :int32, 8
+repeated :names, :message, 9, "protob.NameString"
+end
+add_message "protob.NameString" do
+optional :type, :string, 1
+optional :verbatim, :string, 2
+optional :name, :string, 3
+optional :odds, :float, 4
+optional :offset_start, :int32, 5
+optional :offset_end, :int32, 6
+optional :verification, :message, 7, "protob.Verification"
+end
+add_message "protob.Verification" do
+optional :best_result, :message, 1, "protob.ResultData"
+repeated :preferred_results, :message, 2, "protob.ResultData"
+optional :data_sources_num, :int32, 3
+optional :data_source_quality, :string, 4
+optional :retries, :int32, 5
+optional :error, :string, 6
+end
+add_message "protob.ResultData" do
+optional :data_source_id, :int32, 1
+optional :data_source_title, :string, 2
+optional :taxon_id, :string, 3
+optional :matched_name, :string, 4
+optional :matched_canonical, :string, 5
+optional :current_name, :string, 6
+optional :synonym, :bool, 7
+optional :classification_path, :string, 8
+optional :classification_rank, :string, 9
+optional :classification_ids, :string, 10
+optional :edit_distance, :int32, 11
+optional :stem_edit_distance, :int32, 12
+optional :match_type, :enum, 13, "protob.MatchType"
+end
+add_enum "protob.MatchType" do
+value :NONE, 0
+value :EXACT, 1
+value :FUZZY, 2
+value :PARTIAL_EXACT, 3
+value :PARTIAL_FUZZY, 4
+end
 end
 end
 
 module Protob
-Pong = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Pong").msgclass
 Void = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Void").msgclass
+Pong = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Pong").msgclass
+Version = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Version").msgclass
 Params = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Params").msgclass
-
+Output = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Output").msgclass
 NameString = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.NameString").msgclass
 Verification = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Verification").msgclass
 ResultData = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.ResultData").msgclass
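The `protob.Params` message above is what `Client#find_names` fills in. The sketch below mirrors the option handling from the 0.9.0 `client.rb` but returns a plain Hash. That way it runs without the `google-protobuf` and `grpc` gems. With `protob_pb.rb` loaded, `Protob::Params.new` would receive this same hash. The helper name `build_params` is illustrative.

A few points follow from the client code:

- `language:` expects a string such as `'eng'`. A bare `language: eng`, as in the README's combination example, raises `NameError` unless a local `eng` is defined.
- `sources` is forwarded even without `verification: true`. Per the README, the server then ignores it.

```ruby
# Mirrors the option handling of Gnfinder::Client#find_names (0.9.0), but
# returns a plain Hash instead of building a Protob::Params message.
# Comments give the matching protob.Params field numbers.
def build_params(text, opts = {})
  raise 'Text cannot be empty' if text.to_s.strip == ''

  params = { text: text } # field 1
  params[:no_bayes] = true if opts[:no_bayes] # field 3
  params[:language] = opts[:language] if opts[:language].to_s.strip != '' # field 4
  params[:detect_language] = opts[:detect_language] if opts[:detect_language] # field 5
  params[:verification] = true if opts[:verification] # field 6
  params[:sources] = opts[:sources] if opts[:sources] && !opts[:sources].empty? # field 7
  params
end

p build_params('Pomatomus saltatrix', language: 'eng', verification: true, sources: [1, 4, 179])
```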
data/lib/protob_services_pb.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: gnfinder
 version: !ruby/object:Gem::Version
-version: 0.
+version: 0.9.0
 platform: ruby
 authors:
 - Dmitry Mozzherin
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2019-
+date: 2019-10-15 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: bundler
@@ -142,15 +142,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
 requirements:
 - - "~>"
 - !ruby/object:Gem::Version
-version: '2.
+version: '2.6'
 required_rubygems_version: !ruby/object:Gem::Requirement
 requirements:
 - - ">="
 - !ruby/object:Gem::Version
 version: '0'
 requirements: []
-
-rubygems_version: 2.7.6
+rubygems_version: 3.0.3
 signing_key:
 specification_version: 4
 summary: Scientific names finder