gnfinder 0.2.2 → 0.9.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 92d97dc85a5d774323c7e995573c47bf004d606ad61e8b15bb81c0ce492b419e
-   data.tar.gz: 381a9dc59941089bd0620ce24456a4e39944ff7a7decef4ecadd13e5ad918ff4
+   metadata.gz: dffcaa8904b1b9a47761b66aff325c86615aedec1600b9abe13194035c4aa061
+   data.tar.gz: f33a382799901181d3092f2c05585318accd1e5baebb6bb053d6ac7bc015e7d6
  SHA512:
-   metadata.gz: 2851e18f695cffaed64c4da96349e00351da2918160ec16c5dada98a704d4156f9ef694f059a13a5072d7badb05ec30ae373468ee61aa7f0cab9ea81a7901df2
-   data.tar.gz: 72ee313d4327eb6e2f3521438bcc788a24ee86ed412114118e61c3628bedec83cfd089dbb1ee478853e92178ca56572480d264cceba0fad1a096bd63f17f5b22
+   metadata.gz: 7a0358fac8b9f05532be3f5434c07a8fd5bd242d35545f45faa4798d8cc0eb540b21a7e11c292c624cbc6608c633106812fcc1ab219bcd2ce9a72b5e9d17fa66
+   data.tar.gz: 5dfc5b032530c7fea01672bdf3a0a5ce595832110b665a680ca575f1916cd17e84b33991da9462f6c35927513cf6cd71e7df3dd87bd65027480129489e793f1a
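The SHA256 and SHA512 entries above are hex digests of the two archives packed inside the `.gem` file. As a rough illustration of how such a digest could be checked locally, the sketch below uses Ruby's standard `digest` library. The helper name `checksum_matches?` and the temporary file are assumptions made for this example; they are not part of gnfinder or RubyGems.

```ruby
require 'digest'
require 'tempfile'

# Hypothetical helper (not part of gnfinder or RubyGems): returns true when
# the SHA256 hex digest of the file at +path+ equals +expected_hex+.
def checksum_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.strip.downcase
end

# Demonstrate with a throwaway file instead of a real data.tar.gz.
Tempfile.create('payload') do |file|
  file.write('abc')
  file.flush

  good = Digest::SHA256.hexdigest('abc')
  puts checksum_matches?(file.path, good)      # digest of the same content
  puts checksum_matches?(file.path, '0' * 64)  # deliberately wrong digest
end
```

For a real check, the path would point at an extracted `data.tar.gz` and the expected value would be copied from `checksums.yaml`.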
data/.rubocop.yml CHANGED
@@ -1,5 +1,5 @@
  AllCops:
-   TargetRubyVersion: 2.5
+   TargetRubyVersion: 2.6
    Exclude:
      - 'lib/protob_pb.rb'
      - 'lib/protob_services_pb.rb'
data/README.md CHANGED
@@ -4,6 +4,18 @@ Ruby gem to access functionality of [gnfinder] project written in Go. This gem
  allows to perform fast and accurate scientific name finding in UTF-8 encoded
  plain texts for Ruby-based projects.
 
+ - [gnfinder](#gnfinder)
+ - [Requirements](#requirements)
+ - [Installation](#installation)
+ - [Usage](#usage)
+ - [Finding names in a text using default settings](#finding-names-in-a-text-using-default-settings)
+ - [Optionally disable Bayes search](#optionally-disable-bayes-search)
+ - [Set a language for the text](#set-a-language-for-the-text)
+ - [Set automatic detection of text's language](#set-automatic-detection-of-texts-language)
+ - [Set verification option](#set-verification-option)
+ - [Set preferred data-sources list](#set-preferred-data-sources-list)
+ - [Combination of parameters.](#combination-of-parameters)
+ - [Development](#development)
 
  ## Requirements
 
@@ -28,7 +40,7 @@ the original Go-lang [gnfinder] README file.
  First you need to create a instance of a `gnfinder` client
 
  ```ruby
- import 'gnfinder'
+ require 'gnfinder'
 
  gf = Gnfinder::Client.new
  ```
@@ -36,60 +48,78 @@ gf = Gnfinder::Client.new
  By default the client will try to connect to `localhost:8778`. If you
  have another location for the server use:
 
- ```
- import 'gnfinder'
- gf = Gnfinder.new(host: 123.123.123.123, port: 8000)
+
+
+ ```ruby
+ require 'gnfinder'
+
+ # you can use the global public gnfinder server
+ # located at finder-rpc.globalnames.org
+ gf = Gnfinder::Client.new('finder-rpc.globalnames.org', 80)
+
+ # localhost, different port
+ gf = Gnfinder::Client.new('0.0.0.0', 8000)
  ```
 
  ### Finding names in a text using default settings
 
+ The format of the returned result is described in the [proto file] and in the [tests].
+
  ```ruby
  txt = File.read('utf8-text-with-names.txt')
 
- names = gf.find_names(txt)
- puts names[0].value
- puts names[0].odds
+ res = gf.find_names(txt)
+ puts res.names[0].name
+ puts res.names[0].odds
  ```
 
  Returned result will have the following methods for each name:
 
- * value: mame-string cleaned up for verification.
+ * name: name-string cleaned up for verification.
  * verbatim: name-string as it was found in the text.
  * odds: Bayes' odds value. For example odds 0.1 would mean that according to
  the algorithm there is 1 chance out of 10 that the name-string is
  a scientific name. This field will be empty if Bayes algorithms did not run.
 
- ### Always enable Bayes search
+ ### Optionally disable Bayes search
 
- For languages that are not supported by [gnfinder] only heuristic algorithms
- are used by default, because some languages that are close to Latin (Italian,
- French, Portugese) would generate too many false positives. However you can
- override this default setting by running:
+ Some languages that are close to Latin (Italian, French, Portuguese) tend to
+ generate too many false positives. To reduce the number of false positives, you
+ can disable the Bayes algorithm by running:
 
  ```ruby
- names = gf.find_names(txt, with_bayes: true)
+ names = gf.find_names(txt, no_bayes: true).names
  ```
 
  ### Set a language for the text
 
- Sometimes gnfinder cannot determine the language of a text correctly. For
- example it happens when the text mostly consists of scientific names, or has
- large citations or list of references in a different language. It is possible
- to set a language for a text by hand. For supported languages
- (English and German) it will enable Bayes algorithm. For other languages
- this setting will be ignored.
+ It is possible to set the prevalent language of a text by hand. That might
+ help the Bayes algorithms work better.
 
  List of supported languages will increase with time.
 
  ```ruby
- names = gf.find_names(txt, language: 'eng')
- names = gf.find_names(txt, language: 'deu')
-
- # setting is ignored, only known by gnfinder
- # 3-character notations iso-639-2 code are supported
- names = gf.find_names(txt, language: 'english')
- names = gf.find_names(txt, language: 'rus')
+ res = gf.find_names(txt, language: 'eng')
+ puts res.language
+ res = gf.find_names(txt, language: 'deu')
+ puts res.language
+
+ # The setting is ignored if the language string is not known by gnfinder.
+ # Only 3-character ISO 639-2 codes are supported.
+ res = gf.find_names(txt, language: 'rus')
+ puts res.language
  ```
+ ### Set automatic detection of text's language
+
+ To enable automatic detection of the prevalent language of a text, use:
+
+ res = gf.find_names(txt, detect_language: true)
+ puts res.language
+ puts res.detect_language
+ puts res.language_detected
+
+ If the detected language is not yet supported by the Bayes algorithm, the
+ default language (English) will be used.
 
  ### Set verification option
 
@@ -113,7 +143,7 @@ return the following information:
  * path: the classification path of a matched name (if available)
 
  ```ruby
- names = gf.find_names(txt, with_verification: true)
+ res = gf.find_names(txt, verification: true)
  ```
 
  ### Set preferred data-sources list
@@ -124,7 +154,7 @@ data-source (data-sources). There is a parameter that takes IDs from the
  results will be returned back.
 
  ```ruby
- names = gf.find_names(txt, with_verification: true, sources: [1, 4, 179])
+ res = gf.find_names(txt, verification: true, sources: [1, 4, 179])
  ```
  ### Combination of parameters.
 
@@ -134,11 +164,11 @@ a particular context. It is silently ignored.
  ```ruby
  # Runs Bayes' algorithms using English training set, runs verification and
  # returns matched results for 3 data-sources if they are available.
- names = gf.find_names(txt, language: eng, with_verification: true,
+ res = gf.find_names(txt, language: 'eng', verification: true,
  sources: [1, 4, 179])
 
  # Ignores `sources:` settings, because `with_verification` is not set to `true`
- names = gf.find_names(txt, language: eng, sources: [1, 4, 179])
+ res = gf.find_names(txt, language: 'eng', sources: [1, 4, 179])
  ```
 
  ## Development
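The README hunks above rename the `with_verification:` option to `verification:` and replace `with_bayes:` with `no_bayes:`. Callers upgrading from 0.2.2 could translate old option hashes with something like the sketch below. `migrate_opts` is a hypothetical helper, not part of the gem, and dropping `with_bayes` is an assumption: the two Bayes flags are not exact opposites, so no automatic mapping is attempted.

```ruby
# Hypothetical helper (not part of the gnfinder gem): converts an options
# hash written for gnfinder 0.2.2 into the option names used by 0.9.0.
# The input hash is left untouched.
def migrate_opts(old_opts)
  new_opts = old_opts.dup

  # `with_verification:` was renamed to `verification:`.
  if new_opts.key?(:with_verification)
    new_opts[:verification] = new_opts.delete(:with_verification)
  end

  # `with_bayes:` has no direct counterpart; it is dropped here (assumption).
  # Callers who want to switch Bayes off must pass `no_bayes: true` themselves.
  new_opts.delete(:with_bayes)

  new_opts
end

old = { language: 'eng', with_verification: true, sources: [1, 4, 179] }
p migrate_opts(old)
```

The result could then be passed as the second argument of `find_names`, keeping in mind that the return value is now the whole output object rather than the array of names.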
@@ -183,3 +213,5 @@ bundle exec rspec
  [Go]: https://golang.org/doc/install
  [client]: https://github.com/GlobalNamesArchitecture/gnfinder/blob/master/lib/gnfinder/client.rb
  [data-source list]: http://index.globalnames.org/datasource
+ [proto file]: https://github.com/GlobalNamesArchitecture/gnfinder/blob/master/lib/protob_pb.rb
+ [tests]: https://github.com/GlobalNamesArchitecture/gnfinder/blob/master/spec/lib/client_spec.rb
data/gnfinder.gemspec CHANGED
@@ -20,7 +20,7 @@ Gem::Specification.new do |gem|
      .reject { |f| f.match(%r{^(test|spec|features)/}) }
 
    gem.require_paths = ['lib']
-   gem.required_ruby_version = '~> 2.5'
+   gem.required_ruby_version = '~> 2.6'
    gem.add_development_dependency 'bundler', '~> 2.0'
    gem.add_development_dependency 'byebug', '~> 10.0'
    gem.add_development_dependency 'grpc', '~> 1.15'
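`~> 2.6` is RubyGems' pessimistic constraint: it admits 2.6 and later 2.x releases but not 3.0, so this release would not install on Ruby 3.x. A quick way to confirm what a constraint admits, using only the `Gem::Requirement` and `Gem::Version` classes that ship with RubyGems (the sample interpreter versions are chosen for illustration):

```ruby
require 'rubygems'

# The constraint from the gemspec hunk above.
requirement = Gem::Requirement.new('~> 2.6')

# Sample interpreter versions, chosen only for illustration.
%w[2.5.9 2.6.0 2.7.8 3.0.0].each do |v|
  ok = requirement.satisfied_by?(Gem::Version.new(v))
  puts "#{v}: #{ok ? 'accepted' : 'rejected'}"
end
```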
data/lib/gnfinder/client.rb CHANGED
@@ -1,31 +1,45 @@
  # frozen_string_literal: true
 
  module Gnfinder
+   GNFINDER_MIN_VERSION = 'v0.9.0'
+
    # Gnfinder::Client connects to gnfinder server
    class Client
      def initialize(host = '0.0.0.0', port = '8778')
        @stub = Protob::GNFinder::Stub.new("#{host}:#{port}",
                                           :this_channel_is_insecure)
+       return if gnfinder_version.version >= GNFINDER_MIN_VERSION
+
+       raise 'gRPC server of gnfinder should be at least ' \
+             "#{GNFINDER_MIN_VERSION}.\n Download latest version from " \
+             'https://github.com/gnames/gnfinder/releases/latest.'
+     end
+
+     def gnfinder_version
+       @stub.ver(Protob::Void.new)
      end
 
      def ping
        @stub.ping(Protob::Void.new).value
      end
 
-     # rubocop:disable Metrics/AbcSize, Metrics/CyclomaticComplexity
+     # rubocop:disable all
      def find_names(text, opts = {})
        raise 'Text cannot be empty' if text.to_s.strip == ''
 
        params = { text: text }
-       params[:with_bayes] = true if opts[:with_bayes]
+       params[:no_bayes] = true if opts[:no_bayes]
        params[:language] = opts[:language] if opts[:language].to_s.strip != ''
-       params[:with_verification] = true if opts[:with_verification]
+       if opts[:detect_language]
+         params[:detect_language] = opts[:detect_language]
+       end
+       params[:verification] = true if opts[:verification]
        if opts[:sources] && !opts[:sources].empty?
          params[:sources] = opts[:sources]
        end
 
-       @stub.find_names(Protob::Params.new(params)).names
+       @stub.find_names(Protob::Params.new(params))
      end
-     # rubocop:enable Metrics/AbcSize, Metrics/CyclomaticComplexity
+     # rubocop:enable all
    end
  end
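The constructor above compares version strings with `>=`, which is a lexicographic comparison in Ruby: `'v0.10.0' >= 'v0.9.0'` is false because `'1'` sorts before `'9'`. One possible hardening, shown only as a sketch, is to compare `Gem::Version` objects after stripping the leading `v`. `version_at_least?` is a hypothetical helper and is not part of the gem.

```ruby
require 'rubygems'

# Hypothetical helper (not part of the gnfinder gem): compares two version
# tags such as 'v0.9.0' numerically, segment by segment.
def version_at_least?(actual, minimum)
  # Gem::Version rejects a leading 'v', so strip it before parsing.
  to_version = ->(tag) { Gem::Version.new(tag.sub(/\Av/, '')) }
  to_version.call(actual) >= to_version.call(minimum)
end

puts 'v0.10.0' >= 'v0.9.0'                   # plain String comparison
puts version_at_least?('v0.10.0', 'v0.9.0')  # numeric comparison
puts version_at_least?('v0.8.3', 'v0.9.0')
```

With the server versions current at the time of this release (0.9.x) the string comparison and the numeric one agree; they diverge once any segment reaches two digits.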
data/lib/gnfinder/version.rb CHANGED
@@ -2,7 +2,8 @@
 
  # Gnfinder is a namespace module for gndinfer gem.
  module Gnfinder
-   VERSION = '0.2.2'
+   # Version corresponds to the minimal supported version of Go gnfinder
+   VERSION = '0.9.0'
 
    def self.version
      VERSION
data/lib/protob_pb.rb CHANGED
@@ -4,72 +4,83 @@
  require 'google/protobuf'
 
  Google::Protobuf::DescriptorPool.generated_pool.build do
-   add_message "protob.Pong" do
-     optional :value, :string, 1
-   end
-   add_message "protob.Void" do
-   end
-   add_message "protob.Params" do
-     optional :text, :string, 1
-     optional :with_bayes, :bool, 3
-     optional :language, :string, 4
-     optional :with_verification, :bool, 5
-     repeated :sources, :int32, 6
-   end
-   add_message "protob.NameStrings" do
-     optional :date, :string, 1
-     optional :language, :string, 2
-     optional :total_tokens, :int32, 3
-     optional :total_candidates, :int32, 4
-     optional :total_names, :int32, 5
-     repeated :names, :message, 6, "protob.NameString"
-   end
-   add_message "protob.NameString" do
-     optional :type, :string, 1
-     optional :verbatim, :string, 2
-     optional :name, :string, 3
-     optional :odds, :float, 4
-     optional :offset_start, :int32, 5
-     optional :offset_end, :int32, 6
-     optional :verification, :message, 7, "protob.Verification"
-   end
-   add_message "protob.Verification" do
-     optional :best_result, :message, 1, "protob.ResultData"
-     repeated :preferred_results, :message, 2, "protob.ResultData"
-     optional :data_sources_num, :int32, 3
-     optional :data_source_quality, :string, 4
-     optional :retries, :int32, 5
-     optional :error, :string, 6
-   end
-   add_message "protob.ResultData" do
-     optional :data_source_id, :int32, 1
-     optional :data_source_title, :string, 2
-     optional :taxon_id, :string, 3
-     optional :matched_name, :string, 4
-     optional :matched_canonical, :string, 5
-     optional :current_name, :string, 6
-     optional :synonym, :bool, 7
-     optional :classification_path, :string, 8
-     optional :classification_rank, :string, 9
-     optional :classification_ids, :string, 10
-     optional :edit_distance, :int32, 11
-     optional :stem_edit_distance, :int32, 12
-     optional :match_type, :enum, 13, "protob.MatchType"
-   end
-   add_enum "protob.MatchType" do
-     value :NONE, 0
-     value :EXACT, 1
-     value :FUZZY, 2
-     value :PARTIAL_EXACT, 3
-     value :PARTIAL_FUZZY, 4
+   add_file("protob.proto", :syntax => :proto3) do
+     add_message "protob.Void" do
+     end
+     add_message "protob.Pong" do
+       optional :value, :string, 1
+     end
+     add_message "protob.Version" do
+       optional :version, :string, 1
+       optional :build, :string, 2
+     end
+     add_message "protob.Params" do
+       optional :text, :string, 1
+       optional :no_bayes, :bool, 3
+       optional :language, :string, 4
+       optional :detect_language, :bool, 5
+       optional :verification, :bool, 6
+       repeated :sources, :int32, 7
+     end
+     add_message "protob.Output" do
+       optional :date, :string, 1
+       optional :finder_version, :string, 2
+       optional :language, :string, 3
+       optional :language_detected, :string, 4
+       optional :detect_language, :bool, 5
+       optional :total_tokens, :int32, 6
+       optional :total_candidates, :int32, 7
+       optional :total_names, :int32, 8
+       repeated :names, :message, 9, "protob.NameString"
+     end
+     add_message "protob.NameString" do
+       optional :type, :string, 1
+       optional :verbatim, :string, 2
+       optional :name, :string, 3
+       optional :odds, :float, 4
+       optional :offset_start, :int32, 5
+       optional :offset_end, :int32, 6
+       optional :verification, :message, 7, "protob.Verification"
+     end
+     add_message "protob.Verification" do
+       optional :best_result, :message, 1, "protob.ResultData"
+       repeated :preferred_results, :message, 2, "protob.ResultData"
+       optional :data_sources_num, :int32, 3
+       optional :data_source_quality, :string, 4
+       optional :retries, :int32, 5
+       optional :error, :string, 6
+     end
+     add_message "protob.ResultData" do
+       optional :data_source_id, :int32, 1
+       optional :data_source_title, :string, 2
+       optional :taxon_id, :string, 3
+       optional :matched_name, :string, 4
+       optional :matched_canonical, :string, 5
+       optional :current_name, :string, 6
+       optional :synonym, :bool, 7
+       optional :classification_path, :string, 8
+       optional :classification_rank, :string, 9
+       optional :classification_ids, :string, 10
+       optional :edit_distance, :int32, 11
+       optional :stem_edit_distance, :int32, 12
+       optional :match_type, :enum, 13, "protob.MatchType"
+     end
+     add_enum "protob.MatchType" do
+       value :NONE, 0
+       value :EXACT, 1
+       value :FUZZY, 2
+       value :PARTIAL_EXACT, 3
+       value :PARTIAL_FUZZY, 4
+     end
    end
  end
 
  module Protob
-   Pong = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Pong").msgclass
    Void = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Void").msgclass
+   Pong = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Pong").msgclass
+   Version = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Version").msgclass
    Params = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Params").msgclass
-   NameStrings = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.NameStrings").msgclass
+   Output = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Output").msgclass
    NameString = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.NameString").msgclass
    Verification = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Verification").msgclass
    ResultData = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.ResultData").msgclass
data/lib/protob_services_pb.rb CHANGED
@@ -15,7 +15,8 @@ module Protob
        self.service_name = 'protob.GNFinder'
 
        rpc :Ping, Void, Pong
-       rpc :FindNames, Params, NameStrings
+       rpc :Ver, Void, Version
+       rpc :FindNames, Params, Output
      end
 
      Stub = Service.rpc_stub_class
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: gnfinder
  version: !ruby/object:Gem::Version
-   version: 0.2.2
+   version: 0.9.0
  platform: ruby
  authors:
  - Dmitry Mozzherin
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2019-04-23 00:00:00.000000000 Z
+ date: 2019-10-15 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: bundler
@@ -142,15 +142,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
-       version: '2.5'
+       version: '2.6'
  required_rubygems_version: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: '0'
  requirements: []
- rubyforge_project:
- rubygems_version: 2.7.6
+ rubygems_version: 3.0.3
  signing_key:
  specification_version: 4
  summary: Scientific names finder