gnfinder 0.2.2 → 0.9.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 92d97dc85a5d774323c7e995573c47bf004d606ad61e8b15bb81c0ce492b419e
4
- data.tar.gz: 381a9dc59941089bd0620ce24456a4e39944ff7a7decef4ecadd13e5ad918ff4
3
+ metadata.gz: dffcaa8904b1b9a47761b66aff325c86615aedec1600b9abe13194035c4aa061
4
+ data.tar.gz: f33a382799901181d3092f2c05585318accd1e5baebb6bb053d6ac7bc015e7d6
5
5
  SHA512:
6
- metadata.gz: 2851e18f695cffaed64c4da96349e00351da2918160ec16c5dada98a704d4156f9ef694f059a13a5072d7badb05ec30ae373468ee61aa7f0cab9ea81a7901df2
7
- data.tar.gz: 72ee313d4327eb6e2f3521438bcc788a24ee86ed412114118e61c3628bedec83cfd089dbb1ee478853e92178ca56572480d264cceba0fad1a096bd63f17f5b22
6
+ metadata.gz: 7a0358fac8b9f05532be3f5434c07a8fd5bd242d35545f45faa4798d8cc0eb540b21a7e11c292c624cbc6608c633106812fcc1ab219bcd2ce9a72b5e9d17fa66
7
+ data.tar.gz: 5dfc5b032530c7fea01672bdf3a0a5ce595832110b665a680ca575f1916cd17e84b33991da9462f6c35927513cf6cd71e7df3dd87bd65027480129489e793f1a
data/.rubocop.yml CHANGED
@@ -1,5 +1,5 @@
1
1
  AllCops:
2
- TargetRubyVersion: 2.5
2
+ TargetRubyVersion: 2.6
3
3
  Exclude:
4
4
  - 'lib/protob_pb.rb'
5
5
  - 'lib/protob_services_pb.rb'
data/README.md CHANGED
@@ -4,6 +4,18 @@ Ruby gem to access functionality of [gnfinder] project written in Go. This gem
4
4
  allows to perform fast and accurate scientific name finding in UTF-8 encoded
5
5
  plain texts for Ruby-based projects.
6
6
 
7
+ - [gnfinder](#gnfinder)
8
+ - [Requirements](#requirements)
9
+ - [Installation](#installation)
10
+ - [Usage](#usage)
11
+ - [Finding names in a text using default settings](#finding-names-in-a-text-using-default-settings)
12
+ - [Optionally disable Bayes search](#optionally-disable-bayes-search)
13
+ - [Set a language for the text](#set-a-language-for-the-text)
14
+ - [Set automatic detection of text's language](#set-automatic-detection-of-texts-language)
15
+ - [Set verification option](#set-verification-option)
16
+ - [Set preferred data-sources list](#set-preferred-data-sources-list)
17
+ - [Combination of parameters.](#combination-of-parameters)
18
+ - [Development](#development)
7
19
 
8
20
  ## Requirements
9
21
 
@@ -28,7 +40,7 @@ the original Go-lang [gnfinder] README file.
28
40
  First you need to create a instance of a `gnfinder` client
29
41
 
30
42
  ```ruby
31
- import 'gnfinder'
43
+ require 'gnfinder'
32
44
 
33
45
  gf = Gnfinder::Client.new
34
46
  ```
@@ -36,60 +48,78 @@ gf = Gnfinder::Client.new
36
48
  By default the client will try to connect to `localhost:8778`. If you
37
49
  have another location for the server use:
38
50
 
39
- ```
40
- import 'gnfinder'
41
- gf = Gnfinder.new(host: 123.123.123.123, port: 8000)
51
+
52
+
53
+ ```ruby
54
+ require 'gnfinder'
55
+
56
+ # you can use global public gnfinder server
57
+ # located at finder-rpc.globalnames.org
58
+ gf = Gnfinder::Client.new(host = 'finder-rpc.globalnames.org', port = 80)
59
+
60
+ # localhost, different port
61
+ gf = Gnfinder::Client.new(host = '0.0.0.0', port = 8000)
42
62
  ```
43
63
 
44
64
  ### Finding names in a text using default settings
45
65
 
66
+ The format of the returned result is described in the [proto file] and in the [tests].
67
+
46
68
  ```ruby
47
69
  txt = File.read('utf8-text-with-names.txt')
48
70
 
49
- names = gf.find_names(txt)
50
- puts names[0].value
51
- puts names[0].odds
71
+ res = gf.find_names(txt)
72
+ puts res.names[0].name
73
+ puts res.names[0].odds
52
74
  ```
53
75
 
54
76
  Returned result will have the following methods for each name:
55
77
 
56
- * value: mame-string cleaned up for verification.
78
+ * name: name-string cleaned up for verification.
57
79
  * verbatim: name-string as it was found in the text.
58
80
  * odds: Bayes' odds value. For example odds 0.1 would mean that according to
59
81
  the algorithm there is 1 chance out of 10 that the name-string is
60
82
  a scientific name. This field will be empty if Bayes algorithms did not run.
61
83
 
62
- ### Always enable Bayes search
84
+ ### Optionally disable Bayes search
63
85
 
64
- For languages that are not supported by [gnfinder] only heuristic algorithms
65
- are used by default, because some languages that are close to Latin (Italian,
66
- French, Portugese) would generate too many false positives. However you can
67
- override this default setting by running:
86
+ Texts in some languages that are close to Latin (Italian, French, Portuguese)
87
+ can generate too many false positives. To reduce the number of false positives
88
+ you can disable the Bayes algorithm by running:
68
89
 
69
90
  ```ruby
70
- names = gf.find_names(txt, with_bayes: true)
91
+ names = gf.find_names(txt, no_bayes: true).names
71
92
  ```
72
93
 
73
94
  ### Set a language for the text
74
95
 
75
- Sometimes gnfinder cannot determine the language of a text correctly. For
76
- example it happens when the text mostly consists of scientific names, or has
77
- large citations or list of references in a different language. It is possible
78
- to set a language for a text by hand. For supported languages
79
- (English and German) it will enable Bayes algorithm. For other languages
80
- this setting will be ignored.
96
+ It is possible to set the prevalent language of a text by hand. That might
97
+ help the Bayes algorithms work better.
81
98
 
82
99
  List of supported languages will increase with time.
83
100
 
84
101
  ```ruby
85
- names = gf.find_names(txt, language: 'eng')
86
- names = gf.find_names(txt, language: 'deu')
87
-
88
- # setting is ignored, only known by gnfinder
89
- # 3-character notations iso-639-2 code are supported
90
- names = gf.find_names(txt, language: 'english')
91
- names = gf.find_names(txt, language: 'rus')
102
+ res = gf.find_names(txt, language: 'eng')
103
+ puts res.language
104
+ res = gf.find_names(txt, language: 'deu')
105
+ puts res.language
106
+
107
+ # Setting is ignored if language string is not known by gnfinder.
108
+ # Only 3-character ISO 639-2 codes are supported.
109
+ res = gf.find_names(txt, language: 'rus')
110
+ puts res.language
92
111
  ```
112
+ ### Set automatic detection of text's language
113
+
114
+ To enable automatic detection of the prevalent language of a text use:
115
+
116
+ res = gf.find_names(txt, detect_language: true)
117
+ puts res.language
118
+ puts res.detect_language
119
+ puts res.language_detected
120
+
121
+ If the detected language is not yet supported by the Bayes algorithm, the default
122
+ language (English) will be used.
93
123
 
94
124
  ### Set verification option
95
125
 
@@ -113,7 +143,7 @@ return the following information:
113
143
  * path: the classification path of a matched name (if available)
114
144
 
115
145
  ```ruby
116
- names = gf.find_names(txt, with_verification: true)
146
+ res = gf.find_names(txt, verification: true)
117
147
  ```
118
148
 
119
149
  ### Set preferred data-sources list
@@ -124,7 +154,7 @@ data-source (data-sources). There is a parameter that takes IDs from the
124
154
  results will be returned back.
125
155
 
126
156
  ```ruby
127
- names = gf.find_names(txt, with_verification: true, sources: [1, 4, 179])
157
+ res = gf.find_names(txt, verification: true, sources: [1, 4, 179])
128
158
  ```
129
159
  ### Combination of parameters.
130
160
 
@@ -134,11 +164,11 @@ a particular context. It is silently ignored.
134
164
  ```ruby
135
165
  # Runs Bayes' algorithms using English training set, runs verification and
136
166
  # returns matched results for 3 data-sources if they are available.
137
- names = gf.find_names(txt, language: eng, with_verification: true,
167
+ res = gf.find_names(txt, language: 'eng', verification: true,
138
168
  sources: [1, 4, 179])
139
169
 
140
170
  # Ignores `sources:` settings, because `with_verification` is not set to `true`
141
- names = gf.find_names(txt, language: eng, sources: [1, 4, 179])
171
+ res = gf.find_names(txt, language: 'eng', sources: [1, 4, 179])
142
172
  ```
143
173
 
144
174
  ## Development
@@ -183,3 +213,5 @@ bundle exec rspec
183
213
  [Go]: https://golang.org/doc/install
184
214
  [client]: https://github.com/GlobalNamesArchitecture/gnfinder/blob/master/lib/gnfinder/client.rb
185
215
  [data-source list]: http://index.globalnames.org/datasource
216
+ [proto file]: https://github.com/GlobalNamesArchitecture/gnfinder/blob/master/lib/protob_pb.rb
217
+ [tests]: https://github.com/GlobalNamesArchitecture/gnfinder/blob/master/spec/lib/client_spec.rb
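For callers upgrading from 0.2.2, the README changes in this file amount to an option rename. Below is a minimal sketch of that mapping. It assumes the only caller-visible option changes are the ones shown in this diff: `with_verification` becomes `verification`, and `with_bayes` disappears, apparently because Bayes now runs unless `no_bayes: true` is passed. `migrate_opts`, `RENAMED_OPTS` and `DROPPED_OPTS` are hypothetical names used for illustration and are not part of the gem.

```ruby
# Hypothetical migration helper; not part of the gnfinder gem.
# Assumption: option changes are limited to those visible in this diff.
RENAMED_OPTS = { with_verification: :verification }.freeze
DROPPED_OPTS = %i[with_bayes].freeze

# Translates a 0.2.2-style options hash into a 0.9.0-style one.
def migrate_opts(old_opts)
  old_opts.each_with_object({}) do |(key, value), new_opts|
    next if DROPPED_OPTS.include?(key) # no 0.9.0 equivalent in this diff

    new_opts[RENAMED_OPTS.fetch(key, key)] = value
  end
end

old_style = { with_bayes: true, with_verification: true, sources: [1, 4, 179] }
p migrate_opts(old_style)
```

Note that `find_names` also changes its return value: in 0.9.0 it returns the whole output object, so code that previously indexed the result directly would need to go through `.names` first.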
data/gnfinder.gemspec CHANGED
@@ -20,7 +20,7 @@ Gem::Specification.new do |gem|
20
20
  .reject { |f| f.match(%r{^(test|spec|features)/}) }
21
21
 
22
22
  gem.require_paths = ['lib']
23
- gem.required_ruby_version = '~> 2.5'
23
+ gem.required_ruby_version = '~> 2.6'
24
24
  gem.add_development_dependency 'bundler', '~> 2.0'
25
25
  gem.add_development_dependency 'byebug', '~> 10.0'
26
26
  gem.add_development_dependency 'grpc', '~> 1.15'
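The pessimistic constraint `~> 2.6` accepts any Ruby from 2.6 up to, but not including, 3.0, so this bump drops 2.5 while still excluding the 3.x series. The following sketch checks that with RubyGems' own classes; the sample version strings are arbitrary illustrations.

```ruby
require 'rubygems' # normally loaded already; provides Gem::Requirement

# Same constraint string as the new required_ruby_version in the gemspec.
requirement = Gem::Requirement.new('~> 2.6')

%w[2.5.9 2.6.0 2.7.8 3.0.0].each do |ruby_version|
  allowed = requirement.satisfied_by?(Gem::Version.new(ruby_version))
  puts "Ruby #{ruby_version}: #{allowed ? 'allowed' : 'rejected'}"
end
```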
@@ -1,31 +1,45 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Gnfinder
4
+ GNFINDER_MIN_VERSION = 'v0.9.0'
5
+
4
6
  # Gnfinder::Client connects to gnfinder server
5
7
  class Client
6
8
  def initialize(host = '0.0.0.0', port = '8778')
7
9
  @stub = Protob::GNFinder::Stub.new("#{host}:#{port}",
8
10
  :this_channel_is_insecure)
11
+ return if gnfinder_version.version >= GNFINDER_MIN_VERSION
12
+
13
+ raise 'gRPC server of gnfinder should be at least ' \
14
+ "#{GNFINDER_MIN_VERSION}.\n Download latest version from " \
15
+ 'https://github.com/gnames/gnfinder/releases/latest.'
16
+ end
17
+
18
+ def gnfinder_version
19
+ @stub.ver(Protob::Void.new)
9
20
  end
10
21
 
11
22
  def ping
12
23
  @stub.ping(Protob::Void.new).value
13
24
  end
14
25
 
15
- # rubocop:disable Metrics/AbcSize, Metrics/CyclomaticComplexity
26
+ # rubocop:disable all
16
27
  def find_names(text, opts = {})
17
28
  raise 'Text cannot be empty' if text.to_s.strip == ''
18
29
 
19
30
  params = { text: text }
20
- params[:with_bayes] = true if opts[:with_bayes]
31
+ params[:no_bayes] = true if opts[:no_bayes]
21
32
  params[:language] = opts[:language] if opts[:language].to_s.strip != ''
22
- params[:with_verification] = true if opts[:with_verification]
33
+ if opts[:detect_language]
34
+ params[:detect_language] = opts[:detect_language]
35
+ end
36
+ params[:verification] = true if opts[:verification]
23
37
  if opts[:sources] && !opts[:sources].empty?
24
38
  params[:sources] = opts[:sources]
25
39
  end
26
40
 
27
- @stub.find_names(Protob::Params.new(params)).names
41
+ @stub.find_names(Protob::Params.new(params))
28
42
  end
29
- # rubocop:enable Metrics/AbcSize, Metrics/CyclomaticComplexity
43
+ # rubocop:enable all
30
44
  end
31
45
  end
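The new guard in `initialize` compares `gnfinder_version.version` with `GNFINDER_MIN_VERSION` using `>=` on strings, which orders them character by character, so a later release such as `v0.10.0` would sort below `v0.9.0`. The sketch below shows one possible alternative that compares the numeric segments with `Gem::Version`. It assumes the server reports tags shaped like `v0.9.0`; `version_ok?` and `MIN_VERSION` are hypothetical names and not part of the gem.

```ruby
require 'rubygems' # normally loaded already; provides Gem::Version

MIN_VERSION = 'v0.9.0' # mirrors GNFINDER_MIN_VERSION from the diff

# Hypothetical helper: true when server_version is at least min_version.
# Assumption: versions look like 'v0.9.0' (optional leading 'v').
def version_ok?(server_version, min_version = MIN_VERSION)
  parse = ->(v) { Gem::Version.new(v.to_s.sub(/\Av/, '')) }
  parse.call(server_version) >= parse.call(min_version)
end

# Plain string comparison gets this case wrong; segment comparison does not.
puts 'v0.10.0' >= 'v0.9.0'
puts version_ok?('v0.10.0')
```

Whether the upstream client should adopt this is a design choice for its maintainers; the sketch only illustrates the ordering difference.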
@@ -2,7 +2,8 @@
2
2
 
3
3
  # Gnfinder is a namespace module for gndinfer gem.
4
4
  module Gnfinder
5
- VERSION = '0.2.2'
5
+ # Version corresponds to the minimal supported version of Go gnfinder
6
+ VERSION = '0.9.0'
6
7
 
7
8
  def self.version
8
9
  VERSION
data/lib/protob_pb.rb CHANGED
@@ -4,72 +4,83 @@
4
4
  require 'google/protobuf'
5
5
 
6
6
  Google::Protobuf::DescriptorPool.generated_pool.build do
7
- add_message "protob.Pong" do
8
- optional :value, :string, 1
9
- end
10
- add_message "protob.Void" do
11
- end
12
- add_message "protob.Params" do
13
- optional :text, :string, 1
14
- optional :with_bayes, :bool, 3
15
- optional :language, :string, 4
16
- optional :with_verification, :bool, 5
17
- repeated :sources, :int32, 6
18
- end
19
- add_message "protob.NameStrings" do
20
- optional :date, :string, 1
21
- optional :language, :string, 2
22
- optional :total_tokens, :int32, 3
23
- optional :total_candidates, :int32, 4
24
- optional :total_names, :int32, 5
25
- repeated :names, :message, 6, "protob.NameString"
26
- end
27
- add_message "protob.NameString" do
28
- optional :type, :string, 1
29
- optional :verbatim, :string, 2
30
- optional :name, :string, 3
31
- optional :odds, :float, 4
32
- optional :offset_start, :int32, 5
33
- optional :offset_end, :int32, 6
34
- optional :verification, :message, 7, "protob.Verification"
35
- end
36
- add_message "protob.Verification" do
37
- optional :best_result, :message, 1, "protob.ResultData"
38
- repeated :preferred_results, :message, 2, "protob.ResultData"
39
- optional :data_sources_num, :int32, 3
40
- optional :data_source_quality, :string, 4
41
- optional :retries, :int32, 5
42
- optional :error, :string, 6
43
- end
44
- add_message "protob.ResultData" do
45
- optional :data_source_id, :int32, 1
46
- optional :data_source_title, :string, 2
47
- optional :taxon_id, :string, 3
48
- optional :matched_name, :string, 4
49
- optional :matched_canonical, :string, 5
50
- optional :current_name, :string, 6
51
- optional :synonym, :bool, 7
52
- optional :classification_path, :string, 8
53
- optional :classification_rank, :string, 9
54
- optional :classification_ids, :string, 10
55
- optional :edit_distance, :int32, 11
56
- optional :stem_edit_distance, :int32, 12
57
- optional :match_type, :enum, 13, "protob.MatchType"
58
- end
59
- add_enum "protob.MatchType" do
60
- value :NONE, 0
61
- value :EXACT, 1
62
- value :FUZZY, 2
63
- value :PARTIAL_EXACT, 3
64
- value :PARTIAL_FUZZY, 4
7
+ add_file("protob.proto", :syntax => :proto3) do
8
+ add_message "protob.Void" do
9
+ end
10
+ add_message "protob.Pong" do
11
+ optional :value, :string, 1
12
+ end
13
+ add_message "protob.Version" do
14
+ optional :version, :string, 1
15
+ optional :build, :string, 2
16
+ end
17
+ add_message "protob.Params" do
18
+ optional :text, :string, 1
19
+ optional :no_bayes, :bool, 3
20
+ optional :language, :string, 4
21
+ optional :detect_language, :bool, 5
22
+ optional :verification, :bool, 6
23
+ repeated :sources, :int32, 7
24
+ end
25
+ add_message "protob.Output" do
26
+ optional :date, :string, 1
27
+ optional :finder_version, :string, 2
28
+ optional :language, :string, 3
29
+ optional :language_detected, :string, 4
30
+ optional :detect_language, :bool, 5
31
+ optional :total_tokens, :int32, 6
32
+ optional :total_candidates, :int32, 7
33
+ optional :total_names, :int32, 8
34
+ repeated :names, :message, 9, "protob.NameString"
35
+ end
36
+ add_message "protob.NameString" do
37
+ optional :type, :string, 1
38
+ optional :verbatim, :string, 2
39
+ optional :name, :string, 3
40
+ optional :odds, :float, 4
41
+ optional :offset_start, :int32, 5
42
+ optional :offset_end, :int32, 6
43
+ optional :verification, :message, 7, "protob.Verification"
44
+ end
45
+ add_message "protob.Verification" do
46
+ optional :best_result, :message, 1, "protob.ResultData"
47
+ repeated :preferred_results, :message, 2, "protob.ResultData"
48
+ optional :data_sources_num, :int32, 3
49
+ optional :data_source_quality, :string, 4
50
+ optional :retries, :int32, 5
51
+ optional :error, :string, 6
52
+ end
53
+ add_message "protob.ResultData" do
54
+ optional :data_source_id, :int32, 1
55
+ optional :data_source_title, :string, 2
56
+ optional :taxon_id, :string, 3
57
+ optional :matched_name, :string, 4
58
+ optional :matched_canonical, :string, 5
59
+ optional :current_name, :string, 6
60
+ optional :synonym, :bool, 7
61
+ optional :classification_path, :string, 8
62
+ optional :classification_rank, :string, 9
63
+ optional :classification_ids, :string, 10
64
+ optional :edit_distance, :int32, 11
65
+ optional :stem_edit_distance, :int32, 12
66
+ optional :match_type, :enum, 13, "protob.MatchType"
67
+ end
68
+ add_enum "protob.MatchType" do
69
+ value :NONE, 0
70
+ value :EXACT, 1
71
+ value :FUZZY, 2
72
+ value :PARTIAL_EXACT, 3
73
+ value :PARTIAL_FUZZY, 4
74
+ end
65
75
  end
66
76
  end
67
77
 
68
78
  module Protob
69
- Pong = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Pong").msgclass
70
79
  Void = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Void").msgclass
80
+ Pong = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Pong").msgclass
81
+ Version = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Version").msgclass
71
82
  Params = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Params").msgclass
72
- NameStrings = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.NameStrings").msgclass
83
+ Output = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Output").msgclass
73
84
  NameString = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.NameString").msgclass
74
85
  Verification = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.Verification").msgclass
75
86
  ResultData = Google::Protobuf::DescriptorPool.generated_pool.lookup("protob.ResultData").msgclass
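With the `Output` message replacing `NameStrings`, the found names sit in `names`, next to metadata such as `language` and `total_names`. Below is a rough sketch of walking such a result and turning `odds` into a probability with the standard conversion p = odds / (1 + odds). The `Struct` classes are stand-ins for the generated `Protob::Output` and `Protob::NameString` so the example runs without a server; the sample values are invented, and `odds_to_probability` and `likely_names` are hypothetical helpers.

```ruby
# Stand-ins for the generated protobuf classes (only a few fields mirrored).
NameString = Struct.new(:verbatim, :name, :odds, keyword_init: true)
Output = Struct.new(:language, :total_names, :names, keyword_init: true)

# Standard odds-to-probability conversion: p = odds / (1 + odds).
def odds_to_probability(odds)
  odds / (1.0 + odds)
end

# Hypothetical filter: keeps names whose odds reach the given threshold.
def likely_names(output, min_odds)
  output.names.select { |n| n.odds >= min_odds }
end

# Invented sample data, shaped like a find_names result.
res = Output.new(
  language: 'eng',
  total_names: 2,
  names: [
    NameString.new(verbatim: 'Pardosa moesta', name: 'Pardosa moesta', odds: 9.0),
    NameString.new(verbatim: 'Very', name: 'Very', odds: 0.1)
  ]
)

likely_names(res, 1.0).each do |n|
  puts "#{n.name}: #{odds_to_probability(n.odds).round(2)}"
end
```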
@@ -15,7 +15,8 @@ module Protob
15
15
  self.service_name = 'protob.GNFinder'
16
16
 
17
17
  rpc :Ping, Void, Pong
18
- rpc :FindNames, Params, NameStrings
18
+ rpc :Ver, Void, Version
19
+ rpc :FindNames, Params, Output
19
20
  end
20
21
 
21
22
  Stub = Service.rpc_stub_class
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: gnfinder
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.2
4
+ version: 0.9.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Dmitry Mozzherin
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2019-04-23 00:00:00.000000000 Z
11
+ date: 2019-10-15 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -142,15 +142,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
142
142
  requirements:
143
143
  - - "~>"
144
144
  - !ruby/object:Gem::Version
145
- version: '2.5'
145
+ version: '2.6'
146
146
  required_rubygems_version: !ruby/object:Gem::Requirement
147
147
  requirements:
148
148
  - - ">="
149
149
  - !ruby/object:Gem::Version
150
150
  version: '0'
151
151
  requirements: []
152
- rubyforge_project:
153
- rubygems_version: 2.7.6
152
+ rubygems_version: 3.0.3
154
153
  signing_key:
155
154
  specification_version: 4
156
155
  summary: Scientific names finder