gnparser 0.2.0 → 0.4.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: ca997f4abf937af1e72a922b235560328ce24cab715212f131cfad1faae0180d
- data.tar.gz: c19783e6482bdf6f100355aaa1111a32363b88fc34f5012a56b99349c1f847ea
+ metadata.gz: 874326057814cbb368cc59f78df30e16faae8ba69dfe86140686597e779cb6b0
+ data.tar.gz: 904391d4ef7e7d5c48c58df05a343823fff0841b96edb82ae36a5fd7135e7df6
  SHA512:
- metadata.gz: 73c4aa417d996d80976afc931e8b6c67699cb388f61e34f5a1e872591a87ef6ebee76ac932fe5d70fe6cc4fee3f308a09be05dba34c86c051a80932bd7811f79
- data.tar.gz: ab8df5a0a97c90bd7481d9a3e858e3fc2ff2ced99a597f5d1fe103f2111340d22fff206dcb508a4d0bdba5efbd818ad4f8fe092da5178861fa8cd9ea4bf49754
+ metadata.gz: 513947853fec9b99cccc50819e1b7b4a5f6ea1558c6307e9d7d91e862a86f1fe4d26066ef1c2d9d20b6b25dd08629573c7f37e461b7b44014ae4b74361b41029
+ data.tar.gz: 1747254684ba4f37eda45db9b4ac4992e603ef98f63b10c231e06b1deab845892ce2d1e4694621982f0ae42929312cb0fcb0f1bba9d0b9469a6ade8b32c29eaf
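The `checksums.yaml` entries above are digests of two members of the `.gem` file, which is a plain tar archive holding `metadata.gz`, `data.tar.gz` and `checksums.yaml.gz`. A minimal sketch for recomputing them locally with Ruby's standard library; the helper name `gem_digests` is hypothetical, and the usage line assumes the gem was downloaded first:

```ruby
require 'digest/sha2'
require 'rubygems/package'

# Hypothetical helper: returns SHA256 and SHA512 hex digests for the
# metadata.gz and data.tar.gz members of a .gem tar archive read from +io+.
def gem_digests(io)
  digests = {}
  Gem::Package::TarReader.new(io).each do |entry|
    next unless %w[metadata.gz data.tar.gz].include?(entry.full_name)

    data = entry.read.to_s # read may return nil for an empty member
    digests[entry.full_name] = {
      'SHA256' => Digest::SHA256.hexdigest(data),
      'SHA512' => Digest::SHA512.hexdigest(data)
    }
  end
  digests
end

# Usage, assuming `gem fetch gnparser -v 0.4.0` was run first:
#   File.open('gnparser-0.4.0.gem', 'rb') { |f| p gem_digests(f) }
```

The resulting hashes can be compared field by field with the `+` lines of `checksums.yaml`.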
data/.gitignore CHANGED
@@ -1,7 +1,9 @@
  *.sw?
  .DS_Store
  coverage
+ .gem
  rdoc
+ *.gem
  pkg
  *.swp
  *.swo
@@ -16,3 +18,4 @@ bin
  .bundle
  bundle_bin
  Gemfile.lock
+ .byebug.history
data/README.md CHANGED
@@ -53,93 +53,58 @@ puts ver.value
  puts ver.build_time.size
  ```
 
- ### Output formats
+ ### Parsing name-strings
 
- gem supports the following formats:
+ Refer to [gnparser.proto] file to learn available fields.
 
- **compact**
- : JSON output in one line. This is the default format.
+ All parsing methods take the following options:
 
- **pretty**
- : ``Pretty`` JSON output.
+ ``skip_cleaning:``
+ : if `true` names-strings get stripped from HTML tags, if there are any. For example `<i>Homo sapiens</i> <b>L.</b>` becomes `Homo sapiens L.`. It is `false` by default.
 
- **simple**
- : Pipe-separated string where the fields are, id, verbatim name, canonical form,
- extended canonical form, authorship, year, quality of parsing.
+ ``jobs_number:`` allows to control how many jobs will run in parallel by gnparser. Note that gnparser gRPC server also has a max jobs number option. So if ``jobs_number`` is higher than gRPC MaxJobsNumber, it is ignored.
 
- **debug**
- : Abtract Syntax Tree of the parsed result.
-
- ### Option preserve_order
-
- To speed parsing up parser normally executes several jobs in parallel, and
- as a result the order the jobs go back may be different from input. It means
- the user has to match output's ``verbatim`` field with the list of its names.
-
- If speed of parsing is sufficient you can use one-threaded parsing that
- guarantees that the order of output will be exactly the same as the order of
- the input. For this purpose use ``preserve_order: true`` option.
-
- ### Parse one name
+ #### Parse one name
 
  ```ruby
  res = gnp.parse('Puma concolor L.')
- puts res.value
- puts res.error
+ puts res.canonical.simple
+ puts res.authorship.value
+ puts res.name_type
+ puts res.species
  ```
 
- For non-default format:
+ Skipping cleaning from HTML tags:
 
  ```ruby
- res = gnp.parse('Puma concolor (Linn.)', format: :pretty)
- ...
- res = gnp.parse('Puma concolor (Linn.)', format: 'pretty')
- ...
- res = gnp.parse('Puma concolor (Linn.)', format: 'simple')
- ...
- res = gnp.parse('Puma concolor (Linn.)', format: :simple)
+ res = gnp.parse('<i>Puma concolor</i> (Linn.)', skip_cleaning: true)
+ #parsed will be false
+ res.parsed
  ...
+ res = gnp.parse('<i>Puma concolor</i> (Linn.)')
+ #parsed will be true
+ res.parsed
+ res.normalized
  ```
 
- ### Parse an array of names
+ #### Parse an array of names
+
+ There is a limit of 10,000 name strings per batch. Results are always returned in the same order as input.
 
  ```ruby
  names = ['Plantago major L.', 'Homo sapiens Linn. 1758', 'Bubo bubo']
 
- # fast, might get output in different order from input
- res = gnp.parse_ary(names, format: :pretty)
+ # run using 3 jobs
+ res = gnp.parse_ary(names, jobs_number: 3)
  res.each do |r|
- puts r.value
- puts r.error
+ puts r.canonical.full
  end
 
- # slower, returns the same order for output as it was for input
- results = []
- res = gnp.parse_ary(names, format: :pretty, preserve_order: true)
- res.each_with_index |r, i|
- results << { input: names[i], output: r }
- end
- ```
-
- ### Parse names from a file
-
- File should have one name string per line.
-
- ```ruby
- path = File.join(__dir__, "path", "to", "names.txt")
- res = gnp.parse_file(path, format: :compact)
- res.each do |r|
- puts r.value
- puts r.error
- end
 
- # preserving order of items in output
- results = []
- res = gnp.parse_file(path, format: :compact, preserve_order: true)
- res.each_with_index |r, i|
- results << { input: names[i], output: r }
- end
+ # do not strip HTML tags (in this case there are none in the input)
+ res = gnp.parse_ary(names, skip_cleaning: true)
  ```
 
  [gnparser]: https://gitlab.com/gogna/gnparser
  [releases]: https://gitlab.com/gogna/gnparser/releases
+ [gnparser.proto]: https://gitlab.com/gogna/gnparser/blob/master/pb/gnparser.proto
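The new README states that one `parse_ary` call accepts at most 10,000 name-strings and returns results in input order. For longer lists, one option is to split the input into consecutive batches and concatenate the results, which keeps the overall order. A minimal sketch: `parse_in_batches` is a hypothetical helper, and its block stands in for a call such as `gnp.parse_ary(batch)` so the example runs without a gRPC server:

```ruby
# Maximum batch size stated in the README for one ParseArray request.
MAX_BATCH = 10_000

# Hypothetical helper: yields consecutive slices of +names+ (each at most
# +batch_size+ long) and concatenates the returned collections in order.
# to_a turns a protobuf RepeatedField (or any Enumerable) into an Array.
def parse_in_batches(names, batch_size: MAX_BATCH)
  raise ArgumentError, 'batch_size must be positive' unless batch_size.positive?

  names.each_slice(batch_size).flat_map { |batch| yield(batch).to_a }
end

# Usage against a running server (not executed here):
#   results = parse_in_batches(names) { |batch| gnp.parse_ary(batch, jobs_number: 4) }
```

Because each batch is sent only after the previous one returns, order is preserved without matching on `verbatim`; running batches concurrently would need that extra bookkeeping.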
data/Rakefile CHANGED
@@ -26,9 +26,9 @@ end
 
  task :grpc do
  cmd = 'grpc_tools_ruby_protoc ' \
- '-I $GOPATH/src/gitlab.com/gogna/gnparser/grpc ' \
+ '-I $GOPATH/src/gitlab.com/gogna/gnparser/pb ' \
  '--ruby_out=lib --grpc_out=lib ' \
- '$GOPATH/src/gitlab.com/gogna/gnparser/grpc/gnparser.proto'
+ '$GOPATH/src/gitlab.com/gogna/gnparser/pb/gnparser.proto'
  puts cmd
  `#{cmd}`
  end
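The `grpc` task builds one command string and relies on the shell to expand `$GOPATH` when it runs in backticks. A sketch of the same invocation assembled in Ruby, assuming `GOPATH` is set and the proto file now lives under `pb/` as in the new task; `protoc_command` is a hypothetical name. Passing an argument array to `system` skips shell interpolation:

```ruby
# Hypothetical helper mirroring the Rakefile's grpc task: builds the
# grpc_tools_ruby_protoc argument list for gnparser.proto under pb/.
def protoc_command(gopath = ENV.fetch('GOPATH'))
  pb_dir = File.join(gopath, 'src', 'gitlab.com', 'gogna', 'gnparser', 'pb')
  ['grpc_tools_ruby_protoc',
   '-I', pb_dir,
   '--ruby_out=lib', '--grpc_out=lib',
   File.join(pb_dir, 'gnparser.proto')]
end

# Inside a Rake task one might run it with:
#   cmd = protoc_command
#   puts cmd.join(' ')
#   system(*cmd) || abort('grpc_tools_ruby_protoc failed')
```

`ENV.fetch` raises `KeyError` when `GOPATH` is unset, which fails earlier and more clearly than an empty `-I` path.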
data/gnparser.gemspec CHANGED
@@ -23,7 +23,7 @@ Gem::Specification.new do |gem|
  gem.required_ruby_version = '~> 2.5'
  gem.add_dependency 'grpc', '~> 1.15'
  gem.add_dependency 'grpc-tools', '~> 1.15'
- gem.add_development_dependency 'bundler', '~> 1.16'
+ gem.add_development_dependency 'bundler', '~> 2.0'
  gem.add_development_dependency 'byebug', '~> 10.0'
  gem.add_development_dependency 'rake', '~> 12.3'
  gem.add_development_dependency 'rspec', '~> 3.8'
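The bundler development dependency moves from `~> 1.16` to `~> 2.0`. With the pessimistic operator, `~> 1.16` means at least 1.16 and below 2.0, while `~> 2.0` means at least 2.0 and below 3.0, so the two ranges do not overlap. A short check using RubyGems' own `Gem::Requirement`; the helper name `allows?` is only for illustration:

```ruby
require 'rubygems'

# Illustrative helper: does the requirement string +constraint+ accept +version+?
def allows?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

p allows?('~> 1.16', '1.17.3') # prints true
p allows?('~> 1.16', '2.0.2')  # prints false
p allows?('~> 2.0', '2.0.2')   # prints true
p allows?('~> 2.0', '3.0.0')   # prints false
```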
@@ -1,19 +1,13 @@
  # frozen_string_literal: true
 
  module GNparser
- FORMATS = {
- compact: Grpc::Format::Compact,
- pretty: Grpc::Format::Pretty,
- simple: Grpc::Format::Simple,
- debug: Grpc::Format::Debug
- }.freeze
  # Gnfinder::Client connects to gnfinder server
  class Client
- PARSER_MIN_VERSION = 'v0.6.0'
+ PARSER_MIN_VERSION = 'v0.9.0'
 
  def initialize(host = '0.0.0.0', port = '8778')
- @stub = Grpc::GNparser::Stub.new("#{host}:#{port}",
- :this_channel_is_insecure)
+ @stub = Pb::GNparser::Stub.new("#{host}:#{port}",
+ :this_channel_is_insecure)
  return if parser_version.value >= PARSER_MIN_VERSION
 
  raise 'gRPC server of gnparser should be at least ' \
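The constructor accepts the server when `parser_version.value >= PARSER_MIN_VERSION`, a character-by-character string comparison. That holds for `v0.9.1` against `v0.9.0`, but a two-digit component reverses the result: `'v0.10.0' >= 'v0.9.0'` is `false` because `'1'` sorts before `'9'`. A sketch of a numeric comparison with `Gem::Version`, assuming the server reports versions as `v` followed by dotted numbers; `version_at_least?` is a hypothetical helper:

```ruby
require 'rubygems'

# Hypothetical helper: compares "vMAJOR.MINOR.PATCH" strings numerically.
# The leading "v" is stripped because Gem::Version rejects it.
def version_at_least?(actual, minimum)
  Gem::Version.new(actual.delete_prefix('v')) >=
    Gem::Version.new(minimum.delete_prefix('v'))
end

p version_at_least?('v0.10.0', 'v0.9.0') # prints true
p 'v0.10.0' >= 'v0.9.0'                   # prints false (string comparison)
```

`String#delete_prefix` needs Ruby 2.5, which matches the gem's `required_ruby_version`.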
@@ -21,73 +15,30 @@ module GNparser
  'https://gitlab/gogna/gnparser/releases.'
  end
 
+ # parser_version retrieves the version of gnparser used by gRPC service.
  def parser_version
- @stub.ver(Grpc::Void.new)
+ @stub.ver(Pb::Void.new)
  end
 
+ # parse parses one name at a time and returns protocol buffer object with
+ # the results. If the name-string contains HTML tags they will be stripped
+ # by gRPC method.
  def parse(name, opts = {})
- parse_iter([name], opts).next
- end
-
- def parse_ary(ary, opts = {})
- parse_iter(ary, opts)
- end
-
- def parse_file(path, opts = {})
- f = File.open(path)
- parse_iter(f, opts)
- end
-
- private
-
- def parse_iter(iter, opts)
- format = opts[:format] || :compact
- preserve_order = opts[:preserve_order]
- enum = InputEnum.new(iter, format)
- if preserve_order
- @stub.parse_in_order(enum.each_item)
- else
- @stub.parse(enum.each_item)
- end
- end
- end
-
- # InputEnum yields names one after another
- class InputEnum
- def initialize(iter, format = nil)
- @iter = iter
- @format = format
- end
-
- def each_item
- return enum_for(:each_item) unless block_given?
-
- yield input_format if @format
-
- @iter.each do |l|
- input = Grpc::Input.new
- input.name = l.strip
- yield input
- end
- end
-
- private
-
- def input_format
- frm = find_format
- @format = nil
- input = Grpc::Input.new
- input.format = frm
- input
- end
-
- def find_format
- format = @format.to_sym
- if FORMATS.key?(format)
- FORMATS[format]
- else
- FORMATS[:compact]
- end
+ parse_ary([name], opts)[0]
+ end
+
+ # parse_ary parses an array of name-strings and returns an array of
+ # protocol buffer objects with the results. The array preserves the same
+ # order of elements. If the name-string contains HTML tags they will be
+ # stripped internally by the gnparser.
+ def parse_ary(names, opts = {})
+ input = Pb::InputArray.new
+ input.names += names
+ input.skip_cleaning = true if opts[:skip_cleaning]
+ jobs = opts[:jobs_number].to_i
+ input.jobs_number = jobs if jobs.positive?
+ res = @stub.parse_array(input)
+ res.output
  end
  end
  end
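In the new `parse_ary`, `skip_cleaning` is set on the request only when the option is truthy (the README example shows that a tagged name then comes back with `parsed` false), and `jobs_number` is set only when `to_i` yields a positive integer, so `nil`, `0`, negatives or non-numeric strings leave the field at its protobuf default of 0. `parse` just wraps its single name in a one-element array and takes element `[0]`. A sketch of that option handling with a plain `Struct` standing in for the generated `Pb::InputArray`; `InputArrayStub` and `build_input` are hypothetical names so it runs without gRPC:

```ruby
# Stand-in for Pb::InputArray with the same three fields
# (assumption: a plain Struct, not the generated protobuf class).
InputArrayStub = Struct.new(:names, :skip_cleaning, :jobs_number) do
  def initialize
    super([], false, 0) # mirrors protobuf defaults: empty list, false, 0
  end
end

# Mirrors the option handling in GNparser::Client#parse_ary,
# stopping before the @stub.parse_array call.
def build_input(names, opts = {})
  input = InputArrayStub.new
  input.names += names
  input.skip_cleaning = true if opts[:skip_cleaning]
  jobs = opts[:jobs_number].to_i
  input.jobs_number = jobs if jobs.positive?
  input
end
```

For example, `build_input(['Puma concolor L.'])` corresponds to the request that `gnp.parse('Puma concolor L.')` sends.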
@@ -2,7 +2,7 @@
 
  # GNparser is a namespace module for gnparser gem.
  module GNparser
- VERSION = '0.2.0'
+ VERSION = '0.4.0'
 
  def self.version
  VERSION
data/lib/gnparser_pb.rb CHANGED
@@ -4,34 +4,144 @@
  require 'google/protobuf'
 
  Google::Protobuf::DescriptorPool.generated_pool.build do
- add_message "grpc.Version" do
+ add_message "pb.Version" do
  optional :value, :string, 1
  optional :build_time, :string, 2
  end
- add_message "grpc.Void" do
+ add_message "pb.Void" do
  end
- add_message "grpc.Input" do
- oneof :content do
- optional :format, :enum, 1, "grpc.Format"
- optional :name, :string, 2
+ add_message "pb.InputArray" do
+ optional :jobs_number, :int32, 1
+ optional :skip_cleaning, :bool, 2
+ repeated :names, :string, 3
+ end
+ add_message "pb.OutputArray" do
+ repeated :output, :message, 1, "pb.Parsed"
+ end
+ add_message "pb.Parsed" do
+ optional :parsed, :bool, 1
+ optional :quality, :int32, 2
+ repeated :quality_warning, :message, 3, "pb.QualityWarning"
+ optional :verbatim, :string, 4
+ optional :normalized, :string, 5
+ optional :canonical, :message, 6, "pb.Canonical"
+ optional :authorship, :message, 7, "pb.Authorship"
+ repeated :positions, :message, 8, "pb.Position"
+ optional :hybrid, :bool, 9
+ optional :bacteria, :bool, 10
+ optional :tail, :string, 11
+ optional :id, :string, 12
+ optional :parser_version, :string, 13
+ optional :cardinality, :int32, 14
+ optional :name_type, :enum, 15, "pb.NameType"
+ repeated :details_hybrid_formula, :message, 20, "pb.HybridFormula"
+ oneof :details do
+ optional :uninomial, :message, 16, "pb.Uninomial"
+ optional :species, :message, 17, "pb.Species"
+ optional :comparison, :message, 18, "pb.Comparison"
+ optional :approximation, :message, 19, "pb.Approximation"
  end
  end
- add_message "grpc.Output" do
- optional :value, :string, 2
- optional :error, :string, 3
+ add_message "pb.HybridFormula" do
+ oneof :element do
+ optional :uninomial, :message, 1, "pb.Uninomial"
+ optional :species, :message, 2, "pb.Species"
+ optional :comparison, :message, 3, "pb.Comparison"
+ optional :approximation, :message, 4, "pb.Approximation"
+ end
+ end
+ add_message "pb.Canonical" do
+ optional :simple, :string, 1
+ optional :full, :string, 2
+ end
+ add_message "pb.Position" do
+ optional :type, :string, 1
+ optional :start, :int32, 2
+ optional :end, :int32, 3
+ end
+ add_message "pb.QualityWarning" do
+ optional :quality, :int32, 1
+ optional :message, :string, 2
+ end
+ add_message "pb.Uninomial" do
+ optional :value, :string, 1
+ optional :rank, :string, 2
+ optional :parent, :string, 3
+ optional :authorship, :message, 4, "pb.Authorship"
+ end
+ add_message "pb.Species" do
+ optional :genus, :string, 1
+ optional :sub_genus, :string, 2
+ optional :species, :string, 3
+ optional :species_authorship, :message, 4, "pb.Authorship"
+ repeated :infra_species, :message, 5, "pb.InfraSpecies"
+ end
+ add_message "pb.InfraSpecies" do
+ optional :value, :string, 1
+ optional :rank, :string, 2
+ optional :authorship, :message, 3, "pb.Authorship"
+ end
+ add_message "pb.Comparison" do
+ optional :genus, :string, 1
+ optional :species, :string, 2
+ optional :species_authorship, :message, 3, "pb.Authorship"
+ optional :comparison, :string, 4
+ end
+ add_message "pb.Approximation" do
+ optional :genus, :string, 1
+ optional :species, :string, 2
+ optional :species_authorship, :message, 3, "pb.Authorship"
+ optional :approximation, :string, 4
+ optional :ignored, :string, 5
+ end
+ add_message "pb.Authorship" do
+ optional :value, :string, 1
+ repeated :all_authors, :string, 2
+ optional :original, :message, 3, "pb.AuthGroup"
+ optional :combination, :message, 4, "pb.AuthGroup"
+ end
+ add_message "pb.AuthGroup" do
+ repeated :authors, :string, 1
+ optional :year, :string, 2
+ optional :approximate_year, :bool, 3
+ optional :ex_authors, :message, 4, "pb.Authors"
+ optional :emend_authors, :message, 5, "pb.Authors"
+ end
+ add_message "pb.Authors" do
+ repeated :authors, :string, 1
+ optional :year, :string, 2
+ optional :approximate_year, :bool, 3
  end
- add_enum "grpc.Format" do
- value :Compact, 0
- value :Pretty, 1
- value :Simple, 2
- value :Debug, 3
+ add_enum "pb.NameType" do
+ value :NONE, 0
+ value :UNINOMIAL, 1
+ value :SPECIES, 2
+ value :COMPARISON, 3
+ value :APPROX_SURROGATE, 4
+ value :SURROGATE, 5
+ value :NAMED_HYBRID, 6
+ value :HYBRID_FORMULA, 7
+ value :VIRUS, 8
  end
  end
 
- module Grpc
- Version = Google::Protobuf::DescriptorPool.generated_pool.lookup("grpc.Version").msgclass
- Void = Google::Protobuf::DescriptorPool.generated_pool.lookup("grpc.Void").msgclass
- Input = Google::Protobuf::DescriptorPool.generated_pool.lookup("grpc.Input").msgclass
- Output = Google::Protobuf::DescriptorPool.generated_pool.lookup("grpc.Output").msgclass
- Format = Google::Protobuf::DescriptorPool.generated_pool.lookup("grpc.Format").enummodule
+ module Pb
+ Version = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.Version").msgclass
+ Void = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.Void").msgclass
+ InputArray = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.InputArray").msgclass
+ OutputArray = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.OutputArray").msgclass
+ Parsed = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.Parsed").msgclass
+ HybridFormula = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.HybridFormula").msgclass
+ Canonical = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.Canonical").msgclass
+ Position = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.Position").msgclass
+ QualityWarning = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.QualityWarning").msgclass
+ Uninomial = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.Uninomial").msgclass
+ Species = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.Species").msgclass
+ InfraSpecies = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.InfraSpecies").msgclass
+ Comparison = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.Comparison").msgclass
+ Approximation = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.Approximation").msgclass
+ Authorship = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.Authorship").msgclass
+ AuthGroup = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.AuthGroup").msgclass
+ Authors = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.Authors").msgclass
+ NameType = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.NameType").enummodule
  end
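`pb.Parsed` carries its structured details in the `details` oneof (`uninomial`, `species`, `comparison` or `approximation`), hybrid formulas go into the separate repeated `details_hybrid_formula` field, and `name_type` indicates which shape applies. A sketch of dispatching on `name_type`, using `Struct` stand-ins for a few fields of the generated classes (an assumption so it runs without the protobuf gem; the real generated classes return enum fields as symbols such as `:SPECIES`). `describe` and the `*Stub` names are hypothetical:

```ruby
# Stand-ins for a subset of pb.Uninomial, pb.Species and pb.Parsed
# (assumption: plain Structs, not the generated protobuf classes).
UninomialStub = Struct.new(:value, :rank, keyword_init: true)
SpeciesStub   = Struct.new(:genus, :species, keyword_init: true)
ParsedStub    = Struct.new(:parsed, :name_type, :uninomial, :species,
                           keyword_init: true)

# Hypothetical helper: returns a short label for a parsed result,
# reading the detail message that matches name_type.
def describe(res)
  return 'not parsed' unless res.parsed

  case res.name_type
  when :UNINOMIAL then "uninomial #{res.uninomial.value}"
  when :SPECIES   then "species #{res.species.genus} #{res.species.species}"
  else res.name_type.to_s.downcase
  end
end
```

In `pb.Species` the `species` field is the epithet string, so `res.species.species` yields `concolor` for *Puma concolor*.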
@@ -1,10 +1,10 @@
  # Generated by the protocol buffer compiler. DO NOT EDIT!
- # Source: gnparser.proto for package 'grpc'
+ # Source: gnparser.proto for package 'pb'
 
  require 'grpc'
  require 'gnparser_pb'
 
- module Grpc
+ module Pb
  module GNparser
  class Service
 
@@ -12,11 +12,14 @@ module Grpc
 
  self.marshal_class_method = :encode
  self.unmarshal_class_method = :decode
- self.service_name = 'grpc.GNparser'
+ self.service_name = 'pb.GNparser'
 
+ # Ver takes an empty argument (Void) and returns description of the gnparser
+ # version and build date and time.
  rpc :Ver, Void, Version
- rpc :Parse, stream(Input), stream(Output)
- rpc :ParseInOrder, stream(Input), stream(Output)
+ # ParseArray takes a list of name-strings (up to 10000), and retuns back
+ # a list of parsed results, preserving the order of input.
+ rpc :ParseArray, InputArray, OutputArray
  end
 
  Stub = Service.rpc_stub_class
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: gnparser
  version: !ruby/object:Gem::Version
- version: 0.2.0
+ version: 0.4.0
  platform: ruby
  authors:
  - Dmitry Mozzherin
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2019-01-17 00:00:00.000000000 Z
+ date: 2019-08-26 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: grpc
@@ -44,14 +44,14 @@ dependencies:
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '1.16'
+ version: '2.0'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '1.16'
+ version: '2.0'
  - !ruby/object:Gem::Dependency
  name: byebug
  requirement: !ruby/object:Gem::Requirement
@@ -116,14 +116,12 @@ executables: []
  extensions: []
  extra_rdoc_files: []
  files:
- - ".byebug_history"
  - ".gitignore"
  - ".rspec"
  - ".rubocop.yml"
  - ".vscode/settings.json"
  - CHANGELOG.md
  - Gemfile
- - Gemfile.lock
  - LICENSE
  - README.md
  - Rakefile
data/.byebug_history DELETED
@@ -1,16 +0,0 @@
- q
- res.next
- q
- res.next
- res
- q
- res.each {|r| puts r}
- res.next
- p res
- q
- res.next
- res
- c
- q
- puts res.next
- res