gnparser 0.2.0 → 0.4.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: ca997f4abf937af1e72a922b235560328ce24cab715212f131cfad1faae0180d
4
- data.tar.gz: c19783e6482bdf6f100355aaa1111a32363b88fc34f5012a56b99349c1f847ea
3
+ metadata.gz: 874326057814cbb368cc59f78df30e16faae8ba69dfe86140686597e779cb6b0
4
+ data.tar.gz: 904391d4ef7e7d5c48c58df05a343823fff0841b96edb82ae36a5fd7135e7df6
5
5
  SHA512:
6
- metadata.gz: 73c4aa417d996d80976afc931e8b6c67699cb388f61e34f5a1e872591a87ef6ebee76ac932fe5d70fe6cc4fee3f308a09be05dba34c86c051a80932bd7811f79
7
- data.tar.gz: ab8df5a0a97c90bd7481d9a3e858e3fc2ff2ced99a597f5d1fe103f2111340d22fff206dcb508a4d0bdba5efbd818ad4f8fe092da5178861fa8cd9ea4bf49754
6
+ metadata.gz: 513947853fec9b99cccc50819e1b7b4a5f6ea1558c6307e9d7d91e862a86f1fe4d26066ef1c2d9d20b6b25dd08629573c7f37e461b7b44014ae4b74361b41029
7
+ data.tar.gz: 1747254684ba4f37eda45db9b4ac4992e603ef98f63b10c231e06b1deab845892ce2d1e4694621982f0ae42929312cb0fcb0f1bba9d0b9469a6ade8b32c29eaf
data/.gitignore CHANGED
@@ -1,7 +1,9 @@
1
1
  *.sw?
2
2
  .DS_Store
3
3
  coverage
4
+ .gem
4
5
  rdoc
6
+ *.gem
5
7
  pkg
6
8
  *.swp
7
9
  *.swo
@@ -16,3 +18,4 @@ bin
16
18
  .bundle
17
19
  bundle_bin
18
20
  Gemfile.lock
21
+ .byebug.history
data/README.md CHANGED
@@ -53,93 +53,58 @@ puts ver.value
53
53
  puts ver.build_time.size
54
54
  ```
55
55
 
56
- ### Output formats
56
+ ### Parsing name-strings
57
57
 
58
- gem supports the following formats:
58
+ Refer to the [gnparser.proto] file for the available fields.
59
59
 
60
- **compact**
61
- : JSON output in one line. This is the default format.
60
+ All parsing methods take the following options:
62
61
 
63
- **pretty**
64
- : ``Pretty`` JSON output.
62
+ ``skip_cleaning:``
63
+ : if `true`, HTML tags are left in name-strings instead of being stripped. It is `false` by default, so `<i>Homo sapiens</i> <b>L.</b>` becomes `Homo sapiens L.` before parsing.
65
64
 
66
- **simple**
67
- : Pipe-separated string where the fields are, id, verbatim name, canonical form,
68
- extended canonical form, authorship, year, quality of parsing.
65
+ ``jobs_number:`` controls how many jobs gnparser runs in parallel. Note that the gnparser gRPC server also has a max jobs number option, so if ``jobs_number`` is higher than the gRPC MaxJobsNumber, it is ignored.
69
66
 
70
- **debug**
71
- : Abstract Syntax Tree of the parsed result.
72
-
73
- ### Option preserve_order
74
-
75
- To speed parsing up parser normally executes several jobs in parallel, and
76
- as a result the order the jobs go back may be different from input. It means
77
- the user has to match output's ``verbatim`` field with the list of its names.
78
-
79
- If speed of parsing is sufficient you can use one-threaded parsing that
80
- guarantees that the order of output will be exactly the same as the order of
81
- the input. For this purpose use ``preserve_order: true`` option.
82
-
83
- ### Parse one name
67
+ #### Parse one name
84
68
 
85
69
  ```ruby
86
70
  res = gnp.parse('Puma concolor L.')
87
- puts res.value
88
- puts res.error
71
+ puts res.canonical.simple
72
+ puts res.authorship.value
73
+ puts res.name_type
74
+ puts res.species
89
75
  ```
90
76
 
91
- For non-default format:
77
+ Skipping cleaning from HTML tags:
92
78
 
93
79
  ```ruby
94
- res = gnp.parse('Puma concolor (Linn.)', format: :pretty)
95
- ...
96
- res = gnp.parse('Puma concolor (Linn.)', format: 'pretty')
97
- ...
98
- res = gnp.parse('Puma concolor (Linn.)', format: 'simple')
99
- ...
100
- res = gnp.parse('Puma concolor (Linn.)', format: :simple)
80
+ res = gnp.parse('<i>Puma concolor</i> (Linn.)', skip_cleaning: true)
81
+ #parsed will be false
82
+ res.parsed
101
83
  ...
84
+ res = gnp.parse('<i>Puma concolor</i> (Linn.)')
85
+ #parsed will be true
86
+ res.parsed
87
+ res.normalized
102
88
  ```
103
89
 
104
- ### Parse an array of names
90
+ #### Parse an array of names
91
+
92
+ There is a limit of 10,000 name strings per batch. Results are always returned in the same order as input.
105
93
 
106
94
  ```ruby
107
95
  names = ['Plantago major L.', 'Homo sapiens Linn. 1758', 'Bubo bubo']
108
96
 
109
- # fast, might get output in different order from input
110
- res = gnp.parse_ary(names, format: :pretty)
97
+ # run using 3 jobs
98
+ res = gnp.parse_ary(names, jobs_number: 3)
111
99
  res.each do |r|
112
- puts r.value
113
- puts r.error
100
+ puts r.canonical.full
114
101
  end
115
102
 
116
- # slower, returns the same order for output as it was for input
117
- results = []
118
- res = gnp.parse_ary(names, format: :pretty, preserve_order: true)
119
- res.each_with_index do |r, i|
120
- results << { input: names[i], output: r }
121
- end
122
- ```
123
-
124
- ### Parse names from a file
125
-
126
- File should have one name string per line.
127
-
128
- ```ruby
129
- path = File.join(__dir__, "path", "to", "names.txt")
130
- res = gnp.parse_file(path, format: :compact)
131
- res.each do |r|
132
- puts r.value
133
- puts r.error
134
- end
135
103
 
136
- # preserving order of items in output
137
- results = []
138
- res = gnp.parse_file(path, format: :compact, preserve_order: true)
139
- res.each_with_index do |r, i|
140
- results << { input: names[i], output: r }
141
- end
104
+ # do not strip HTML tags (in this case there are none in the input)
105
+ res = gnp.parse_ary(names, skip_cleaning: true)
142
106
  ```
143
107
 
144
108
  [gnparser]: https://gitlab.com/gogna/gnparser
145
109
  [releases]: https://gitlab.com/gogna/gnparser/releases
110
+ [gnparser.proto]: https://gitlab.com/gogna/gnparser/blob/master/pb/gnparser.proto
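
The updated README states a limit of 10,000 name-strings per batch. A minimal sketch of splitting a longer list into batches before parsing is given here. The `parse_in_batches` helper and the `BATCH_LIMIT` constant are hypothetical names that are not part of the gem, and the block passed to the helper stands in for a real call to `parse_ary`.

```ruby
# Hypothetical helper, not part of the gnparser gem.
BATCH_LIMIT = 10_000 # per-batch limit quoted in the README

# Splits names into slices of at most `limit` items, yields each slice to
# the block, and concatenates the arrays the block returns.
def parse_in_batches(names, limit = BATCH_LIMIT)
  names.each_slice(limit).flat_map { |batch| yield(batch) }
end

# Stand-in for a parser call: upcase each name so the sketch runs
# without a gRPC server.
names = Array.new(25_000) { |i| "Name #{i}" }
sizes = []
results = parse_in_batches(names) do |batch|
  sizes << batch.size
  batch.map(&:upcase)
end

puts sizes.inspect # [10000, 10000, 5000]
puts results.size  # 25000
```

With the real client the block would presumably be something like `gnp.parse_ary(batch).to_a`, where `to_a` is an assumption made so that each batch comes back as a plain array.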
data/Rakefile CHANGED
@@ -26,9 +26,9 @@ end
26
26
 
27
27
  task :grpc do
28
28
  cmd = 'grpc_tools_ruby_protoc ' \
29
- '-I $GOPATH/src/gitlab.com/gogna/gnparser/grpc ' \
29
+ '-I $GOPATH/src/gitlab.com/gogna/gnparser/pb ' \
30
30
  '--ruby_out=lib --grpc_out=lib ' \
31
- '$GOPATH/src/gitlab.com/gogna/gnparser/grpc/gnparser.proto'
31
+ '$GOPATH/src/gitlab.com/gogna/gnparser/pb/gnparser.proto'
32
32
  puts cmd
33
33
  `#{cmd}`
34
34
  end
data/gnparser.gemspec CHANGED
@@ -23,7 +23,7 @@ Gem::Specification.new do |gem|
23
23
  gem.required_ruby_version = '~> 2.5'
24
24
  gem.add_dependency 'grpc', '~> 1.15'
25
25
  gem.add_dependency 'grpc-tools', '~> 1.15'
26
- gem.add_development_dependency 'bundler', '~> 1.16'
26
+ gem.add_development_dependency 'bundler', '~> 2.0'
27
27
  gem.add_development_dependency 'byebug', '~> 10.0'
28
28
  gem.add_development_dependency 'rake', '~> 12.3'
29
29
  gem.add_development_dependency 'rspec', '~> 3.8'
@@ -1,19 +1,13 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module GNparser
4
- FORMATS = {
5
- compact: Grpc::Format::Compact,
6
- pretty: Grpc::Format::Pretty,
7
- simple: Grpc::Format::Simple,
8
- debug: Grpc::Format::Debug
9
- }.freeze
10
4
  # GNparser::Client connects to the gnparser gRPC server
11
5
  class Client
12
- PARSER_MIN_VERSION = 'v0.6.0'
6
+ PARSER_MIN_VERSION = 'v0.9.0'
13
7
 
14
8
  def initialize(host = '0.0.0.0', port = '8778')
15
- @stub = Grpc::GNparser::Stub.new("#{host}:#{port}",
16
- :this_channel_is_insecure)
9
+ @stub = Pb::GNparser::Stub.new("#{host}:#{port}",
10
+ :this_channel_is_insecure)
17
11
  return if parser_version.value >= PARSER_MIN_VERSION
18
12
 
19
13
  raise 'gRPC server of gnparser should be at least ' \
@@ -21,73 +15,30 @@ module GNparser
21
15
  'https://gitlab.com/gogna/gnparser/releases.'
22
16
  end
23
17
 
18
+ # parser_version retrieves the version of gnparser used by gRPC service.
24
19
  def parser_version
25
- @stub.ver(Grpc::Void.new)
20
+ @stub.ver(Pb::Void.new)
26
21
  end
27
22
 
23
+ # parse parses one name at a time and returns protocol buffer object with
24
+ # the results. If the name-string contains HTML tags they will be stripped
25
+ # by gRPC method.
28
26
  def parse(name, opts = {})
29
- parse_iter([name], opts).next
30
- end
31
-
32
- def parse_ary(ary, opts = {})
33
- parse_iter(ary, opts)
34
- end
35
-
36
- def parse_file(path, opts = {})
37
- f = File.open(path)
38
- parse_iter(f, opts)
39
- end
40
-
41
- private
42
-
43
- def parse_iter(iter, opts)
44
- format = opts[:format] || :compact
45
- preserve_order = opts[:preserve_order]
46
- enum = InputEnum.new(iter, format)
47
- if preserve_order
48
- @stub.parse_in_order(enum.each_item)
49
- else
50
- @stub.parse(enum.each_item)
51
- end
52
- end
53
- end
54
-
55
- # InputEnum yields names one after another
56
- class InputEnum
57
- def initialize(iter, format = nil)
58
- @iter = iter
59
- @format = format
60
- end
61
-
62
- def each_item
63
- return enum_for(:each_item) unless block_given?
64
-
65
- yield input_format if @format
66
-
67
- @iter.each do |l|
68
- input = Grpc::Input.new
69
- input.name = l.strip
70
- yield input
71
- end
72
- end
73
-
74
- private
75
-
76
- def input_format
77
- frm = find_format
78
- @format = nil
79
- input = Grpc::Input.new
80
- input.format = frm
81
- input
82
- end
83
-
84
- def find_format
85
- format = @format.to_sym
86
- if FORMATS.key?(format)
87
- FORMATS[format]
88
- else
89
- FORMATS[:compact]
90
- end
27
+ parse_ary([name], opts)[0]
28
+ end
29
+
30
+ # parse_ary parses an array of name-strings and returns an array of
31
+ # protocol buffer objects with the results. The array preserves the same
32
+ # order of elements. If the name-string contains HTML tags they will be
33
+ # stripped internally by the gnparser.
34
+ def parse_ary(names, opts = {})
35
+ input = Pb::InputArray.new
36
+ input.names += names
37
+ input.skip_cleaning = true if opts[:skip_cleaning]
38
+ jobs = opts[:jobs_number].to_i
39
+ input.jobs_number = jobs if jobs.positive?
40
+ res = @stub.parse_array(input)
41
+ res.output
91
42
  end
92
43
  end
93
44
  end
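
The constructor in this file compares `parser_version.value` with `PARSER_MIN_VERSION` using `>=` on strings, which Ruby evaluates character by character. A minimal sketch of a numeric comparison based on `Gem::Version` from RubyGems follows. The `version_at_least?` helper is a hypothetical name, not part of the gem, and it assumes version strings shaped like `v0.9.0`.

```ruby
require 'rubygems' # provides Gem::Version

# Hypothetical helper, not part of the gnparser gem.
# Drops a leading 'v' and compares the versions segment by segment.
def version_at_least?(actual, minimum)
  Gem::Version.new(actual.delete_prefix('v')) >=
    Gem::Version.new(minimum.delete_prefix('v'))
end

puts 'v0.10.0' >= 'v0.9.0'                  # false: '1' sorts before '9'
puts version_at_least?('v0.10.0', 'v0.9.0') # true
```

The difference only matters once a version segment reaches two digits, which is why the plain string comparison works for the versions mentioned in this diff.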
@@ -2,7 +2,7 @@
2
2
 
3
3
  # GNparser is a namespace module for gnparser gem.
4
4
  module GNparser
5
- VERSION = '0.2.0'
5
+ VERSION = '0.4.0'
6
6
 
7
7
  def self.version
8
8
  VERSION
data/lib/gnparser_pb.rb CHANGED
@@ -4,34 +4,144 @@
4
4
  require 'google/protobuf'
5
5
 
6
6
  Google::Protobuf::DescriptorPool.generated_pool.build do
7
- add_message "grpc.Version" do
7
+ add_message "pb.Version" do
8
8
  optional :value, :string, 1
9
9
  optional :build_time, :string, 2
10
10
  end
11
- add_message "grpc.Void" do
11
+ add_message "pb.Void" do
12
12
  end
13
- add_message "grpc.Input" do
14
- oneof :content do
15
- optional :format, :enum, 1, "grpc.Format"
16
- optional :name, :string, 2
13
+ add_message "pb.InputArray" do
14
+ optional :jobs_number, :int32, 1
15
+ optional :skip_cleaning, :bool, 2
16
+ repeated :names, :string, 3
17
+ end
18
+ add_message "pb.OutputArray" do
19
+ repeated :output, :message, 1, "pb.Parsed"
20
+ end
21
+ add_message "pb.Parsed" do
22
+ optional :parsed, :bool, 1
23
+ optional :quality, :int32, 2
24
+ repeated :quality_warning, :message, 3, "pb.QualityWarning"
25
+ optional :verbatim, :string, 4
26
+ optional :normalized, :string, 5
27
+ optional :canonical, :message, 6, "pb.Canonical"
28
+ optional :authorship, :message, 7, "pb.Authorship"
29
+ repeated :positions, :message, 8, "pb.Position"
30
+ optional :hybrid, :bool, 9
31
+ optional :bacteria, :bool, 10
32
+ optional :tail, :string, 11
33
+ optional :id, :string, 12
34
+ optional :parser_version, :string, 13
35
+ optional :cardinality, :int32, 14
36
+ optional :name_type, :enum, 15, "pb.NameType"
37
+ repeated :details_hybrid_formula, :message, 20, "pb.HybridFormula"
38
+ oneof :details do
39
+ optional :uninomial, :message, 16, "pb.Uninomial"
40
+ optional :species, :message, 17, "pb.Species"
41
+ optional :comparison, :message, 18, "pb.Comparison"
42
+ optional :approximation, :message, 19, "pb.Approximation"
17
43
  end
18
44
  end
19
- add_message "grpc.Output" do
20
- optional :value, :string, 2
21
- optional :error, :string, 3
45
+ add_message "pb.HybridFormula" do
46
+ oneof :element do
47
+ optional :uninomial, :message, 1, "pb.Uninomial"
48
+ optional :species, :message, 2, "pb.Species"
49
+ optional :comparison, :message, 3, "pb.Comparison"
50
+ optional :approximation, :message, 4, "pb.Approximation"
51
+ end
52
+ end
53
+ add_message "pb.Canonical" do
54
+ optional :simple, :string, 1
55
+ optional :full, :string, 2
56
+ end
57
+ add_message "pb.Position" do
58
+ optional :type, :string, 1
59
+ optional :start, :int32, 2
60
+ optional :end, :int32, 3
61
+ end
62
+ add_message "pb.QualityWarning" do
63
+ optional :quality, :int32, 1
64
+ optional :message, :string, 2
65
+ end
66
+ add_message "pb.Uninomial" do
67
+ optional :value, :string, 1
68
+ optional :rank, :string, 2
69
+ optional :parent, :string, 3
70
+ optional :authorship, :message, 4, "pb.Authorship"
71
+ end
72
+ add_message "pb.Species" do
73
+ optional :genus, :string, 1
74
+ optional :sub_genus, :string, 2
75
+ optional :species, :string, 3
76
+ optional :species_authorship, :message, 4, "pb.Authorship"
77
+ repeated :infra_species, :message, 5, "pb.InfraSpecies"
78
+ end
79
+ add_message "pb.InfraSpecies" do
80
+ optional :value, :string, 1
81
+ optional :rank, :string, 2
82
+ optional :authorship, :message, 3, "pb.Authorship"
83
+ end
84
+ add_message "pb.Comparison" do
85
+ optional :genus, :string, 1
86
+ optional :species, :string, 2
87
+ optional :species_authorship, :message, 3, "pb.Authorship"
88
+ optional :comparison, :string, 4
89
+ end
90
+ add_message "pb.Approximation" do
91
+ optional :genus, :string, 1
92
+ optional :species, :string, 2
93
+ optional :species_authorship, :message, 3, "pb.Authorship"
94
+ optional :approximation, :string, 4
95
+ optional :ignored, :string, 5
96
+ end
97
+ add_message "pb.Authorship" do
98
+ optional :value, :string, 1
99
+ repeated :all_authors, :string, 2
100
+ optional :original, :message, 3, "pb.AuthGroup"
101
+ optional :combination, :message, 4, "pb.AuthGroup"
102
+ end
103
+ add_message "pb.AuthGroup" do
104
+ repeated :authors, :string, 1
105
+ optional :year, :string, 2
106
+ optional :approximate_year, :bool, 3
107
+ optional :ex_authors, :message, 4, "pb.Authors"
108
+ optional :emend_authors, :message, 5, "pb.Authors"
109
+ end
110
+ add_message "pb.Authors" do
111
+ repeated :authors, :string, 1
112
+ optional :year, :string, 2
113
+ optional :approximate_year, :bool, 3
22
114
  end
23
- add_enum "grpc.Format" do
24
- value :Compact, 0
25
- value :Pretty, 1
26
- value :Simple, 2
27
- value :Debug, 3
115
+ add_enum "pb.NameType" do
116
+ value :NONE, 0
117
+ value :UNINOMIAL, 1
118
+ value :SPECIES, 2
119
+ value :COMPARISON, 3
120
+ value :APPROX_SURROGATE, 4
121
+ value :SURROGATE, 5
122
+ value :NAMED_HYBRID, 6
123
+ value :HYBRID_FORMULA, 7
124
+ value :VIRUS, 8
28
125
  end
29
126
  end
30
127
 
31
- module Grpc
32
- Version = Google::Protobuf::DescriptorPool.generated_pool.lookup("grpc.Version").msgclass
33
- Void = Google::Protobuf::DescriptorPool.generated_pool.lookup("grpc.Void").msgclass
34
- Input = Google::Protobuf::DescriptorPool.generated_pool.lookup("grpc.Input").msgclass
35
- Output = Google::Protobuf::DescriptorPool.generated_pool.lookup("grpc.Output").msgclass
36
- Format = Google::Protobuf::DescriptorPool.generated_pool.lookup("grpc.Format").enummodule
128
+ module Pb
129
+ Version = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.Version").msgclass
130
+ Void = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.Void").msgclass
131
+ InputArray = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.InputArray").msgclass
132
+ OutputArray = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.OutputArray").msgclass
133
+ Parsed = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.Parsed").msgclass
134
+ HybridFormula = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.HybridFormula").msgclass
135
+ Canonical = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.Canonical").msgclass
136
+ Position = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.Position").msgclass
137
+ QualityWarning = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.QualityWarning").msgclass
138
+ Uninomial = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.Uninomial").msgclass
139
+ Species = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.Species").msgclass
140
+ InfraSpecies = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.InfraSpecies").msgclass
141
+ Comparison = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.Comparison").msgclass
142
+ Approximation = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.Approximation").msgclass
143
+ Authorship = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.Authorship").msgclass
144
+ AuthGroup = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.AuthGroup").msgclass
145
+ Authors = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.Authors").msgclass
146
+ NameType = Google::Protobuf::DescriptorPool.generated_pool.lookup("pb.NameType").enummodule
37
147
  end
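
The `pb.Parsed` message above nests `pb.Canonical` and `pb.Authorship` messages. A minimal sketch of summarizing one result is given here. The `Struct` definitions are stand-ins that mirror only a few of the fields listed above, so the example runs without a gRPC server. `summarize` is a hypothetical helper, and the sample values are hand-written, not actual parser output.

```ruby
# Stand-ins for a subset of the generated message classes (assumption:
# the real objects expose the same reader methods).
Canonical = Struct.new(:simple, :full)
Authorship = Struct.new(:value)
Parsed = Struct.new(:parsed, :verbatim, :canonical, :authorship, :name_type)

# Hypothetical helper: builds a one-line summary of a parsed result.
def summarize(res)
  return "#{res.verbatim}: not parsed" unless res.parsed

  parts = [res.canonical.full]
  parts << res.authorship.value if res.authorship
  "#{parts.join(' ')} (#{res.name_type})"
end

sample = Parsed.new(true, 'Puma concolor L.',
                    Canonical.new('Puma concolor', 'Puma concolor'),
                    Authorship.new('L.'), :SPECIES)
puts summarize(sample) # Puma concolor L. (SPECIES)
```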
@@ -1,10 +1,10 @@
1
1
  # Generated by the protocol buffer compiler. DO NOT EDIT!
2
- # Source: gnparser.proto for package 'grpc'
2
+ # Source: gnparser.proto for package 'pb'
3
3
 
4
4
  require 'grpc'
5
5
  require 'gnparser_pb'
6
6
 
7
- module Grpc
7
+ module Pb
8
8
  module GNparser
9
9
  class Service
10
10
 
@@ -12,11 +12,14 @@ module Grpc
12
12
 
13
13
  self.marshal_class_method = :encode
14
14
  self.unmarshal_class_method = :decode
15
- self.service_name = 'grpc.GNparser'
15
+ self.service_name = 'pb.GNparser'
16
16
 
17
+ # Ver takes an empty argument (Void) and returns description of the gnparser
18
+ # version and build date and time.
17
19
  rpc :Ver, Void, Version
18
- rpc :Parse, stream(Input), stream(Output)
19
- rpc :ParseInOrder, stream(Input), stream(Output)
20
+ # ParseArray takes a list of name-strings (up to 10000), and returns back
21
+ # a list of parsed results, preserving the order of input.
22
+ rpc :ParseArray, InputArray, OutputArray
20
23
  end
21
24
 
22
25
  Stub = Service.rpc_stub_class
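
The `ParseArray` RPC takes a single `InputArray`, and the new `parse_ary` method in this diff fills it from the options hash. A minimal sketch of that mapping follows, using a `Struct` as a stand-in for `Pb::InputArray` so it runs without the generated classes. `build_input` is a hypothetical name, and the initial values assume the usual protobuf defaults of `0`, `false` and an empty list.

```ruby
# Stand-in for Pb::InputArray (assumption: same three fields).
InputArray = Struct.new(:jobs_number, :skip_cleaning, :names)

# Hypothetical helper mirroring the option handling in parse_ary.
def build_input(names, opts = {})
  input = InputArray.new(0, false, [])
  input.names += names
  input.skip_cleaning = true if opts[:skip_cleaning]
  # nil.to_i is 0, so a missing or non-positive option leaves the default.
  jobs = opts[:jobs_number].to_i
  input.jobs_number = jobs if jobs.positive?
  input
end

input = build_input(['Bubo bubo'], jobs_number: 3)
puts input.jobs_number   # 3
puts input.skip_cleaning # false
```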
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: gnparser
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.0
4
+ version: 0.4.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Dmitry Mozzherin
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2019-01-17 00:00:00.000000000 Z
11
+ date: 2019-08-26 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: grpc
@@ -44,14 +44,14 @@ dependencies:
44
44
  requirements:
45
45
  - - "~>"
46
46
  - !ruby/object:Gem::Version
47
- version: '1.16'
47
+ version: '2.0'
48
48
  type: :development
49
49
  prerelease: false
50
50
  version_requirements: !ruby/object:Gem::Requirement
51
51
  requirements:
52
52
  - - "~>"
53
53
  - !ruby/object:Gem::Version
54
- version: '1.16'
54
+ version: '2.0'
55
55
  - !ruby/object:Gem::Dependency
56
56
  name: byebug
57
57
  requirement: !ruby/object:Gem::Requirement
@@ -116,14 +116,12 @@ executables: []
116
116
  extensions: []
117
117
  extra_rdoc_files: []
118
118
  files:
119
- - ".byebug_history"
120
119
  - ".gitignore"
121
120
  - ".rspec"
122
121
  - ".rubocop.yml"
123
122
  - ".vscode/settings.json"
124
123
  - CHANGELOG.md
125
124
  - Gemfile
126
- - Gemfile.lock
127
125
  - LICENSE
128
126
  - README.md
129
127
  - Rakefile
data/.byebug_history DELETED
@@ -1,16 +0,0 @@
1
- q
2
- res.next
3
- q
4
- res.next
5
- res
6
- q
7
- res.each {|r| puts r}
8
- res.next
9
- p res
10
- q
11
- res.next
12
- res
13
- c
14
- q
15
- puts res.next
16
- res