inci_score 1.2.1 → 2.0.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: ffacec4022ea855b06a3eee2eae7813894ae11c8
4
- data.tar.gz: c53b431b2a74d2f0156d7079f8f866a4326b4c7e
3
+ metadata.gz: 5970cfdecac8492dbfd510dce7a24488e543233c
4
+ data.tar.gz: fb5b1171f1fcab479e24b33dc7d7f37582b93741
5
5
  SHA512:
6
- metadata.gz: 26538bc66e1aa5f845130e446a37d674b538173dcff2bbce5b6bc28c7e168979b44dba6b42ab8d80750b16bf92277d09c2eb9f31789a844485ab48601b13762a
7
- data.tar.gz: ebc439f00b724c3a51bef74e4fac90c838c604a34150d9fd3324a90ed24b618f0092ecc9cb7ed0cb74cf2f20adefc872ef26d5c245800388c568bab003cced68
6
+ metadata.gz: 42624a99c66bc3fcfb53cff14ebe6a153b220901df9b9e5f49f3d8ec2c9378436cd7090446bb449fefc7320c8ff61fdd5375b551b015312858fac8b8bfa8b66c
7
+ data.tar.gz: 2f6bcc48dd8727a6b2b9665882cd3a809b849a7876a7a5303b7a8c3438c373fb3e25c28545889f70fcc14aa1724ea5febfbb5a78a555456ec99e44b5ca9de329
data/.travis.yml CHANGED
@@ -1,5 +1,4 @@
1
1
  language: ruby
2
2
  rvm:
3
- - 2.2.2
4
- - 2.3.0
3
+ - 2.4.0
5
4
  before_install: gem install bundler -v 1.11.2
data/README.md CHANGED
@@ -11,9 +11,13 @@
11
11
  * [Starting Puma](#starting-puma)
12
12
  * [Triggering a request](#triggering-a-request)
13
13
  * [CLI API](#cli-api)
14
- * [Performance](#performance)
14
+ * [Refresh catalog](#refresh-catalog)
15
+ * [Benchmark](#benchmark)
15
16
  * [Levenshtein in C](#levenshtein-in-c)
16
- * [Records](#records)
17
+ * [Platform](#platform)
18
+ * [Wrk](#wrk)
19
+ * [Results](#results)
20
+ * [Ruby 2.4](#ruby-2.4)
17
21
 
18
22
  ## Scope
19
23
  This gem computes the score of cosmetic components based on the information provided by the [Biodizionario site](http://www.biodizionario.it/) by Fabrizio Zago.
@@ -75,8 +79,8 @@ The Web API exposes the *InciScore* library over HTTP via the [Puma](http://puma
75
79
 
76
80
  ### Starting Puma
77
81
  Simply start Puma via the *config.ru* file included in the repository, spawning as many workers as your workstation supports:
78
- ```
79
- bundle exec puma -w 7 -t 16:32 --preload
82
+ ```shell
83
+ bundle exec puma -w 8 -t 0:2 --preload
80
84
  ```
81
85
 
82
86
  ### Triggering a request
@@ -84,7 +88,7 @@ The Web API responds with a JSON object representing the original *InciScore::Re
84
88
 
85
89
  You can pass the source string directly as an HTTP parameter:
86
90
 
87
- ```
91
+ ```shell
88
92
  curl http://127.0.0.1:9292?src=aqua,dimethicone
89
93
  => {"components":{"aqua":0,"dimethicone":4},"unrecognized":[],"score":53.762874945799766,"valid":true}
90
94
  ```
@@ -92,8 +96,8 @@ curl http://127.0.0.1:9292?src=aqua,dimethicone
92
96
  ## CLI API
93
97
  You can collect INCI data by using the available binary:
94
98
 
95
- ```
96
- inci_score "aqua,dimethicone,pej-10,noent"
99
+ ```shell
100
+ inci_score --src="aqua,dimethicone,pej-10,noent"
97
101
 
98
102
  TOTAL SCORE:
99
103
  47.18034913243358
@@ -107,11 +111,41 @@ UNRECOGNIZED:
107
111
  noent
108
112
  ```
109
113
 
110
- ## Performance
114
+ ### Refresh catalog
115
+ When using the CLI you have the option to fetch a fresh catalog from the remote source by specifying a flag:
116
+ ```shell
117
+ inci_score --fresh --src="aqua,dimethicone,pej-10,noent"
118
+ ```
119
+
120
+ ## Benchmark
121
+
122
+ ### Levenshtein in C
111
123
  I noticed the API slows down dramatically when it has to fuzzy match unrecognized components.
112
124
  I profiled the code by using the [benchmark-ips](https://github.com/evanphx/benchmark-ips) gem, finding the bottleneck was the pure Ruby implementation of the Levenshtein distance algorithm.
113
125
  After some fruitless optimization, I replaced this routine with a C implementation: I opted for the straightforward [Ruby Inline](https://github.com/seattlerb/rubyinline) library to call the C code straight from Ruby.
114
126
  As a result I got a 10x increase in throughput, all without sacrificing code readability.
115
127
 
116
- ### Numbers
117
- I moved the benchmark numbers to the [Crystal porting](https://github.com/costajob/inci_score.cr) of the InciScore library, please look there.
128
+ ### Platform
129
+ I recorded these benchmarks on a mid-2015 15-inch MacBook Pro with these specs:
130
+ * OS X El Capitan
131
+ * 2.2 GHz Intel Core i7 (4 cores)
132
+ * 16 GB 1600 MHz DDR3
133
+
134
+ ### Wrk
135
+ As always I used [wrk](https://github.com/wg/wrk) as the load-testing tool.
136
+ I measured each library three times, picking the best lap.
137
+ The following script command is used:
138
+
139
+ ```
140
+ wrk -t 4 -c 100 -d 30s --timeout 2000 http://127.0.0.1:9292/?src=<list_of_ingredients>
141
+ ```
142
+
143
+ ### Results
144
+ | Type | Ingredients | Throughput (req/s) | Latency in ms (avg/stdev/max) |
145
+ | :----------------- | :----------------------- | -----------------: | ----------------------------: |
146
+ | exact matching | aqua,parfum,zeolite | 48863.58 | 0.31/0.55/10.82 |
147
+
148
+ ## Ruby 2.4
149
+ After upgrading to Ruby 2.4 I doubled the throughput of the matcher: I assume the Ruby optimization of [Hash access](https://blog.heroku.com/ruby-2-4-features-hashes-integers-rounding) is the driving reason.
150
+ I also adopted the new #match? method to avoid creating a MatchData object when I only need a boolean result.
151
+ In the end the Ruby upgrade is a big deal for my gem: give it a try!
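
The `#match?` remark in the README can be illustrated with a minimal sketch. The component string and the two helper methods below are hypothetical and are not part of the gem; they only contrast `String#match`, which builds a `MatchData` object, with `String#match?` (Ruby 2.4 and later), which returns a plain boolean.

```ruby
# Illustrative only: these helpers are not part of inci_score.

# Returns a MatchData object (or nil) for a whole-word match of the token.
def find_with_match(component, token)
  component.match(/\b#{Regexp.escape(token)}\b/)
end

# Returns only true/false, without building a MatchData object.
def found?(component, token)
  component.match?(/\b#{Regexp.escape(token)}\b/)
end

component = "dimethicone copolyol" # hypothetical catalog key
puts find_with_match(component, "dimethicone").class
puts found?(component, "dimethicone")
puts found?(component, "aqua")
```

When the caller only needs a yes/no answer, as in the recognizer rules changed further down in this diff, the predicate form is enough.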
data/bin/inci_score CHANGED
@@ -1,7 +1,7 @@
1
1
  #!/usr/bin/env ruby
2
+ lib = File.expand_path("../../lib", __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
2
4
 
3
- require 'bundler/setup'
4
- require 'inci_score'
5
+ require "inci_score"
5
6
 
6
- fail ArgumentError, "please specify at least a src argument" if ARGV.empty?
7
- puts InciScore::Computer.new(ARGV[0], InciScore::Catalog.fetch).call
7
+ InciScore::CLI.new(args: ARGV.clone).call
data/inci_score.gemspec CHANGED
@@ -14,7 +14,7 @@ Gem::Specification.new do |s|
14
14
  s.executables << "inci_score"
15
15
  s.require_paths = ["lib"]
16
16
  s.license = "MIT"
17
- s.required_ruby_version = ">= 2.2.2"
17
+ s.required_ruby_version = ">= 2.4"
18
18
 
19
19
  s.add_runtime_dependency "nokogiri", "~> 1.6"
20
20
  s.add_runtime_dependency "puma", "~> 3"
@@ -13,7 +13,7 @@ module InciScore
13
13
  def call(env)
14
14
  req = Rack::Request.new(env)
15
15
  src = req.params["src"]
16
- json = src ? Computer.new(src, catalog).call.to_json : %q({"error": "no valid source"})
16
+ json = src ? Computer.new(src: src, catalog: catalog).call.to_json : %q({"error": "no valid source"})
17
17
  ['200', {'Content-Type' => 'application/json'}, [json]]
18
18
  end
19
19
  end
@@ -0,0 +1,44 @@
1
+ require "optparse"
2
+ require "inci_score/computer"
3
+
4
+ module InciScore
5
+ class CLI
6
+ def initialize(args:, io: STDOUT, catalog: InciScore::Catalog.fetch)
7
+ @args = args
8
+ @io = io
9
+ @catalog = catalog
10
+ @src = nil
11
+ @fresh = nil
12
+ end
13
+
14
+ def call(computer_klass = Computer, fetcher = Fetcher.new)
15
+ parser.parse!(@args)
16
+ return @io.puts("Specify inci list as: --src='aqua, parfum, etc'") unless @src
17
+ @io.puts computer_klass.new(src: @src, catalog: catalog(fetcher)).call
18
+ end
19
+
20
+ private def parser
21
+ OptionParser.new do |opts|
22
+ opts.banner = %q{Usage: ./bin/inci_score --src='aqua, parfum, etc' --fresh}
23
+
24
+ opts.on("-sSRC", "--src=SRC", "The INCI list: 'aqua, parfum, etc'") do |src|
25
+ @src = src
26
+ end
27
+
28
+ opts.on("-f", "--fresh", "Fetch a fresh catalog from remote") do |fresh|
29
+ @fresh = fresh
30
+ end
31
+
32
+ opts.on("-h", "--help", "Prints this help") do
33
+ @io.puts opts
34
+ exit
35
+ end
36
+ end
37
+ end
38
+
39
+ private def catalog(fetcher)
40
+ return @catalog unless @fresh
41
+ fetcher.call
42
+ end
43
+ end
44
+ end
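
The option handling in the new CLI class can be tried in isolation with a small sketch. The `parse_cli` helper and its options hash are hypothetical names introduced here for illustration; only the `OptionParser` switches mirror the ones declared in the hunk above.

```ruby
require "optparse"

# Hypothetical helper: parses the same --src / --fresh switches as the CLI
# class and returns them as a hash instead of storing instance variables.
def parse_cli(args)
  options = { src: nil, fresh: false }
  parser = OptionParser.new do |opts|
    opts.on("-sSRC", "--src=SRC", "The INCI list") { |src| options[:src] = src }
    opts.on("-f", "--fresh", "Fetch a fresh catalog") { |fresh| options[:fresh] = fresh }
  end
  parser.parse!(args.dup) # work on a copy so the caller's array is untouched
  options
end

p parse_cli(["--src=aqua,dimethicone"])
p parse_cli(["-f", "-s", "aqua"])
```

Parsing a copy of the arguments matches the `ARGV.clone` call in the executable, since `parse!` removes the switches it recognizes from the array it is given.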
@@ -7,9 +7,11 @@ module InciScore
7
7
  class Computer
8
8
  TOLERANCE = 30.0
9
9
 
10
- def initialize(src, catalog)
10
+ def initialize(src:, catalog:, tolerance: TOLERANCE, rules: Normalizer::DEFAULT_RULES)
11
11
  @src = src
12
12
  @catalog = catalog
13
+ @tolerance = Float(tolerance)
14
+ @rules = rules
13
15
  @unrecognized = []
14
16
  end
15
17
 
@@ -20,17 +22,15 @@ module InciScore
20
22
  valid: valid?)
21
23
  end
22
24
 
23
- private
24
-
25
- def score
25
+ private def score
26
26
  Scorer.new(components.map(&:last)).call
27
27
  end
28
28
 
29
- def ingredients
30
- @ingredients ||= Normalizer.new(src: @src).call
29
+ private def ingredients
30
+ @ingredients ||= Normalizer.new(src: @src, rules: @rules).call
31
31
  end
32
32
 
33
- def components
33
+ private def components
34
34
  @components ||= ingredients.map do |ingredient|
35
35
  Recognizer.new(ingredient, @catalog).call.tap do |component|
36
36
  @unrecognized << ingredient unless component
@@ -38,8 +38,8 @@ module InciScore
38
38
  end.compact
39
39
  end
40
40
 
41
- def valid?
42
- @unrecognized.size / (ingredients.size / 100.0) <= TOLERANCE
41
+ private def valid?
42
+ @unrecognized.size / (ingredients.size / 100.0) <= @tolerance
43
43
  end
44
44
  end
45
45
  end
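
The validity check above compares the percentage of unrecognized ingredients against the tolerance. A minimal standalone sketch of that arithmetic follows; the method name `within_tolerance?` and the constant `DEFAULT_TOLERANCE` are assumptions made for illustration, while the formula is the one from the hunk.

```ruby
# Illustrative only: same formula as Computer#valid?, extracted as a function.
DEFAULT_TOLERANCE = 30.0

def within_tolerance?(unrecognized_count, ingredients_count, tolerance = DEFAULT_TOLERANCE)
  # unrecognized / (total / 100.0) is the unrecognized share as a percentage
  unrecognized_count / (ingredients_count / 100.0) <= Float(tolerance)
end

puts within_tolerance?(1, 4)       # 1 of 4 is 25%, within the default 30%
puts within_tolerance?(2, 4)       # 2 of 4 is 50%, above the default 30%
puts within_tolerance?(2, 4, "50") # Float("50") accepts numeric strings
```

Because the constructor now takes `tolerance:` as a keyword argument and converts it with `Float`, a caller could pass a stricter or looser threshold without touching the constant.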
@@ -1,7 +1,7 @@
1
1
  require 'nokogiri'
2
2
 
3
3
  module InciScore
4
- class Parser
4
+ class Fetcher
5
5
  BIODIZIO_URI = 'http://www.biodizionario.it/biodizio.php'
6
6
  SEMAPHORES = %w[vv v g r rr]
7
7
  CSS_QUERY = 'table[width="751"] > tr > td img'
@@ -2,7 +2,7 @@ require 'inci_score/normalizer_rules'
2
2
 
3
3
  module InciScore
4
4
  class Normalizer
5
- DEFAULT_RULES = Rules.constants - [:Base]
5
+ DEFAULT_RULES = [Rules::Replacer, Rules::Downcaser, Rules::Beheader, Rules::Separator, Rules::Tokenizer, Rules::Sanitizer, Rules::Desynonymizer]
6
6
 
7
7
  attr_reader :src
8
8
 
@@ -12,9 +12,9 @@ module InciScore
12
12
  end
13
13
 
14
14
  def call
15
- @rules.reduce(@src) do |src, name|
16
- rule = Rules.const_get(name).new(src)
17
- src = rule.call
15
+ yield(@rules) if block_given?
16
+ @rules.reduce(@src) do |src, rule|
17
+ @src = rule.call(src)
18
18
  end
19
19
  end
20
20
  end
@@ -1,73 +1,90 @@
1
1
  module InciScore
2
2
  class Normalizer
3
3
  module Rules
4
- class Base
5
- SEPARATOR = ','
4
+ SEPARATOR = ','
6
5
 
7
- def initialize(src)
8
- @src = src
9
- end
6
+ module Replacer
7
+ extend self
10
8
 
11
- def call
12
- fail NotImplementedError
13
- end
14
- end
15
-
16
- class Replacer < Base
17
9
  REPLACEMENTS = [
18
10
  [/\n+|\t+/, ' '],
19
11
  ['‘', "'"],
20
12
  ['—', '-'],
21
- ['(', 'C'],
22
13
  ['_', ' '],
23
14
  ['~', '-'],
24
15
  ['|', 'l'],
25
16
  [' I ', '/']
26
17
  ]
27
18
 
28
- def call
29
- REPLACEMENTS.reduce(@src) do |src, replacement|
19
+ def call(src)
20
+ REPLACEMENTS.reduce(src) do |_src, replacement|
30
21
  invalid, valid = *replacement
31
- src.index(invalid) ? src.gsub(invalid, valid) : src
22
+ _src.index(invalid) ? _src.gsub(invalid, valid) : _src
32
23
  end
33
24
  end
34
25
  end
35
26
 
36
- class Downcaser < Base
37
- def call
38
- @src.downcase
27
+ module Downcaser
28
+ extend self
29
+
30
+ def call(src)
31
+ src.downcase
39
32
  end
40
33
  end
41
34
 
42
- class Beheader < Base
35
+ module Beheader
36
+ extend self
37
+
43
38
  TITLE_SEP = ':'
44
39
  MAX_INDEX = 50
45
40
 
46
- def call
47
- sep_index = @src.index(TITLE_SEP)
48
- return @src if !sep_index || sep_index > MAX_INDEX
49
- @src[sep_index+1, @src.size]
41
+ def call(src)
42
+ sep_index = src.index(TITLE_SEP)
43
+ return src if !sep_index || sep_index > MAX_INDEX
44
+ src[sep_index+1, src.size]
50
45
  end
51
46
  end
52
47
 
53
- class Separator < Base
48
+ module Separator
49
+ extend self
50
+
54
51
  SEPARATORS = ["; ", ". ", " ' ", " - ", " : "]
55
52
 
56
- def call
57
- SEPARATORS.reduce(@src) do |src, separator|
58
- src = src.gsub(separator, SEPARATOR)
53
+ def call(src)
54
+ SEPARATORS.reduce(src) do |_src, separator|
55
+ _src = _src.gsub(separator, SEPARATOR)
59
56
  end
60
57
  end
61
58
  end
62
59
 
63
- class Tokenizer < Base
64
- INVALID_CHARS = /[^\w\s-]/
60
+ module Tokenizer
61
+ extend self
62
+
63
+ def call(src)
64
+ src.split(SEPARATOR).map(&:strip)
65
+ end
66
+ end
67
+
68
+ module Sanitizer
69
+ extend self
70
+
71
+ INVALID_CHARS = /[^\/\(\)\w\s-]/
72
+
73
+ def call(src)
74
+ Array(src).map do |token|
75
+ token.gsub(INVALID_CHARS, '')
76
+ end.reject(&:empty?)
77
+ end
78
+ end
79
+
80
+ module Desynonymizer
81
+ extend self
82
+
83
+ SYNONYM = /\/.*/
65
84
 
66
- def call
67
- @src.split(SEPARATOR).map do |token|
68
- token = token.sub(/\/.*/, '')
69
- token = token.gsub(INVALID_CHARS, '')
70
- token = token.strip
85
+ def call(src)
86
+ Array(src).map do |token|
87
+ token.sub(SYNONYM, '').strip
71
88
  end.reject(&:empty?)
72
89
  end
73
90
  end
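
The move from rule classes to modules with `extend self` means every rule is just an object that responds to `call(src)`, so the normalizer can fold the source through them with `reduce`. The sketch below is a simplified assumption of that pipeline: the two modules are trimmed-down stand-ins defined here and the `normalize` helper is hypothetical, not the gem's actual `Normalizer` class.

```ruby
# Illustrative only: simplified stand-ins for the rule modules in this hunk.
module Downcaser
  extend self # makes #call available directly on the module

  def call(src)
    src.downcase
  end
end

module Tokenizer
  extend self

  SEPARATOR = ","

  def call(src)
    src.split(SEPARATOR).map(&:strip)
  end
end

# Hypothetical helper: threads the source through each rule in order.
def normalize(src, rules = [Downcaser, Tokenizer])
  rules.reduce(src) { |acc, rule| rule.call(acc) }
end

p normalize("Aqua, Parfum ")
# Any callable works as a rule, for example a lambda appended to the list:
p normalize("Aqua,,Parfum", [Downcaser, Tokenizer, ->(tokens) { tokens.reject(&:empty?) }])
```

Since rules are plain callables, the explicit `DEFAULT_RULES` array also fixes the order in which they run, which the earlier `Rules.constants` lookup left implicit.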
@@ -2,7 +2,7 @@ require 'inci_score/recognizer_rules'
2
2
 
3
3
  module InciScore
4
4
  class Recognizer
5
- DEFAULT_RULES = Rules.constants - [:Base]
5
+ DEFAULT_RULES = [Rules::Key, Rules::Levenshtein, Rules::Digits, Rules::Tokens]
6
6
 
7
7
  def initialize(src, catalog, rules = DEFAULT_RULES)
8
8
  @src = src
@@ -11,17 +11,13 @@ module InciScore
11
11
  end
12
12
 
13
13
  def call
14
- @component = apply_rules
15
- return [@component, @catalog[@component]] if @component
16
- end
17
-
18
- private
19
-
20
- def apply_rules
21
- @rules.reduce(nil) do |component, name|
22
- rule = Rules.const_get(name).new(@src, @catalog)
23
- component || rule.call
14
+ @component = @rules.reduce(nil) do |component, rule|
15
+ break(component) if component
16
+ _rule = rule.new(@src, @catalog)
17
+ yield(rule) if block_given?
18
+ _rule.call
24
19
  end
25
- end
20
+ [@component, @catalog[@component]] if @component
21
+ end
26
22
  end
27
23
  end
@@ -28,12 +28,12 @@ module InciScore
28
28
  def call
29
29
  size = @src.size
30
30
  initial = @src[0]
31
- component, distance = @catalog.reduce([nil, size]) do |min, (component, _)|
32
- next min unless component.start_with?(initial)
33
- match = (n = component.index(ALTERNATE_SEP)) ? component[0, n] : component
31
+ component, distance = @catalog.reduce([nil, size]) do |min, (_component, _)|
32
+ next min unless _component.start_with?(initial)
33
+ match = (n = _component.index(ALTERNATE_SEP)) ? _component[0, n] : _component
34
34
  next min if match.size > (size + TOLERANCE)
35
35
  dist = @src.distance(match)
36
- min = [component, dist] if dist < min[1]
36
+ min = [_component, dist] if dist < min[1]
37
37
  min
38
38
  end
39
39
  component unless distance > TOLERANCE || distance >= (size-1)
@@ -47,7 +47,7 @@ module InciScore
47
47
  return if @src.size < TOLERANCE
48
48
  digits = @src[0, MIN_MEANINGFUL]
49
49
  @catalog.detect do |component, _|
50
- component.match(/^#{Regexp::escape(digits)}/)
50
+ component.match?(/^#{Regexp::escape(digits)}/)
51
51
  end.to_a.first
52
52
  end
53
53
  end
@@ -58,7 +58,7 @@ module InciScore
58
58
  def call
59
59
  tokens.each do |token|
60
60
  @catalog.each do |component, _|
61
- return component if component.match(/\b#{Regexp.escape(token)}\b/)
61
+ return component if component.match?(/\b#{Regexp.escape(token)}\b/)
62
62
  end
63
63
  end
64
64
  nil
@@ -1,3 +1,3 @@
1
1
  module InciScore
2
- VERSION = "1.2.1"
2
+ VERSION = "2.0.1"
3
3
  end
data/lib/inci_score.rb CHANGED
@@ -1,4 +1,5 @@
1
+ require 'open-uri'
1
2
  require 'inci_score/version'
2
- require 'inci_score/parser'
3
+ require 'inci_score/fetcher'
3
4
  require 'inci_score/catalog'
4
- require 'inci_score/computer'
5
+ require 'inci_score/cli'
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: inci_score
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.2.1
4
+ version: 2.0.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - costajob
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2016-09-28 00:00:00.000000000 Z
11
+ date: 2017-01-03 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: nokogiri
@@ -145,11 +145,12 @@ files:
145
145
  - lib/inci_score.rb
146
146
  - lib/inci_score/api/app.rb
147
147
  - lib/inci_score/catalog.rb
148
+ - lib/inci_score/cli.rb
148
149
  - lib/inci_score/computer.rb
150
+ - lib/inci_score/fetcher.rb
149
151
  - lib/inci_score/levenshtein.rb
150
152
  - lib/inci_score/normalizer.rb
151
153
  - lib/inci_score/normalizer_rules.rb
152
- - lib/inci_score/parser.rb
153
154
  - lib/inci_score/recognizer.rb
154
155
  - lib/inci_score/recognizer_rules.rb
155
156
  - lib/inci_score/response.rb
@@ -169,7 +170,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
169
170
  requirements:
170
171
  - - ">="
171
172
  - !ruby/object:Gem::Version
172
- version: 2.2.2
173
+ version: '2.4'
173
174
  required_rubygems_version: !ruby/object:Gem::Requirement
174
175
  requirements:
175
176
  - - ">="
@@ -177,7 +178,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
177
178
  version: '0'
178
179
  requirements: []
179
180
  rubyforge_project:
180
- rubygems_version: 2.5.1
181
+ rubygems_version: 2.6.8
181
182
  signing_key:
182
183
  specification_version: 4
183
184
  summary: A library that computes the hazard of cosmetic products components, based