inci_score 2.0.2 → 2.1.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: 64259796b3d4a9d134999b66cdbd9bf869340ee5
-   data.tar.gz: ab5a536ebc4841ab86d3f3a3e5d70a88c146b81b
+   metadata.gz: a58af371501ff4990b40385d05ae7589a946378f
+   data.tar.gz: e5f9a46ee5ab3c29a936c3598a10de2755f50686
  SHA512:
-   metadata.gz: a03e0a913ab6c41350a2305de482801abafa191a249e5fe73437e2a444e0e25c0be9f8479f7c442134ca7ed663d8fef63ef7f985d43aeb05d62e33706fe29f42
-   data.tar.gz: 3be018bbc1b8e64cae32bd810161237f50187a1f3e430489968ad865b771450b150ea28b59e3aea841a3c92d31b5f30bebd28c9e8b25ec8c4acf1de5a26565ae
+   metadata.gz: 3ddcaa818e14bdca461c235d11082b83512ec29a6f478d8a797252bfbb98a3377d2c06ad0af21ba0a88df512bcbffd1657fe0dcb8374bd5ac05d46e84c1afdbc
+   data.tar.gz: 75d995e0ce1e3a755e36a074ddace8d51d57d58d15f6720e3a2e6aeb407fbfd8bdb1f1276648ca74cc1372556763455c5caf7d9abbaa994d79af6cf03fbd7187
data/README.md CHANGED
@@ -7,17 +7,16 @@
  * [Sources](#sources)
  * [API](#api)
  * [Unrecognized components](#unrecognized-components)
- * [Web API](#web-api)
- * [Starting Puma](#starting-puma)
- * [Triggering a request](#triggering-a-request)
- * [CLI API](#cli-api)
+ * [CLI](#cli)
  * [Refresh catalog](#refresh-catalog)
+ * [HTTP server](#http-server)
+ * [Triggering a request](#triggering-a-request)
+ * [Getting help](#getting-help)
  * [Benchmark](#benchmark)
  * [Levenshtein in C](#levenshtein-in-c)
  * [Platform](#platform)
  * [Wrk](#wrk)
  * [Results](#results)
- * [Ruby 2.4](#ruby-2.4)
 
  ## Scope
  This gem computes the score of cosmetic components basing on the information provided by the [Biodizionario site](http://www.biodizionario.it/) by Fabrizio Zago.
@@ -74,30 +73,11 @@ inci.unrecognized
  => ["noent1", "noent2"]
  ```
 
- ## Web API
- The Web API exposes the *InciScore* library over HTTP via the [Puma](http://puma.io/) application server.
+ ## CLI
+ You can collect INCI data by using the available CLI interface:
 
- ### Starting Puma
- Simply start Puma via the *config.ru* file included in the repository by spawning how many workers as your current workstation supports:
  ```shell
- bundle exec puma -w 8 -t 0:2 --preload
- ```
-
- ### Triggering a request
- The Web API responds with a JSON object representing the original *InciScore::Response* one.
-
- You can pass the source string directly as a HTTP parameter:
-
- ```shell
- curl http://127.0.0.1:9292?src=aqua,dimethicone
- => {"components":{"aqua":0,"dimethicone":4},"unrecognized":[],"score":53.762874945799766,"valid":true}
- ```
-
- ## CLI API
- You can collect INCI data by using the available binary:
-
- ```shell
- inci_score --src="aqua,dimethicone,pej-10,noent"
+ inci_score --src="ingredients: aqua, dimethicone, pej-10, noent"
 
  TOTAL SCORE:
  47.18034913243358
@@ -112,9 +92,37 @@ UNRECOGNIZED:
  ```
 
  ### Refresh catalog
- When using CLI you have the option to fetch a fresh catalog from remote by specifyng a flag:
+ You also have the option to fetch a fresh catalog from www.biodizionario.it by specifyng a flag:
+ ```shell
+ inci_score --fresh --src="aqua, dimethicone"
+ ```
+
+ ### HTTP server
+ The CLI interface exposes a Web layer based on the [Puma](http://puma.io/) application server.
+ The HTTP server is started on the specified port by spawning as many workers as your current workstation supports:
+ ```shell
+ inci_score --http=9292
+ ```
+ Consider all other options are discarded when running HTTP server.
+
+ #### Triggering a request
+ The HTTP server responds with a JSON representation of the original *InciScore::Response* object.
+ You can pass the source string directly as a HTTP parameter:
+
+ ```shell
+ curl http://127.0.0.1:9292?src=aqua,dimethicone
+ => {"components":{"aqua":0,"dimethicone":4},"unrecognized":[],"score":53.762874945799766,"valid":true}
+ ```
+
+ ### Getting help
+ You can get CLI interface help by:
  ```shell
- inci_score --fresh --src="aqua,dimethicone,pej-10,noent"
+ inci_score --help
+ Usage: ./bin/inci_score --src='aqua, parfum, etc' --fresh
+ -s, --src=SRC The INCI list: 'aqua, parfum, etc'
+ -f, --fresh Fetch a fresh catalog from remote
+ --http=PORT Start Puma server on the specified port
+ -h, --help Prints this help
  ```
 
  ## Benchmark
@@ -123,7 +131,6 @@ inci_score --fresh --src="aqua,dimethicone,pej-10,noent"
  I noticed the APIs slows down dramatically when dealing with unrecognized components to fuzzy match on.
  I profiled the code by using the [benchmark-ips](https://github.com/evanphx/benchmark-ips) gem, finding the bottleneck was the pure Ruby implementation of the Levenshtein distance algorithm.
  After some pointless optimization, i replaced this routine with a C implementation: i opted for the straightforward [Ruby Inline](https://github.com/seattlerb/rubyinline) library to call the C code straight from Ruby.
- As a result i've got a 10x increment of the throughput, all without scarifying code readability.
 
  ### Platform
  I registered these benchmarks with a MacBook PRO 15 mid 2015 having these specs:
@@ -141,11 +148,7 @@ wrk -t 4 -c 100 -d 30s --timeout 2000 http://127.0.0.1:9292/?src=<list_of_ingred
  ```
 
  ### Results
- | Type | Ingredients | Throughput (req/s) | Latency in ms (avg/stdev/max) |
- | :----------------- | :----------------------- | -----------------: | ----------------------------: |
- | exact matching | aqua,parfum,zeolite | 48863.58 | 0.31/0.55/10.82 |
-
- ## Ruby 2.4
- After upgrading to Ruby 2.4 i doubled the throughput of the matcher (24008.11 vs 48863.58 req/s): i assume Ruby optimization to the [Hash access](https://blog.heroku.com/ruby-2-4-features-hashes-integers-rounding) is the driving reason.
- I also adopted the new #match? method to avoid creating a MatchData object when i am just checking for predicate.
- In the end Ruby upgrade is a big deal for my gem and i recommend to give it a try!
+ | Ingredients | Throughput (req/s) | Latency in ms (avg/stdev/max) |
+ | :----------------------- | -----------------: | ----------------------------: |
+ | aqua,parfum,zeolite | 26054.91 | 0.63/1.03/79.86 |
+ | agua,porfum,zaolite | 953.44 | 14.67/5.15/82.31 |
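Reviewer note: the second row of the new results table misspells each ingredient by one letter, which presumably pushes the matcher onto its fuzzy Levenshtein path and would explain the drop in throughput. As a rough illustration only, the sketch below computes the edit distance in pure Ruby. `levenshtein` is a hypothetical helper written for this note, not the gem's C routine nor its `String#distance` refinement.

```ruby
# Hypothetical helper: classic dynamic-programming edit distance, two rows at a time.
# The gem delegates this work to C code; this version only illustrates the idea.
def levenshtein(a, b)
  prev = (0..b.size).to_a              # distances from the empty prefix of a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]                         # i deletions reach the empty prefix of b
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# The misspelled benchmark ingredients paired with their likely catalog entries.
pairs = { "agua" => "aqua", "porfum" => "parfum", "zaolite" => "zeolite" }
distances = pairs.map { |typo, known| levenshtein(typo, known) }
```

Each misspelling sits a single edit away from its intended name, which is below the `TOLERANCE = 3` constant defined in the rules module, so these inputs would plausibly still resolve, only more slowly.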
data/Rakefile CHANGED
@@ -13,6 +13,12 @@ namespace :spec do
      t.libs << 'lib'
      t.test_files = FileList['spec/integration/*_spec.rb']
    end
+
+   Rake::TestTask.new(:bench) do |t|
+     t.libs << 'spec'
+     t.libs << 'lib'
+     t.test_files = FileList['spec/bench/*_bench.rb']
+   end
  end
 
  task :default => :"spec:unit"
data/config.ru CHANGED
@@ -1,3 +1,3 @@
- require 'inci_score/api/app'
+ require 'inci_score/app'
 
- run InciScore::API::App
+ run InciScore::App
@@ -0,0 +1,19 @@
+ require 'rack'
+ require 'inci_score'
+
+ module InciScore
+   module App
+     extend self
+
+     def catalog
+       @catalog ||= Catalog.fetch
+     end
+
+     def call(env)
+       req = Rack::Request.new(env)
+       src = req.params["src"]
+       json = src ? Computer.new(src: src, catalog: catalog).call.to_json : %q({"error": "no valid source"})
+       ['200', {'Content-Type' => 'application/json'}, [json]]
+     end
+   end
+ end
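Reviewer note: the relocated `InciScore::App` is a plain module that responds to `call(env)`, so it can be exercised without starting a server. The sketch below shows that contract under stated assumptions: `SketchApp` is a hypothetical stand-in, the query string is parsed with the standard library instead of `Rack::Request`, and the scoring is replaced by a stub that just splits the list.

```ruby
require 'json'
require 'uri'

# Hypothetical stand-in for InciScore::App: same call(env) shape, stubbed scoring.
module SketchApp
  extend self

  def call(env)
    # Rack::Request is swapped for stdlib parsing to keep the sketch dependency-free.
    params = URI.decode_www_form(env.fetch("QUERY_STRING", "")).to_h
    src = params["src"]
    json = src ? { components: src.split(",").map(&:strip) }.to_json : %q({"error": "no valid source"})
    ['200', { 'Content-Type' => 'application/json' }, [json]]
  end
end

status, headers, body = SketchApp.call("QUERY_STRING" => "src=aqua,dimethicone")
_, _, error_body = SketchApp.call({})
```

As in the diff, the error branch still answers with status `200`; only the JSON payload signals the missing `src` parameter.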
@@ -1,5 +1,6 @@
  require "optparse"
  require "inci_score/computer"
+ require "inci_score/server"
 
  module InciScore
    class CLI
@@ -9,10 +10,12 @@ module InciScore
        @catalog = catalog
        @src = nil
        @fresh = nil
+       @port = nil
      end
 
-     def call(computer_klass = Computer, fetcher = Fetcher.new)
+     def call(server_klass: Server, computer_klass: Computer, fetcher: Fetcher.new)
        parser.parse!(@args)
+       return server_klass.new(port: @port, preload: true).run if @port
        return @io.puts("Specify inci list as: --src='aqua, parfum, etc'") unless @src
        @io.puts computer_klass.new(src: @src, catalog: catalog(fetcher)).call
      end
@@ -29,6 +32,10 @@ module InciScore
            @fresh = fresh
          end
 
+         opts.on("--http=PORT", "Start Puma server on the specified port") do |port|
+           @port = port
+         end
+
          opts.on("-h", "--help", "Prints this help") do
            @io.puts opts
            exit
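Reviewer note: the CLI change adds a `--http=PORT` switch and checks the port before anything else in `CLI#call`. The sketch below reproduces that flow with the standard `OptionParser` only. `parse_options` and `action_for` are hypothetical helpers invented for this note, and the returned symbols stand in for the real server and computer calls.

```ruby
require 'optparse'

# Hypothetical reduction of the CLI: collects the three options into a Hash.
def parse_options(args)
  options = { src: nil, fresh: false, port: nil }
  OptionParser.new do |opts|
    opts.on("-s", "--src=SRC") { |src| options[:src] = src }
    opts.on("-f", "--fresh") { |fresh| options[:fresh] = fresh }
    opts.on("--http=PORT") { |port| options[:port] = port }
  end.parse!(args)
  options
end

# Mirrors the order of the guard clauses in CLI#call: the port is checked first.
def action_for(options)
  return :serve if options[:port]
  return :usage unless options[:src]
  :compute
end

options = parse_options(["--src=aqua, parfum", "--http=9292"])
action = action_for(options)
```

Because no type is passed to `opts.on`, the port arrives as a String, and once it is set the `--src` value is ignored, which matches the README remark that other options are discarded when running the HTTP server.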
@@ -21,21 +21,19 @@ module InciScore
        end
      end
 
-     private
-
-     def doc
+     private def doc
        @src.respond_to?(:value) ? @src.value : @src
      end
 
-     def semaphore(src)
+     private def semaphore(src)
        src.match(/(#{SEMAPHORES.join('|')}).gif$/)[1]
      end
 
-     def normalize(node)
+     private def normalize(node)
        node.text.strip.downcase
      end
 
-     def swap?(desc)
+     private def swap?(desc)
        return false if desc.empty?
        desc == desc.upcase
      end
@@ -13,9 +13,8 @@ module InciScore
      def call
        @component = @rules.reduce(nil) do |component, rule|
          break(component) if component
-         _rule = rule.new(@src, @catalog)
          yield(rule) if block_given?
-         _rule.call
+         rule.call(@src, @catalog)
        end
        [@component, @catalog[@component]] if @component
      end
@@ -4,36 +4,29 @@ module InciScore
    using Refinements
    class Recognizer
      module Rules
-       class Base
-         TOLERANCE = 3
+       TOLERANCE = 3
 
-         def initialize(src, catalog)
-           @src = src
-           @catalog = catalog
-         end
+       module Key
+         extend self
 
-         def call
-           fail NotmplementedError
+         def call(src, catalog)
+           src if catalog.has_key?(src)
          end
        end
 
-       class Key < Base
-         def call
-           @src if @catalog.has_key?(@src)
-         end
-       end
+       module Levenshtein
+         extend self
 
-       class Levenshtein < Base
          ALTERNATE_SEP = '/'
 
-         def call
-           size = @src.size
-           initial = @src[0]
-           component, distance = @catalog.reduce([nil, size]) do |min, (_component, _)|
+         def call(src, catalog)
+           size = src.size
+           initial = src[0]
+           component, distance = catalog.reduce([nil, size]) do |min, (_component, _)|
              next min unless _component.start_with?(initial)
              match = (n = _component.index(ALTERNATE_SEP)) ? _component[0, n] : _component
              next min if match.size > (size + TOLERANCE)
-             dist = @src.distance(match)
+             dist = src.distance(match)
              min = [_component, dist] if dist < min[1]
              min
            end
@@ -41,34 +34,36 @@ module InciScore
          end
        end
 
-       class Digits < Base
+       module Digits
+         extend self
+
          MIN_MEANINGFUL = 7
 
-         def call
-           return if @src.size < TOLERANCE
-           digits = @src[0, MIN_MEANINGFUL]
-           @catalog.detect do |component, _|
+         def call(src, catalog)
+           return if src.size < TOLERANCE
+           digits = src[0, MIN_MEANINGFUL]
+           catalog.detect do |component, _|
              component.matches?(/^#{Regexp::escape(digits)}/)
            end.to_a.first
          end
        end
 
-       class Tokens < Base
+       module Tokens
+         extend self
+
          UNMATCHABLE = %w[extract oil sodium acid sulfate]
 
-         def call
-           tokens.each do |token|
-             @catalog.each do |component, _|
+         def call(src, catalog)
+           tokens(src).each do |token|
+             catalog.each do |component, _|
                return component if component.matches?(/\b#{Regexp.escape(token)}\b/)
              end
            end
            nil
          end
 
-         private
-
-         def tokens
-           (@src.split(' ') - UNMATCHABLE).reject { |t| t.size < TOLERANCE }.sort_by!(&:size).reverse!
+         def tokens(src)
+           (src.split(' ') - UNMATCHABLE).reject { |t| t.size < TOLERANCE }.sort_by!(&:size).reverse!
          end
        end
      end
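Reviewer note: the rules move from per-call instances (`rule.new(src, catalog).call`) to stateless modules invoked as `rule.call(src, catalog)`, which avoids allocating one object per rule per lookup. The sketch below shows the pattern in isolation. The `Rules` module here, the top-level `recognize` helper and the Hash catalog are hypothetical simplifications, and `Digits` uses `String#start_with?` instead of the gem's `matches?` refinement.

```ruby
# Hypothetical stand-ins for the gem's rules; the catalog is a plain Hash here.
module Rules
  TOLERANCE = 3

  # Exact lookup: returns the source itself when it is a catalog key.
  module Key
    extend self

    def call(src, catalog)
      src if catalog.key?(src)
    end
  end

  # Prefix lookup: returns the first catalog key sharing the leading characters.
  module Digits
    extend self

    MIN_MEANINGFUL = 7

    def call(src, catalog)
      return if src.size < TOLERANCE
      digits = src[0, MIN_MEANINGFUL]
      catalog.keys.detect { |component| component.start_with?(digits) }
    end
  end
end

# Tries each rule in order and stops at the first one that yields a component.
def recognize(src, catalog, rules = [Rules::Key, Rules::Digits])
  rules.reduce(nil) do |component, rule|
    break(component) if component
    rule.call(src, catalog)
  end
end

CATALOG = { "aqua" => 0, "dimethicone" => 4 }.freeze
exact = recognize("aqua", CATALOG)
fuzzy = recognize("dimethicon", CATALOG)
```

Since the modules hold no state, the same constants can be shared across threads and requests, at the cost of passing `src` and `catalog` explicitly on every call.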
@@ -0,0 +1,35 @@
+ require 'etc'
+ require 'puma'
+
+ module InciScore
+   class Server
+     DEFAULT_HOST = "0.0.0.0"
+
+     def initialize(port: 9292, threads: "1:2", workers: Etc.nprocessors, preload: false,
+                    config_klass: Puma::Configuration, launcher_klass: Puma::Launcher)
+       @port = port
+       @workers = workers
+       @threads = threads.split(":")
+       @preload = preload
+       @config_klass = config_klass
+       @launcher_klass = launcher_klass
+     end
+
+     def run
+       launcher.run
+     end
+
+     private def launcher
+       @launcher_klass.new(config)
+     end
+
+     private def config
+       @config_klass.new do |c|
+         c.bind "tcp://#{DEFAULT_HOST}:#{@port}"
+         c.workers @workers if @workers > 1
+         c.threads *@threads
+         c.preload_app! if @preload
+       end
+     end
+   end
+ end
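Reviewer note: the new `Server` receives its Puma collaborators through the `config_klass:` and `launcher_klass:` keywords, which suggests it is meant to be tested without booting Puma. The sketch below illustrates that idea under assumptions: `FakeConfig`, `FakeLauncher` and `SketchServer` are hypothetical names, the fakes only record the calls they receive, and `SketchServer` copies the logic from the diff while making the collaborators mandatory so that Puma is not required.

```ruby
require 'etc'

# Hypothetical double for Puma::Configuration: records every DSL call it receives.
class FakeConfig
  attr_reader :calls

  def initialize
    @calls = []
    yield self
  end

  def bind(url)
    @calls << [:bind, url]
  end

  def workers(count)
    @calls << [:workers, count]
  end

  def threads(min, max)
    @calls << [:threads, min, max]
  end

  def preload_app!
    @calls << [:preload_app!]
  end
end

# Hypothetical double for Puma::Launcher: "running" just returns the recorded calls.
class FakeLauncher
  def initialize(config)
    @config = config
  end

  def run
    @config.calls
  end
end

# Same logic as the Server in the diff, with the collaborators made mandatory.
class SketchServer
  DEFAULT_HOST = "0.0.0.0"

  def initialize(config_klass:, launcher_klass:, port: 9292, threads: "1:2",
                 workers: Etc.nprocessors, preload: false)
    @port = port
    @workers = workers
    @threads = threads.split(":")
    @preload = preload
    @config_klass = config_klass
    @launcher_klass = launcher_klass
  end

  def run
    @launcher_klass.new(config).run
  end

  private def config
    @config_klass.new do |c|
      c.bind "tcp://#{DEFAULT_HOST}:#{@port}"
      c.workers @workers if @workers > 1
      c.threads(*@threads)
      c.preload_app! if @preload
    end
  end
end

calls = SketchServer.new(config_klass: FakeConfig, launcher_klass: FakeLauncher,
                         port: "9292", workers: 2, preload: true).run
```

The recorded calls show that the thread bounds are forwarded as the Strings `"1"` and `"2"`, because `threads.split(":")` is never converted to integers; whether that matters depends on how the real configuration object coerces its arguments.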
@@ -1,3 +1,3 @@
  module InciScore
-   VERSION = "2.0.2"
+   VERSION = "2.1.1"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: inci_score
  version: !ruby/object:Gem::Version
-   version: 2.0.2
+   version: 2.1.1
  platform: ruby
  authors:
  - costajob
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2017-01-03 00:00:00.000000000 Z
+ date: 2017-01-04 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: nokogiri
@@ -143,7 +143,7 @@ files:
  - ext/levenshtein.c
  - inci_score.gemspec
  - lib/inci_score.rb
- - lib/inci_score/api/app.rb
+ - lib/inci_score/app.rb
  - lib/inci_score/catalog.rb
  - lib/inci_score/cli.rb
  - lib/inci_score/computer.rb
@@ -157,6 +157,7 @@ files:
  - lib/inci_score/response.rb
  - lib/inci_score/score.rb
  - lib/inci_score/scorer.rb
+ - lib/inci_score/server.rb
  - lib/inci_score/version.rb
  - log/.gitignore
  homepage: https://github.com/costajob/inci_score.git
@@ -1,21 +0,0 @@
- require 'rack'
- require 'inci_score'
-
- module InciScore
-   module API
-     module App
-       extend self
-
-       def catalog
-         @catalog ||= Catalog.fetch
-       end
-
-       def call(env)
-         req = Rack::Request.new(env)
-         src = req.params["src"]
-         json = src ? Computer.new(src: src, catalog: catalog).call.to_json : %q({"error": "no valid source"})
-         ['200', {'Content-Type' => 'application/json'}, [json]]
-       end
-     end
-   end
- end