inci_score 2.0.2 → 2.1.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 64259796b3d4a9d134999b66cdbd9bf869340ee5
4
- data.tar.gz: ab5a536ebc4841ab86d3f3a3e5d70a88c146b81b
3
+ metadata.gz: a58af371501ff4990b40385d05ae7589a946378f
4
+ data.tar.gz: e5f9a46ee5ab3c29a936c3598a10de2755f50686
5
5
  SHA512:
6
- metadata.gz: a03e0a913ab6c41350a2305de482801abafa191a249e5fe73437e2a444e0e25c0be9f8479f7c442134ca7ed663d8fef63ef7f985d43aeb05d62e33706fe29f42
7
- data.tar.gz: 3be018bbc1b8e64cae32bd810161237f50187a1f3e430489968ad865b771450b150ea28b59e3aea841a3c92d31b5f30bebd28c9e8b25ec8c4acf1de5a26565ae
6
+ metadata.gz: 3ddcaa818e14bdca461c235d11082b83512ec29a6f478d8a797252bfbb98a3377d2c06ad0af21ba0a88df512bcbffd1657fe0dcb8374bd5ac05d46e84c1afdbc
7
+ data.tar.gz: 75d995e0ce1e3a755e36a074ddace8d51d57d58d15f6720e3a2e6aeb407fbfd8bdb1f1276648ca74cc1372556763455c5caf7d9abbaa994d79af6cf03fbd7187
data/README.md CHANGED
@@ -7,17 +7,16 @@
7
7
  * [Sources](#sources)
8
8
  * [API](#api)
9
9
  * [Unrecognized components](#unrecognized-components)
10
- * [Web API](#web-api)
11
- * [Starting Puma](#starting-puma)
12
- * [Triggering a request](#triggering-a-request)
13
- * [CLI API](#cli-api)
10
+ * [CLI](#cli)
14
11
  * [Refresh catalog](#refresh-catalog)
12
+ * [HTTP server](#http-server)
13
+ * [Triggering a request](#triggering-a-request)
14
+ * [Getting help](#getting-help)
15
15
  * [Benchmark](#benchmark)
16
16
  * [Levenshtein in C](#levenshtein-in-c)
17
17
  * [Platform](#platform)
18
18
  * [Wrk](#wrk)
19
19
  * [Results](#results)
20
- * [Ruby 2.4](#ruby-2.4)
21
20
 
22
21
  ## Scope
23
22
  This gem computes the score of cosmetic components basing on the information provided by the [Biodizionario site](http://www.biodizionario.it/) by Fabrizio Zago.
@@ -74,30 +73,11 @@ inci.unrecognized
74
73
  => ["noent1", "noent2"]
75
74
  ```
76
75
 
77
- ## Web API
78
- The Web API exposes the *InciScore* library over HTTP via the [Puma](http://puma.io/) application server.
76
+ ## CLI
77
+ You can collect INCI data by using the available CLI interface:
79
78
 
80
- ### Starting Puma
81
- Simply start Puma via the *config.ru* file included in the repository by spawning how many workers as your current workstation supports:
82
79
  ```shell
83
- bundle exec puma -w 8 -t 0:2 --preload
84
- ```
85
-
86
- ### Triggering a request
87
- The Web API responds with a JSON object representing the original *InciScore::Response* one.
88
-
89
- You can pass the source string directly as a HTTP parameter:
90
-
91
- ```shell
92
- curl http://127.0.0.1:9292?src=aqua,dimethicone
93
- => {"components":{"aqua":0,"dimethicone":4},"unrecognized":[],"score":53.762874945799766,"valid":true}
94
- ```
95
-
96
- ## CLI API
97
- You can collect INCI data by using the available binary:
98
-
99
- ```shell
100
- inci_score --src="aqua,dimethicone,pej-10,noent"
80
+ inci_score --src="ingredients: aqua, dimethicone, pej-10, noent"
101
81
 
102
82
  TOTAL SCORE:
103
83
  47.18034913243358
@@ -112,9 +92,37 @@ UNRECOGNIZED:
112
92
  ```
113
93
 
114
94
  ### Refresh catalog
115
- When using CLI you have the option to fetch a fresh catalog from remote by specifyng a flag:
95
+ You also have the option to fetch a fresh catalog from www.biodizionario.it by specifying a flag:
96
+ ```shell
97
+ inci_score --fresh --src="aqua, dimethicone"
98
+ ```
99
+
100
+ ### HTTP server
101
+ The CLI interface exposes a Web layer based on the [Puma](http://puma.io/) application server.
102
+ The HTTP server is started on the specified port by spawning as many workers as your current workstation supports:
103
+ ```shell
104
+ inci_score --http=9292
105
+ ```
106
+ Note that all other options are discarded when running the HTTP server.
107
+
108
+ #### Triggering a request
109
+ The HTTP server responds with a JSON representation of the original *InciScore::Response* object.
110
+ You can pass the source string directly as a HTTP parameter:
111
+
112
+ ```shell
113
+ curl http://127.0.0.1:9292?src=aqua,dimethicone
114
+ => {"components":{"aqua":0,"dimethicone":4},"unrecognized":[],"score":53.762874945799766,"valid":true}
115
+ ```
116
+
117
+ ### Getting help
118
+ You can get CLI interface help by:
116
119
  ```shell
117
- inci_score --fresh --src="aqua,dimethicone,pej-10,noent"
120
+ inci_score --help
121
+ Usage: ./bin/inci_score --src='aqua, parfum, etc' --fresh
122
+ -s, --src=SRC The INCI list: 'aqua, parfum, etc'
123
+ -f, --fresh Fetch a fresh catalog from remote
124
+ --http=PORT Start Puma server on the specified port
125
+ -h, --help Prints this help
118
126
  ```
119
127
 
120
128
  ## Benchmark
@@ -123,7 +131,6 @@ inci_score --fresh --src="aqua,dimethicone,pej-10,noent"
123
131
  I noticed the APIs slows down dramatically when dealing with unrecognized components to fuzzy match on.
124
132
  I profiled the code by using the [benchmark-ips](https://github.com/evanphx/benchmark-ips) gem, finding the bottleneck was the pure Ruby implementation of the Levenshtein distance algorithm.
125
133
  After some pointless optimization, i replaced this routine with a C implementation: i opted for the straightforward [Ruby Inline](https://github.com/seattlerb/rubyinline) library to call the C code straight from Ruby.
126
- As a result i've got a 10x increment of the throughput, all without scarifying code readability.
127
134
 
128
135
  ### Platform
129
136
  I registered these benchmarks with a MacBook PRO 15 mid 2015 having these specs:
@@ -141,11 +148,7 @@ wrk -t 4 -c 100 -d 30s --timeout 2000 http://127.0.0.1:9292/?src=<list_of_ingred
141
148
  ```
142
149
 
143
150
  ### Results
144
- | Type | Ingredients | Throughput (req/s) | Latency in ms (avg/stdev/max) |
145
- | :----------------- | :----------------------- | -----------------: | ----------------------------: |
146
- | exact matching | aqua,parfum,zeolite | 48863.58 | 0.31/0.55/10.82 |
147
-
148
- ## Ruby 2.4
149
- After upgrading to Ruby 2.4 i doubled the throughput of the matcher (24008.11 vs 48863.58 req/s): i assume Ruby optimization to the [Hash access](https://blog.heroku.com/ruby-2-4-features-hashes-integers-rounding) is the driving reason.
150
- I also adopted the new #match? method to avoid creating a MatchData object when i am just checking for predicate.
151
- In the end Ruby upgrade is a big deal for my gem and i recommend to give it a try!
151
+ | Ingredients | Throughput (req/s) | Latency in ms (avg/stdev/max) |
152
+ | :----------------------- | -----------------: | ----------------------------: |
153
+ | aqua,parfum,zeolite | 26054.91 | 0.63/1.03/79.86 |
154
+ | agua,porfum,zaolite | 953.44 | 14.67/5.15/82.31 |
data/Rakefile CHANGED
@@ -13,6 +13,12 @@ namespace :spec do
13
13
  t.libs << 'lib'
14
14
  t.test_files = FileList['spec/integration/*_spec.rb']
15
15
  end
16
+
17
+ Rake::TestTask.new(:bench) do |t|
18
+ t.libs << 'spec'
19
+ t.libs << 'lib'
20
+ t.test_files = FileList['spec/bench/*_bench.rb']
21
+ end
16
22
  end
17
23
 
18
24
  task :default => :"spec:unit"
data/config.ru CHANGED
@@ -1,3 +1,3 @@
1
- require 'inci_score/api/app'
1
+ require 'inci_score/app'
2
2
 
3
- run InciScore::API::App
3
+ run InciScore::App
@@ -0,0 +1,19 @@
1
+ require 'rack'
2
+ require 'inci_score'
3
+
4
+ module InciScore
5
+ module App
6
+ extend self
7
+
8
+ def catalog
9
+ @catalog ||= Catalog.fetch
10
+ end
11
+
12
+ def call(env)
13
+ req = Rack::Request.new(env)
14
+ src = req.params["src"]
15
+ json = src ? Computer.new(src: src, catalog: catalog).call.to_json : %q({"error": "no valid source"})
16
+ ['200', {'Content-Type' => 'application/json'}, [json]]
17
+ end
18
+ end
19
+ end
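Note on the hunk above: the new `InciScore::App` is a plain module that responds to `call(env)` and returns a status, headers and body triple. A minimal sketch of the same shape follows, assuming only the Ruby standard library; `SketchApp` and its `compute` method are hypothetical stand-ins (the gem itself uses `Rack::Request` and `Computer`), and the query string is parsed with `URI.decode_www_form` rather than Rack.

```ruby
require "json"
require "uri"

# Hypothetical stand-in for InciScore::App; not the gem's code.
module SketchApp
  extend self

  # Hypothetical scorer standing in for InciScore::Computer:
  # it only splits the source into component names.
  def compute(src)
    { "components" => src.split(",").map(&:strip) }
  end

  # Rack-style entry point: takes the env hash, returns [status, headers, body].
  def call(env)
    params = URI.decode_www_form(env.fetch("QUERY_STRING", "")).to_h
    src = params["src"]
    json = src ? compute(src).to_json : %q({"error": "no valid source"})
    ["200", { "Content-Type" => "application/json" }, [json]]
  end
end

status, _headers, body = SketchApp.call("QUERY_STRING" => "src=aqua,dimethicone")
puts status
puts body.first
```

As in the diff, a missing `src` parameter still yields a `200` status with an error payload in the body.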
@@ -1,5 +1,6 @@
1
1
  require "optparse"
2
2
  require "inci_score/computer"
3
+ require "inci_score/server"
3
4
 
4
5
  module InciScore
5
6
  class CLI
@@ -9,10 +10,12 @@ module InciScore
9
10
  @catalog = catalog
10
11
  @src = nil
11
12
  @fresh = nil
13
+ @port = nil
12
14
  end
13
15
 
14
- def call(computer_klass = Computer, fetcher = Fetcher.new)
16
+ def call(server_klass: Server, computer_klass: Computer, fetcher: Fetcher.new)
15
17
  parser.parse!(@args)
18
+ return server_klass.new(port: @port, preload: true).run if @port
16
19
  return @io.puts("Specify inci list as: --src='aqua, parfum, etc'") unless @src
17
20
  @io.puts computer_klass.new(src: @src, catalog: catalog(fetcher)).call
18
21
  end
@@ -29,6 +32,10 @@ module InciScore
29
32
  @fresh = fresh
30
33
  end
31
34
 
35
+ opts.on("--http=PORT", "Start Puma server on the specified port") do |port|
36
+ @port = port
37
+ end
38
+
32
39
  opts.on("-h", "--help", "Prints this help") do
33
40
  @io.puts opts
34
41
  exit
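Note on the CLI hunks above: once `--http=PORT` is given, the server branch returns before `--src` is even checked, which is why the README says the other options are discarded. A minimal sketch of that precedence using the standard `optparse` library follows; `parse_options` and `dispatch` are hypothetical helper names, and the returned symbols stand in for the real server and computer calls.

```ruby
require "optparse"

# Hypothetical helper: parses argv and returns the collected options.
def parse_options(argv)
  options = { src: nil, fresh: false, port: nil }
  OptionParser.new do |opts|
    opts.on("-s", "--src=SRC", "The INCI list") { |src| options[:src] = src }
    opts.on("-f", "--fresh", "Fetch a fresh catalog from remote") { |fresh| options[:fresh] = fresh }
    opts.on("--http=PORT", "Start server on the specified port") { |port| options[:port] = port }
  end.parse!(argv)
  options
end

# Hypothetical dispatcher mirroring the order of the early returns in CLI#call.
def dispatch(options)
  return :server if options[:port]   # the port wins over every other option
  return :usage unless options[:src] # no source given: print the usage hint
  :compute
end

p dispatch(parse_options(["--http=9292", "--src=aqua"]))
p dispatch(parse_options(["--src=aqua, dimethicone"]))
p dispatch(parse_options([]))
```

The port is kept as a string here, as `OptionParser` passes it when no type coercion is requested.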
@@ -21,21 +21,19 @@ module InciScore
21
21
  end
22
22
  end
23
23
 
24
- private
25
-
26
- def doc
24
+ private def doc
27
25
  @src.respond_to?(:value) ? @src.value : @src
28
26
  end
29
27
 
30
- def semaphore(src)
28
+ private def semaphore(src)
31
29
  src.match(/(#{SEMAPHORES.join('|')}).gif$/)[1]
32
30
  end
33
31
 
34
- def normalize(node)
32
+ private def normalize(node)
35
33
  node.text.strip.downcase
36
34
  end
37
35
 
38
- def swap?(desc)
36
+ private def swap?(desc)
39
37
  return false if desc.empty?
40
38
  desc == desc.upcase
41
39
  end
@@ -13,9 +13,8 @@ module InciScore
13
13
  def call
14
14
  @component = @rules.reduce(nil) do |component, rule|
15
15
  break(component) if component
16
- _rule = rule.new(@src, @catalog)
17
16
  yield(rule) if block_given?
18
- _rule.call
17
+ rule.call(@src, @catalog)
19
18
  end
20
19
  [@component, @catalog[@component]] if @component
21
20
  end
@@ -4,36 +4,29 @@ module InciScore
4
4
  using Refinements
5
5
  class Recognizer
6
6
  module Rules
7
- class Base
8
- TOLERANCE = 3
7
+ TOLERANCE = 3
9
8
 
10
- def initialize(src, catalog)
11
- @src = src
12
- @catalog = catalog
13
- end
9
+ module Key
10
+ extend self
14
11
 
15
- def call
16
- fail NotmplementedError
12
+ def call(src, catalog)
13
+ src if catalog.has_key?(src)
17
14
  end
18
15
  end
19
16
 
20
- class Key < Base
21
- def call
22
- @src if @catalog.has_key?(@src)
23
- end
24
- end
17
+ module Levenshtein
18
+ extend self
25
19
 
26
- class Levenshtein < Base
27
20
  ALTERNATE_SEP = '/'
28
21
 
29
- def call
30
- size = @src.size
31
- initial = @src[0]
32
- component, distance = @catalog.reduce([nil, size]) do |min, (_component, _)|
22
+ def call(src, catalog)
23
+ size = src.size
24
+ initial = src[0]
25
+ component, distance = catalog.reduce([nil, size]) do |min, (_component, _)|
33
26
  next min unless _component.start_with?(initial)
34
27
  match = (n = _component.index(ALTERNATE_SEP)) ? _component[0, n] : _component
35
28
  next min if match.size > (size + TOLERANCE)
36
- dist = @src.distance(match)
29
+ dist = src.distance(match)
37
30
  min = [_component, dist] if dist < min[1]
38
31
  min
39
32
  end
@@ -41,34 +34,36 @@ module InciScore
41
34
  end
42
35
  end
43
36
 
44
- class Digits < Base
37
+ module Digits
38
+ extend self
39
+
45
40
  MIN_MEANINGFUL = 7
46
41
 
47
- def call
48
- return if @src.size < TOLERANCE
49
- digits = @src[0, MIN_MEANINGFUL]
50
- @catalog.detect do |component, _|
42
+ def call(src, catalog)
43
+ return if src.size < TOLERANCE
44
+ digits = src[0, MIN_MEANINGFUL]
45
+ catalog.detect do |component, _|
51
46
  component.matches?(/^#{Regexp::escape(digits)}/)
52
47
  end.to_a.first
53
48
  end
54
49
  end
55
50
 
56
- class Tokens < Base
51
+ module Tokens
52
+ extend self
53
+
57
54
  UNMATCHABLE = %w[extract oil sodium acid sulfate]
58
55
 
59
- def call
60
- tokens.each do |token|
61
- @catalog.each do |component, _|
56
+ def call(src, catalog)
57
+ tokens(src).each do |token|
58
+ catalog.each do |component, _|
62
59
  return component if component.matches?(/\b#{Regexp.escape(token)}\b/)
63
60
  end
64
61
  end
65
62
  nil
66
63
  end
67
64
 
68
- private
69
-
70
- def tokens
71
- (@src.split(' ') - UNMATCHABLE).reject { |t| t.size < TOLERANCE }.sort_by!(&:size).reverse!
65
+ def tokens(src)
66
+ (src.split(' ') - UNMATCHABLE).reject { |t| t.size < TOLERANCE }.sort_by!(&:size).reverse!
72
67
  end
73
68
  end
74
69
  end
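Note on the rules hunk above: each rule changes from a class instantiated per lookup (`rule.new(src, catalog).call`) to a stateless module called directly (`rule.call(src, catalog)`), so no object is allocated per rule. A minimal sketch of that pattern follows, assuming a plain `Hash` as catalog; `CATALOG`, the `Prefix` rule and `recognize` are hypothetical simplifications, not the gem's implementation (the real `Digits` rule relies on a regex refinement).

```ruby
# Hypothetical stand-in for the gem's catalog: component name => hazard value.
CATALOG = { "aqua" => 0, "dimethicone" => 4 }.freeze

module Rules
  # Exact-match rule, mirroring the Key module in the diff.
  module Key
    extend self

    def call(src, catalog)
      src if catalog.has_key?(src)
    end
  end

  # Simplified prefix rule (hypothetical), loosely modelled on Digits.
  module Prefix
    extend self

    MIN_MEANINGFUL = 7

    def call(src, catalog)
      digits = src[0, MIN_MEANINGFUL]
      catalog.keys.detect { |component| component.start_with?(digits) }
    end
  end
end

# The first rule returning a component wins, as in Recognizer#call.
def recognize(src, catalog, rules = [Rules::Key, Rules::Prefix])
  rules.each do |rule|
    component = rule.call(src, catalog)
    return [component, catalog[component]] if component
  end
  nil
end

p recognize("aqua", CATALOG)
p recognize("dimethicon", CATALOG)
p recognize("noent", CATALOG)
```

Because the modules hold no state, the same rule objects can be shared across requests and threads.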
@@ -0,0 +1,35 @@
1
+ require 'etc'
2
+ require 'puma'
3
+
4
+ module InciScore
5
+ class Server
6
+ DEFAULT_HOST = "0.0.0.0"
7
+
8
+ def initialize(port: 9292, threads: "1:2", workers: Etc.nprocessors, preload: false,
9
+ config_klass: Puma::Configuration, launcher_klass: Puma::Launcher)
10
+ @port = port
11
+ @workers = workers
12
+ @threads = threads.split(":")
13
+ @preload = preload
14
+ @config_klass = config_klass
15
+ @launcher_klass = launcher_klass
16
+ end
17
+
18
+ def run
19
+ launcher.run
20
+ end
21
+
22
+ private def launcher
23
+ @launcher_klass.new(config)
24
+ end
25
+
26
+ private def config
27
+ @config_klass.new do |c|
28
+ c.bind "tcp://#{DEFAULT_HOST}:#{@port}"
29
+ c.workers @workers if @workers > 1
30
+ c.threads *@threads
31
+ c.preload_app! if @preload
32
+ end
33
+ end
34
+ end
35
+ end
@@ -1,3 +1,3 @@
1
1
  module InciScore
2
- VERSION = "2.0.2"
2
+ VERSION = "2.1.1"
3
3
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: inci_score
3
3
  version: !ruby/object:Gem::Version
4
- version: 2.0.2
4
+ version: 2.1.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - costajob
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2017-01-03 00:00:00.000000000 Z
11
+ date: 2017-01-04 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: nokogiri
@@ -143,7 +143,7 @@ files:
143
143
  - ext/levenshtein.c
144
144
  - inci_score.gemspec
145
145
  - lib/inci_score.rb
146
- - lib/inci_score/api/app.rb
146
+ - lib/inci_score/app.rb
147
147
  - lib/inci_score/catalog.rb
148
148
  - lib/inci_score/cli.rb
149
149
  - lib/inci_score/computer.rb
@@ -157,6 +157,7 @@ files:
157
157
  - lib/inci_score/response.rb
158
158
  - lib/inci_score/score.rb
159
159
  - lib/inci_score/scorer.rb
160
+ - lib/inci_score/server.rb
160
161
  - lib/inci_score/version.rb
161
162
  - log/.gitignore
162
163
  homepage: https://github.com/costajob/inci_score.git
@@ -1,21 +0,0 @@
1
- require 'rack'
2
- require 'inci_score'
3
-
4
- module InciScore
5
- module API
6
- module App
7
- extend self
8
-
9
- def catalog
10
- @catalog ||= Catalog.fetch
11
- end
12
-
13
- def call(env)
14
- req = Rack::Request.new(env)
15
- src = req.params["src"]
16
- json = src ? Computer.new(src: src, catalog: catalog).call.to_json : %q({"error": "no valid source"})
17
- ['200', {'Content-Type' => 'application/json'}, [json]]
18
- end
19
- end
20
- end
21
- end