inci_score 2.0.2 → 2.1.1
- checksums.yaml +4 -4
- data/README.md +41 -38
- data/Rakefile +6 -0
- data/config.ru +2 -2
- data/lib/inci_score/app.rb +19 -0
- data/lib/inci_score/cli.rb +8 -1
- data/lib/inci_score/fetcher.rb +4 -6
- data/lib/inci_score/recognizer.rb +1 -2
- data/lib/inci_score/recognizer_rules.rb +27 -32
- data/lib/inci_score/server.rb +35 -0
- data/lib/inci_score/version.rb +1 -1
- metadata +4 -3
- data/lib/inci_score/api/app.rb +0 -21
checksums.yaml
CHANGED
```diff
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a58af371501ff4990b40385d05ae7589a946378f
+  data.tar.gz: e5f9a46ee5ab3c29a936c3598a10de2755f50686
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 3ddcaa818e14bdca461c235d11082b83512ec29a6f478d8a797252bfbb98a3377d2c06ad0af21ba0a88df512bcbffd1657fe0dcb8374bd5ac05d46e84c1afdbc
+  data.tar.gz: 75d995e0ce1e3a755e36a074ddace8d51d57d58d15f6720e3a2e6aeb407fbfd8bdb1f1276648ca74cc1372556763455c5caf7d9abbaa994d79af6cf03fbd7187
```
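The hunk above only swaps the SHA1 and SHA512 digests that RubyGems records for `metadata.gz` and `data.tar.gz`, the two archives packed inside the `.gem` file. A minimal sketch of checking them with Ruby's standard `digest` and `yaml` libraries follows. It assumes both entries have already been extracted into memory, and `verify_checksums` is a hypothetical helper, not part of inci_score or RubyGems.

```ruby
require "digest/sha1"
require "digest/sha2"
require "yaml"

# Section names used in checksums.yaml, mapped to stdlib digest classes.
DIGESTS = { "SHA1" => Digest::SHA1, "SHA512" => Digest::SHA512 }.freeze

# Hypothetical helper. `files` maps entry names ("metadata.gz",
# "data.tar.gz") to their raw bytes; returns [algorithm, entry, ok] triples.
def verify_checksums(checksums_yaml, files)
  YAML.safe_load(checksums_yaml).flat_map do |algorithm, entries|
    digest = DIGESTS.fetch(algorithm)
    entries.map do |name, expected|
      [algorithm, name, digest.hexdigest(files.fetch(name)) == expected.to_s]
    end
  end
end
```

Any triple ending in `false` means the extracted bytes differ from what was published.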
data/README.md
CHANGED
```diff
@@ -7,17 +7,16 @@
 * [Sources](#sources)
 * [API](#api)
 * [Unrecognized components](#unrecognized-components)
-* [
-* [Starting Puma](#starting-puma)
-* [Triggering a request](#triggering-a-request)
-* [CLI API](#cli-api)
+* [CLI](#cli)
 * [Refresh catalog](#refresh-catalog)
+* [HTTP server](#http-server)
+* [Triggering a request](#triggering-a-request)
+* [Getting help](#getting-help)
 * [Benchmark](#benchmark)
 * [Levenshtein in C](#levenshtein-in-c)
 * [Platform](#platform)
 * [Wrk](#wrk)
 * [Results](#results)
-* [Ruby 2.4](#ruby-2.4)
 
 ## Scope
 This gem computes the score of cosmetic components basing on the information provided by the [Biodizionario site](http://www.biodizionario.it/) by Fabrizio Zago.
```
````diff
@@ -74,30 +73,11 @@ inci.unrecognized
 => ["noent1", "noent2"]
 ```
 
-##
-
+## CLI
+You can collect INCI data by using the available CLI interface:
 
-### Starting Puma
-Simply start Puma via the *config.ru* file included in the repository by spawning how many workers as your current workstation supports:
 ```shell
-
-```
-
-### Triggering a request
-The Web API responds with a JSON object representing the original *InciScore::Response* one.
-
-You can pass the source string directly as a HTTP parameter:
-
-```shell
-curl http://127.0.0.1:9292?src=aqua,dimethicone
-=> {"components":{"aqua":0,"dimethicone":4},"unrecognized":[],"score":53.762874945799766,"valid":true}
-```
-
-## CLI API
-You can collect INCI data by using the available binary:
-
-```shell
-inci_score --src="aqua,dimethicone,pej-10,noent"
+inci_score --src="ingredients: aqua, dimethicone, pej-10, noent"
 
 TOTAL SCORE:
 47.18034913243358
````
````diff
@@ -112,9 +92,37 @@ UNRECOGNIZED:
 ```
 
 ### Refresh catalog
-
+You also have the option to fetch a fresh catalog from www.biodizionario.it by specifyng a flag:
+```shell
+inci_score --fresh --src="aqua, dimethicone"
+```
+
+### HTTP server
+The CLI interface exposes a Web layer based on the [Puma](http://puma.io/) application server.
+The HTTP server is started on the specified port by spawning as many workers as your current workstation supports:
+```shell
+inci_score --http=9292
+```
+Consider all other options are discarded when running HTTP server.
+
+#### Triggering a request
+The HTTP server responds with a JSON representation of the original *InciScore::Response* object.
+You can pass the source string directly as a HTTP parameter:
+
+```shell
+curl http://127.0.0.1:9292?src=aqua,dimethicone
+=> {"components":{"aqua":0,"dimethicone":4},"unrecognized":[],"score":53.762874945799766,"valid":true}
+```
+
+### Getting help
+You can get CLI interface help by:
 ```shell
-inci_score --
+inci_score --help
+Usage: ./bin/inci_score --src='aqua, parfum, etc' --fresh
+-s, --src=SRC The INCI list: 'aqua, parfum, etc'
+-f, --fresh Fetch a fresh catalog from remote
+--http=PORT Start Puma server on the specified port
+-h, --help Prints this help
 ```
 
 ## Benchmark
````
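The `curl` call in the README can be reproduced from Ruby with the standard library alone. This is a minimal sketch, assuming a server started with `inci_score --http=9292` on the local machine; `score_uri`, `parse_score` and `fetch_score` are hypothetical helper names. `URI.encode_www_form` percent-encodes the commas (`%2C`), and `Rack::Request#params` decodes them back before the app reads `src`.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper: builds the request shown in the README's curl example.
def score_uri(ingredients, host: "127.0.0.1", port: 9292)
  query = URI.encode_www_form(src: ingredients.join(","))
  URI::HTTP.build(host: host, port: port, path: "/", query: query)
end

# Hypothetical helper: decodes the JSON body returned by the HTTP server.
def parse_score(body)
  JSON.parse(body)
end

# Performs the request; requires `inci_score --http=9292` to be running.
def fetch_score(ingredients, **options)
  parse_score(Net::HTTP.get(score_uri(ingredients, **options)))
end
```

Only `fetch_score` touches the network; the other two helpers can be exercised offline, for instance against the sample JSON quoted in the README.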
```diff
@@ -123,7 +131,6 @@ inci_score --fresh --src="aqua,dimethicone,pej-10,noent"
 I noticed the APIs slows down dramatically when dealing with unrecognized components to fuzzy match on.
 I profiled the code by using the [benchmark-ips](https://github.com/evanphx/benchmark-ips) gem, finding the bottleneck was the pure Ruby implementation of the Levenshtein distance algorithm.
 After some pointless optimization, i replaced this routine with a C implementation: i opted for the straightforward [Ruby Inline](https://github.com/seattlerb/rubyinline) library to call the C code straight from Ruby.
-As a result i've got a 10x increment of the throughput, all without scarifying code readability.
 
 ### Platform
 I registered these benchmarks with a MacBook PRO 15 mid 2015 having these specs:
```
````diff
@@ -141,11 +148,7 @@ wrk -t 4 -c 100 -d 30s --timeout 2000 http://127.0.0.1:9292/?src=<list_of_ingred
 ```
 
 ### Results
-|
-|
-|
-
-## Ruby 2.4
-After upgrading to Ruby 2.4 i doubled the throughput of the matcher (24008.11 vs 48863.58 req/s): i assume Ruby optimization to the [Hash access](https://blog.heroku.com/ruby-2-4-features-hashes-integers-rounding) is the driving reason.
-I also adopted the new #match? method to avoid creating a MatchData object when i am just checking for predicate.
-In the end Ruby upgrade is a big deal for my gem and i recommend to give it a try!
+| Ingredients | Throughput (req/s) | Latency in ms (avg/stdev/max) |
+| :----------------------- | -----------------: | ----------------------------: |
+| aqua,parfum,zeolite | 26054.91 | 0.63/1.03/79.86 |
+| agua,porfum,zaolite | 953.44 | 14.67/5.15/82.31 |
````
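Reading the two rows of the new Results table side by side: the second list is the first one misspelled, so presumably none of its ingredients is an exact catalog key and all of them fall through to fuzzy matching, the slow path described under Benchmark. The ratios below are plain arithmetic on the published figures, nothing more.

```ruby
# Figures taken from the Results table (wrk -t 4 -c 100 -d 30s).
exact = { throughput: 26_054.91, avg_latency_ms: 0.63 }  # aqua,parfum,zeolite
fuzzy = { throughput: 953.44, avg_latency_ms: 14.67 }    # agua,porfum,zaolite

# How many times fewer requests per second the misspelled list sustains.
throughput_drop = exact[:throughput] / fuzzy[:throughput]
# How many times longer the average request takes.
latency_growth = fuzzy[:avg_latency_ms] / exact[:avg_latency_ms]

puts format("throughput drop: %.1fx, average latency growth: %.1fx",
            throughput_drop, latency_growth)
```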
data/Rakefile
CHANGED
```diff
@@ -13,6 +13,12 @@ namespace :spec do
     t.libs << 'lib'
     t.test_files = FileList['spec/integration/*_spec.rb']
   end
+
+  Rake::TestTask.new(:bench) do |t|
+    t.libs << 'spec'
+    t.libs << 'lib'
+    t.test_files = FileList['spec/bench/*_bench.rb']
+  end
 end
 
 task :default => :"spec:unit"
```
data/config.ru
CHANGED
```diff
@@ -1,3 +1,3 @@
-require 'inci_score/
+require 'inci_score/app'
 
-run InciScore::
+run InciScore::App
```
data/lib/inci_score/app.rb
ADDED
```diff
@@ -0,0 +1,19 @@
+require 'rack'
+require 'inci_score'
+
+module InciScore
+  module App
+    extend self
+
+    def catalog
+      @catalog ||= Catalog.fetch
+    end
+
+    def call(env)
+      req = Rack::Request.new(env)
+      src = req.params["src"]
+      json = src ? Computer.new(src: src, catalog: catalog).call.to_json : %q({"error": "no valid source"})
+      ['200', {'Content-Type' => 'application/json'}, [json]]
+    end
+  end
+end
```
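`InciScore::App` is a plain Rack endpoint: a module whose `call(env)` returns a `[status, headers, body]` triple, with the catalog memoized in a module-level instance variable so later requests reuse it. The sketch below reproduces that contract without Rack, Puma or the gem installed. `SketchApp` is hypothetical, its catalog is placeholder data, and a simple lookup stands in for `Computer`. It parses `QUERY_STRING` with `URI.decode_www_form` instead of `Rack::Request`, and returns an Integer status with a lowercase header name, which is what the Rack 3 SPEC expects (the diff returns the String `'200'`).

```ruby
require "json"
require "uri"

# Hypothetical stand-in for InciScore::App with the same Rack contract.
module SketchApp
  extend self

  # Memoized like App#catalog; placeholder data instead of Catalog.fetch.
  def catalog
    @catalog ||= { "aqua" => 0, "dimethicone" => 4 }
  end

  def call(env)
    params = URI.decode_www_form(env.fetch("QUERY_STRING", "")).to_h
    src = params["src"]
    json =
      if src
        names = src.split(",").map(&:strip)
        # Simple lookup standing in for Computer#call.
        {
          components: names.select { |n| catalog.key?(n) }.to_h { |n| [n, catalog[n]] },
          unrecognized: names.reject { |n| catalog.key?(n) }
        }.to_json
      else
        %q({"error": "no valid source"})
      end
    # Integer status and lowercase header name, as the Rack 3 SPEC expects.
    [200, { "content-type" => "application/json" }, [json]]
  end
end
```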
data/lib/inci_score/cli.rb
CHANGED
```diff
@@ -1,5 +1,6 @@
 require "optparse"
 require "inci_score/computer"
+require "inci_score/server"
 
 module InciScore
   class CLI
@@ -9,10 +10,12 @@ module InciScore
       @catalog = catalog
       @src = nil
       @fresh = nil
+      @port = nil
     end
 
-    def call(computer_klass
+    def call(server_klass: Server, computer_klass: Computer, fetcher: Fetcher.new)
       parser.parse!(@args)
+      return server_klass.new(port: @port, preload: true).run if @port
       return @io.puts("Specify inci list as: --src='aqua, parfum, etc'") unless @src
       @io.puts computer_klass.new(src: @src, catalog: catalog(fetcher)).call
     end
@@ -29,6 +32,10 @@ module InciScore
           @fresh = fresh
         end
 
+        opts.on("--http=PORT", "Start Puma server on the specified port") do |port|
+          @port = port
+        end
+
         opts.on("-h", "--help", "Prints this help") do
           @io.puts opts
           exit
```
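The new `--http=PORT` switch short-circuits `CLI#call`: once `@port` is set the server starts and `--src`/`--fresh` are ignored, matching the README note that other options are discarded. In the diff the port reaches `Server` as the raw String from OptionParser. A minimal sketch with hypothetical `parse_options` and `mode` helpers shows the same precedence, and how passing `Integer` as the option's type makes OptionParser convert the port and reject values like `abc`; that coercion is a suggestion, not what the gem does.

```ruby
require "optparse"

# Hypothetical parser mirroring the options shown in the diff.
def parse_options(argv)
  options = {}
  OptionParser.new do |opts|
    opts.on("-s", "--src=SRC", "The INCI list: 'aqua, parfum, etc'") { |v| options[:src] = v }
    opts.on("-f", "--fresh", "Fetch a fresh catalog from remote") { |v| options[:fresh] = v }
    # Integer type: "9292" becomes 9292, "abc" raises OptionParser::InvalidArgument.
    opts.on("--http=PORT", Integer, "Start Puma server on the specified port") { |v| options[:port] = v }
  end.parse!(argv.dup)
  options
end

# Same precedence as CLI#call: --http wins, then --src, otherwise the usage hint.
def mode(options)
  return :server if options[:port]
  return :usage unless options[:src]
  :compute
end
```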
data/lib/inci_score/fetcher.rb
CHANGED
```diff
@@ -21,21 +21,19 @@ module InciScore
       end
     end
 
-    private
-
-    def doc
+    private def doc
       @src.respond_to?(:value) ? @src.value : @src
     end
 
-    def semaphore(src)
+    private def semaphore(src)
       src.match(/(#{SEMAPHORES.join('|')}).gif$/)[1]
     end
 
-    def normalize(node)
+    private def normalize(node)
       node.text.strip.downcase
     end
 
-    def swap?(desc)
+    private def swap?(desc)
       return false if desc.empty?
       desc == desc.upcase
     end
```
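Replacing the `private` section with `private def ...` works because `def` returns the method name as a Symbol (Ruby 2.1 and later), and `private` accepts that Symbol, so only the method being defined is affected. The sketch below uses the same style; `SemaphoreReader` is hypothetical and its `SEMAPHORES` values are placeholders, since the gem's list is not part of this diff. It also escapes the dot in `.gif` and returns `nil` when nothing matches: in the unchanged `semaphore` line the unescaped `.` matches any character, and calling `[1]` on a failed match raises `NoMethodError`.

```ruby
# Hypothetical class illustrating the `private def` style from the diff.
class SemaphoreReader
  # Placeholder names; the gem's real SEMAPHORES list is not shown here.
  SEMAPHORES = %w[green yellow red].freeze
  # Escaped dot and \z anchor: "red.gif" matches, "redxgif" does not.
  PATTERN = /(#{SEMAPHORES.join('|')})\.gif\z/

  def rating(src)
    semaphore(src)
  end

  # `def` evaluates to :semaphore, which `private` receives directly.
  private def semaphore(src)
    match = src.match(PATTERN)
    match && match[1] # nil instead of NoMethodError on a failed match
  end
end
```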
data/lib/inci_score/recognizer.rb
CHANGED
```diff
@@ -13,9 +13,8 @@ module InciScore
     def call
       @component = @rules.reduce(nil) do |component, rule|
         break(component) if component
-        _rule = rule.new(@src, @catalog)
         yield(rule) if block_given?
-
+        rule.call(@src, @catalog)
       end
       [@component, @catalog[@component]] if @component
     end
```
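The recognizer no longer instantiates a rule object per lookup (`rule.new(@src, @catalog)` is gone); each rule is now a module using `extend self` that receives `src` and `catalog` as arguments, so anything responding to `call(src, catalog)` fits the chain. The sketch below is a simplified, hypothetical version of that first-match loop: `ExactKey` mirrors `Rules::Key`, while `PrefixRule` is an illustrative lambda that does not exist in the gem.

```ruby
# Mirrors Rules::Key: a module whose singleton responds to call(src, catalog).
module ExactKey
  extend self

  def call(src, catalog)
    src if catalog.key?(src)
  end
end

# Hypothetical rule: a lambda satisfies the same call(src, catalog) interface.
PrefixRule = ->(src, catalog) { catalog.keys.find { |key| key.start_with?(src) } }

# Returns [component, score] from the first rule that recognizes src, or nil.
# As in Recognizer#call, the remaining rules are skipped after a hit.
def recognize(src, catalog, rules)
  component = nil
  rules.each do |rule|
    component = rule.call(src, catalog)
    break if component
  end
  [component, catalog[component]] if component
end
```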
data/lib/inci_score/recognizer_rules.rb
CHANGED
```diff
@@ -4,36 +4,29 @@ module InciScore
   using Refinements
   class Recognizer
     module Rules
-
-      TOLERANCE = 3
+      TOLERANCE = 3
 
-
-
-        @catalog = catalog
-      end
+      module Key
+        extend self
 
-        def call
-
+        def call(src, catalog)
+          src if catalog.has_key?(src)
         end
       end
 
-
-
-        @src if @catalog.has_key?(@src)
-      end
-    end
+      module Levenshtein
+        extend self
 
-      class Levenshtein < Base
         ALTERNATE_SEP = '/'
 
-        def call
-          size =
-          initial =
-          component, distance =
+        def call(src, catalog)
+          size = src.size
+          initial = src[0]
+          component, distance = catalog.reduce([nil, size]) do |min, (_component, _)|
             next min unless _component.start_with?(initial)
             match = (n = _component.index(ALTERNATE_SEP)) ? _component[0, n] : _component
             next min if match.size > (size + TOLERANCE)
-            dist =
+            dist = src.distance(match)
             min = [_component, dist] if dist < min[1]
             min
           end
```
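`Rules::Levenshtein` keeps only candidates that share the ingredient's first letter, compares just the part before the alternate-name separator (`/`), skips names more than `TOLERANCE` characters longer than the input, and keeps the closest one, starting from a threshold equal to the input length. The gem delegates `src.distance(match)` to its C routine (see the README's "Levenshtein in C" section; the gem ships `ext/levenshtein.c`), presumably exposed through the `Refinements` module this file activates. For reference, here is a hedged pure-Ruby sketch of the same filter with a classic two-row dynamic-programming distance; `levenshtein` and `closest_component` are hypothetical names, and how the gem uses the resulting distance after this hunk is not shown.

```ruby
TOLERANCE = 3
ALTERNATE_SEP = "/"

# Two-row dynamic-programming Levenshtein distance: O(|a|*|b|) time,
# O(|b|) memory. Pure-Ruby stand-in for the gem's C-backed String#distance.
def levenshtein(a, b)
  previous = (0..b.size).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [previous[j] + 1,         # deletion
                  current[j - 1] + 1,      # insertion
                  previous[j - 1] + cost   # substitution
                 ].min
    end
    previous = current
  end
  previous.last
end

# Mirrors the candidate filter of Rules::Levenshtein#call.
def closest_component(src, catalog)
  return nil if src.empty?
  best, best_distance = nil, src.size
  catalog.each_key do |component|
    next unless component.start_with?(src[0])
    n = component.index(ALTERNATE_SEP)
    name = n ? component[0, n] : component   # ignore alternate names
    next if name.size > src.size + TOLERANCE
    distance = levenshtein(src, name)
    if distance < best_distance
      best, best_distance = component, distance
    end
  end
  best
end
```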
```diff
@@ -41,34 +34,36 @@ module InciScore
         end
       end
 
-
+      module Digits
+        extend self
+
         MIN_MEANINGFUL = 7
 
-        def call
-          return if
-          digits =
-
+        def call(src, catalog)
+          return if src.size < TOLERANCE
+          digits = src[0, MIN_MEANINGFUL]
+          catalog.detect do |component, _|
             component.matches?(/^#{Regexp::escape(digits)}/)
           end.to_a.first
         end
       end
 
-
+      module Tokens
+        extend self
+
         UNMATCHABLE = %w[extract oil sodium acid sulfate]
 
-        def call
-          tokens.each do |token|
-
+        def call(src, catalog)
+          tokens(src).each do |token|
+            catalog.each do |component, _|
               return component if component.matches?(/\b#{Regexp.escape(token)}\b/)
             end
           end
           nil
         end
 
-
-
-        def tokens
-          (@src.split(' ') - UNMATCHABLE).reject { |t| t.size < TOLERANCE }.sort_by!(&:size).reverse!
+        def tokens(src)
+          (src.split(' ') - UNMATCHABLE).reject { |t| t.size < TOLERANCE }.sort_by!(&:size).reverse!
         end
       end
     end
```
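The last two rules follow the same `call(src, catalog)` shape. `Rules::Digits` matches the first `MIN_MEANINGFUL` characters of the input against the start of each catalog key, and `Rules::Tokens` drops the generic words listed in `UNMATCHABLE`, tries the remaining words longest first, and returns the first catalog key containing one of them as a whole word. A plain-Ruby sketch of both under hypothetical names: it uses `String#start_with?` where the gem uses a `^`-anchored regexp, and the core `String#match?` (Ruby 2.4+) where the gem calls `matches?`, which presumably comes from its `Refinements` module.

```ruby
TOLERANCE = 3
MIN_MEANINGFUL = 7
UNMATCHABLE = %w[extract oil sodium acid sulfate].freeze

# Prefix rule in the spirit of Rules::Digits.
def match_prefix(src, catalog)
  return if src.size < TOLERANCE
  prefix = src[0, MIN_MEANINGFUL]
  catalog.each_key.find { |component| component.start_with?(prefix) }
end

# Meaningful words of src, longest first (as in Rules::Tokens#tokens).
def tokens(src)
  (src.split(" ") - UNMATCHABLE).reject { |t| t.size < TOLERANCE }.sort_by(&:size).reverse
end

# First catalog key containing one of the tokens as a whole word, or nil.
def match_tokens(src, catalog)
  tokens(src).each do |token|
    pattern = /\b#{Regexp.escape(token)}\b/
    found = catalog.each_key.find { |component| component.match?(pattern) }
    return found if found
  end
  nil
end
```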
data/lib/inci_score/server.rb
ADDED
```diff
@@ -0,0 +1,35 @@
+require 'etc'
+require 'puma'
+
+module InciScore
+  class Server
+    DEFAULT_HOST = "0.0.0.0"
+
+    def initialize(port: 9292, threads: "1:2", workers: Etc.nprocessors, preload: false,
+                   config_klass: Puma::Configuration, launcher_klass: Puma::Launcher)
+      @port = port
+      @workers = workers
+      @threads = threads.split(":")
+      @preload = preload
+      @config_klass = config_klass
+      @launcher_klass = launcher_klass
+    end
+
+    def run
+      launcher.run
+    end
+
+    private def launcher
+      @launcher_klass.new(config)
+    end
+
+    private def config
+      @config_klass.new do |c|
+        c.bind "tcp://#{DEFAULT_HOST}:#{@port}"
+        c.workers @workers if @workers > 1
+        c.threads *@threads
+        c.preload_app! if @preload
+      end
+    end
+  end
+end
```
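`Server#config` translates its arguments into Puma DSL calls: a `tcp://` bind URL on `0.0.0.0`, `workers` only when more than one is requested (otherwise Puma's default worker setting is left alone), the `"1:2"` thread spec split into min and max, and `preload_app!`, which the CLI always enables via `preload: true`. As a hedged sketch, the same normalization can be pulled into a plain-Ruby helper that does not load Puma. `ServerSettings` and `server_settings` are hypothetical names, and converting with `Integer()` is a suggestion rather than what the gem does: it passes the split strings and the raw port straight through.

```ruby
require "etc"

# Hypothetical value object holding what Server#config would hand to Puma.
ServerSettings = Struct.new(:bind, :workers, :min_threads, :max_threads, :preload) do
  # Server#config only calls c.workers when more than one worker is requested.
  def cluster?
    workers > 1
  end
end

def server_settings(port: 9292, threads: "1:2", workers: Etc.nprocessors,
                    preload: false, host: "0.0.0.0")
  # "min:max" -> two Integers; Integer() raises on input such as "1:x".
  min, max = threads.split(":").map { |n| Integer(n) }
  raise ArgumentError, "threads must be min:max with min <= max" if max.nil? || min > max
  # The CLI forwards --http=PORT as a String, so normalize it here.
  ServerSettings.new("tcp://#{host}:#{Integer(port)}", workers, min, max, preload)
end
```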
data/lib/inci_score/version.rb
CHANGED
metadata
CHANGED
```diff
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: inci_score
 version: !ruby/object:Gem::Version
-  version: 2.
+  version: 2.1.1
 platform: ruby
 authors:
 - costajob
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2017-01-
+date: 2017-01-04 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nokogiri
@@ -143,7 +143,7 @@ files:
 - ext/levenshtein.c
 - inci_score.gemspec
 - lib/inci_score.rb
-- lib/inci_score/
+- lib/inci_score/app.rb
 - lib/inci_score/catalog.rb
 - lib/inci_score/cli.rb
 - lib/inci_score/computer.rb
@@ -157,6 +157,7 @@ files:
 - lib/inci_score/response.rb
 - lib/inci_score/score.rb
 - lib/inci_score/scorer.rb
+- lib/inci_score/server.rb
 - lib/inci_score/version.rb
 - log/.gitignore
 homepage: https://github.com/costajob/inci_score.git
```
data/lib/inci_score/api/app.rb
DELETED
```diff
@@ -1,21 +0,0 @@
-require 'rack'
-require 'inci_score'
-
-module InciScore
-  module API
-    module App
-      extend self
-
-      def catalog
-        @catalog ||= Catalog.fetch
-      end
-
-      def call(env)
-        req = Rack::Request.new(env)
-        src = req.params["src"]
-        json = src ? Computer.new(src: src, catalog: catalog).call.to_json : %q({"error": "no valid source"})
-        ['200', {'Content-Type' => 'application/json'}, [json]]
-      end
-    end
-  end
-end
```