inci_score 2.0.2 → 2.1.1
- checksums.yaml +4 -4
- data/README.md +41 -38
- data/Rakefile +6 -0
- data/config.ru +2 -2
- data/lib/inci_score/app.rb +19 -0
- data/lib/inci_score/cli.rb +8 -1
- data/lib/inci_score/fetcher.rb +4 -6
- data/lib/inci_score/recognizer.rb +1 -2
- data/lib/inci_score/recognizer_rules.rb +27 -32
- data/lib/inci_score/server.rb +35 -0
- data/lib/inci_score/version.rb +1 -1
- metadata +4 -3
- data/lib/inci_score/api/app.rb +0 -21
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a58af371501ff4990b40385d05ae7589a946378f
+  data.tar.gz: e5f9a46ee5ab3c29a936c3598a10de2755f50686
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 3ddcaa818e14bdca461c235d11082b83512ec29a6f478d8a797252bfbb98a3377d2c06ad0af21ba0a88df512bcbffd1657fe0dcb8374bd5ac05d46e84c1afdbc
+  data.tar.gz: 75d995e0ce1e3a755e36a074ddace8d51d57d58d15f6720e3a2e6aeb407fbfd8bdb1f1276648ca74cc1372556763455c5caf7d9abbaa994d79af6cf03fbd7187
data/README.md
CHANGED
@@ -7,17 +7,16 @@
 * [Sources](#sources)
 * [API](#api)
 * [Unrecognized components](#unrecognized-components)
-* [
-* [Starting Puma](#starting-puma)
-* [Triggering a request](#triggering-a-request)
-* [CLI API](#cli-api)
+* [CLI](#cli)
 * [Refresh catalog](#refresh-catalog)
+* [HTTP server](#http-server)
+* [Triggering a request](#triggering-a-request)
+* [Getting help](#getting-help)
 * [Benchmark](#benchmark)
 * [Levenshtein in C](#levenshtein-in-c)
 * [Platform](#platform)
 * [Wrk](#wrk)
 * [Results](#results)
-* [Ruby 2.4](#ruby-2.4)
 
 ## Scope
 This gem computes the score of cosmetic components basing on the information provided by the [Biodizionario site](http://www.biodizionario.it/) by Fabrizio Zago.
@@ -74,30 +73,11 @@ inci.unrecognized
 => ["noent1", "noent2"]
 ```
 
-##
-
+## CLI
+You can collect INCI data by using the available CLI interface:
 
-### Starting Puma
-Simply start Puma via the *config.ru* file included in the repository by spawning how many workers as your current workstation supports:
 ```shell
-
-```
-
-### Triggering a request
-The Web API responds with a JSON object representing the original *InciScore::Response* one.
-
-You can pass the source string directly as a HTTP parameter:
-
-```shell
-curl http://127.0.0.1:9292?src=aqua,dimethicone
-=> {"components":{"aqua":0,"dimethicone":4},"unrecognized":[],"score":53.762874945799766,"valid":true}
-```
-
-## CLI API
-You can collect INCI data by using the available binary:
-
-```shell
-inci_score --src="aqua,dimethicone,pej-10,noent"
+inci_score --src="ingredients: aqua, dimethicone, pej-10, noent"
 
 TOTAL SCORE:
 47.18034913243358
@@ -112,9 +92,37 @@ UNRECOGNIZED:
 ```
 
 ### Refresh catalog
-
+You also have the option to fetch a fresh catalog from www.biodizionario.it by specifyng a flag:
+```shell
+inci_score --fresh --src="aqua, dimethicone"
+```
+
+### HTTP server
+The CLI interface exposes a Web layer based on the [Puma](http://puma.io/) application server.
+The HTTP server is started on the specified port by spawning as many workers as your current workstation supports:
+```shell
+inci_score --http=9292
+```
+Consider all other options are discarded when running HTTP server.
+
+#### Triggering a request
+The HTTP server responds with a JSON representation of the original *InciScore::Response* object.
+You can pass the source string directly as a HTTP parameter:
+
+```shell
+curl http://127.0.0.1:9292?src=aqua,dimethicone
+=> {"components":{"aqua":0,"dimethicone":4},"unrecognized":[],"score":53.762874945799766,"valid":true}
+```
+
+### Getting help
+You can get CLI interface help by:
 ```shell
-inci_score --
+inci_score --help
+Usage: ./bin/inci_score --src='aqua, parfum, etc' --fresh
+-s, --src=SRC The INCI list: 'aqua, parfum, etc'
+-f, --fresh Fetch a fresh catalog from remote
+--http=PORT Start Puma server on the specified port
+-h, --help Prints this help
 ```
 
 ## Benchmark
@@ -123,7 +131,6 @@ inci_score --fresh --src="aqua,dimethicone,pej-10,noent"
 I noticed the APIs slows down dramatically when dealing with unrecognized components to fuzzy match on.
 I profiled the code by using the [benchmark-ips](https://github.com/evanphx/benchmark-ips) gem, finding the bottleneck was the pure Ruby implementation of the Levenshtein distance algorithm.
 After some pointless optimization, i replaced this routine with a C implementation: i opted for the straightforward [Ruby Inline](https://github.com/seattlerb/rubyinline) library to call the C code straight from Ruby.
-As a result i've got a 10x increment of the throughput, all without scarifying code readability.
 
 ### Platform
 I registered these benchmarks with a MacBook PRO 15 mid 2015 having these specs:
@@ -141,11 +148,7 @@ wrk -t 4 -c 100 -d 30s --timeout 2000 http://127.0.0.1:9292/?src=<list_of_ingred
 ```
 
 ### Results
-|
-|
-|
-
-## Ruby 2.4
-After upgrading to Ruby 2.4 i doubled the throughput of the matcher (24008.11 vs 48863.58 req/s): i assume Ruby optimization to the [Hash access](https://blog.heroku.com/ruby-2-4-features-hashes-integers-rounding) is the driving reason.
-I also adopted the new #match? method to avoid creating a MatchData object when i am just checking for predicate.
-In the end Ruby upgrade is a big deal for my gem and i recommend to give it a try!
+| Ingredients | Throughput (req/s) | Latency in ms (avg/stdev/max) |
+| :----------------------- | -----------------: | ----------------------------: |
+| aqua,parfum,zeolite | 26054.91 | 0.63/1.03/79.86 |
+| agua,porfum,zaolite | 953.44 | 14.67/5.15/82.31 |
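The two rows of the new results table differ only in spelling: the misspelled list cannot be resolved by an exact catalog lookup, so each ingredient presumably falls through to edit-distance matching. As a rough illustration of why that path costs more, here is a minimal pure-Ruby Levenshtein distance. The method name `levenshtein` is hypothetical and this is not the gem's code; according to the README, the gem uses a C implementation for this step.

```ruby
# Hypothetical helper, not the gem's code: classic two-row dynamic-programming
# Levenshtein distance. Work grows with a.size * b.size for every comparison.
def levenshtein(a, b)
  return b.size if a.empty?
  return a.size if b.empty?

  prev = (0..b.size).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      # Cheapest of: deletion, insertion, substitution (or match).
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# The benchmark's misspelled ingredients are each one edit away.
puts levenshtein("agua", "aqua")
puts levenshtein("porfum", "parfum")
puts levenshtein("zaolite", "zeolite")
```

An exact match is a single hash lookup, whereas a fuzzy match has to run a computation like this against many catalog entries, which is consistent with the gap between the two throughput figures.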
data/Rakefile
CHANGED
@@ -13,6 +13,12 @@ namespace :spec do
     t.libs << 'lib'
     t.test_files = FileList['spec/integration/*_spec.rb']
   end
+
+  Rake::TestTask.new(:bench) do |t|
+    t.libs << 'spec'
+    t.libs << 'lib'
+    t.test_files = FileList['spec/bench/*_bench.rb']
+  end
 end
 
 task :default => :"spec:unit"
data/config.ru
CHANGED
@@ -1,3 +1,3 @@
-require 'inci_score/
+require 'inci_score/app'
 
-run InciScore::
+run InciScore::App
data/lib/inci_score/app.rb
ADDED
@@ -0,0 +1,19 @@
+require 'rack'
+require 'inci_score'
+
+module InciScore
+  module App
+    extend self
+
+    def catalog
+      @catalog ||= Catalog.fetch
+    end
+
+    def call(env)
+      req = Rack::Request.new(env)
+      src = req.params["src"]
+      json = src ? Computer.new(src: src, catalog: catalog).call.to_json : %q({"error": "no valid source"})
+      ['200', {'Content-Type' => 'application/json'}, [json]]
+    end
+  end
+end
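The new `App` module works as a Rack endpoint because `extend self` makes `call` available on the module itself, which is why `config.ru` can `run` it without instantiating anything. A minimal sketch of that shape follows, assuming a stub catalog and stdlib query parsing in place of `Rack::Request` so it runs without the rack gem. `SketchApp` and its simplified response body are hypothetical, not the gem's implementation.

```ruby
require 'json'
require 'uri'

# Hypothetical stand-in for the gem's App module: `extend self` exposes
# `call` at module level, so the module itself is the endpoint.
module SketchApp
  extend self

  # Stub catalog, memoized on first use; the real one comes from Catalog.fetch.
  def catalog
    @catalog ||= { "aqua" => 0, "dimethicone" => 4 }
  end

  def call(env)
    params = URI.decode_www_form(env.fetch("QUERY_STRING", "")).to_h
    src = params["src"]
    body =
      if src
        names = src.split(",").map(&:strip)
        known, unknown = names.partition { |name| catalog.key?(name) }
        { components: known, unrecognized: unknown }.to_json
      else
        { error: "no valid source" }.to_json
      end
    # Same triplet shape as the diff: status, headers, enumerable body.
    ["200", { "Content-Type" => "application/json" }, [body]]
  end
end

status, _headers, body = SketchApp.call("QUERY_STRING" => "src=aqua,noent")
puts status
puts body.first
```

As in the diff, the sketch answers with status 200 even when `src` is missing; whether that case should instead use a 4xx status is a design choice the source does not discuss.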
data/lib/inci_score/cli.rb
CHANGED
@@ -1,5 +1,6 @@
 require "optparse"
 require "inci_score/computer"
+require "inci_score/server"
 
 module InciScore
   class CLI
@@ -9,10 +10,12 @@ module InciScore
       @catalog = catalog
       @src = nil
       @fresh = nil
+      @port = nil
     end
 
-    def call(computer_klass
+    def call(server_klass: Server, computer_klass: Computer, fetcher: Fetcher.new)
       parser.parse!(@args)
+      return server_klass.new(port: @port, preload: true).run if @port
       return @io.puts("Specify inci list as: --src='aqua, parfum, etc'") unless @src
       @io.puts computer_klass.new(src: @src, catalog: catalog(fetcher)).call
     end
@@ -29,6 +32,10 @@ module InciScore
           @fresh = fresh
         end
 
+        opts.on("--http=PORT", "Start Puma server on the specified port") do |port|
+          @port = port
+        end
+
         opts.on("-h", "--help", "Prints this help") do
           @io.puts opts
           exit
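In the hunk above, `--http=PORT` hands the raw option string to `@port`, which is sufficient because the value is only interpolated into the bind URI. If numeric validation were wanted, `OptionParser` can coerce the argument itself. The sketch below shows that variant; `parse_port` is a hypothetical helper and not part of the gem.

```ruby
require 'optparse'

# Hypothetical helper, not part of the gem: parse --http with Integer
# coercion so a non-numeric port is rejected at parse time.
def parse_port(args)
  port = nil
  OptionParser.new do |opts|
    opts.on("--http=PORT", Integer, "Start Puma server on the specified port") do |value|
      port = value
    end
  end.parse!(args)
  port
end

p parse_port(["--http=9292"])

begin
  parse_port(["--http=abc"])
rescue OptionParser::InvalidArgument => e
  puts "rejected: #{e.message}"
end
```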
data/lib/inci_score/fetcher.rb
CHANGED
@@ -21,21 +21,19 @@ module InciScore
       end
     end
 
-    private
-
-    def doc
+    private def doc
       @src.respond_to?(:value) ? @src.value : @src
     end
 
-    def semaphore(src)
+    private def semaphore(src)
       src.match(/(#{SEMAPHORES.join('|')}).gif$/)[1]
     end
 
-    def normalize(node)
+    private def normalize(node)
       node.text.strip.downcase
     end
 
-    def swap?(desc)
+    private def swap?(desc)
       return false if desc.empty?
       desc == desc.upcase
     end
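The fetcher change replaces a bare `private` section with `private def ...` on each method. This works because, since Ruby 2.1, `def` returns the method name as a symbol, which `private` then receives as an argument. A small sketch with a hypothetical `Sketch` class:

```ruby
# Hypothetical class for illustration only.
class Sketch
  def shout(text)
    normalize(text).upcase
  end

  # `def` evaluates to :normalize, so this is equivalent to
  # defining the method and then calling `private :normalize`.
  private def normalize(text)
    text.strip
  end
end

puts Sketch.new.shout("  aqua ")
puts Sketch.private_method_defined?(:normalize)
puts Sketch.public_method_defined?(:shout)
```

With this style each method states its own visibility, so a method added lower in the file does not become private by accident.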
data/lib/inci_score/recognizer.rb
CHANGED
@@ -13,9 +13,8 @@ module InciScore
     def call
       @component = @rules.reduce(nil) do |component, rule|
         break(component) if component
-        _rule = rule.new(@src, @catalog)
         yield(rule) if block_given?
-
+        rule.call(@src, @catalog)
       end
       [@component, @catalog[@component]] if @component
     end
data/lib/inci_score/recognizer_rules.rb
CHANGED
@@ -4,36 +4,29 @@ module InciScore
   using Refinements
   class Recognizer
     module Rules
-
-      TOLERANCE = 3
+      TOLERANCE = 3
 
-
-
-          @catalog = catalog
-        end
+      module Key
+        extend self
 
-        def call
-
+        def call(src, catalog)
+          src if catalog.has_key?(src)
         end
       end
 
-
-
-          @src if @catalog.has_key?(@src)
-        end
-      end
+      module Levenshtein
+        extend self
 
-      class Levenshtein < Base
         ALTERNATE_SEP = '/'
 
-        def call
-          size =
-          initial =
-          component, distance =
+        def call(src, catalog)
+          size = src.size
+          initial = src[0]
+          component, distance = catalog.reduce([nil, size]) do |min, (_component, _)|
             next min unless _component.start_with?(initial)
             match = (n = _component.index(ALTERNATE_SEP)) ? _component[0, n] : _component
             next min if match.size > (size + TOLERANCE)
-            dist =
+            dist = src.distance(match)
             min = [_component, dist] if dist < min[1]
             min
           end
@@ -41,34 +34,36 @@ module InciScore
         end
       end
 
-
+      module Digits
+        extend self
+
         MIN_MEANINGFUL = 7
 
-        def call
-          return if
-          digits =
-
+        def call(src, catalog)
+          return if src.size < TOLERANCE
+          digits = src[0, MIN_MEANINGFUL]
+          catalog.detect do |component, _|
             component.matches?(/^#{Regexp::escape(digits)}/)
           end.to_a.first
         end
       end
 
-
+      module Tokens
+        extend self
+
         UNMATCHABLE = %w[extract oil sodium acid sulfate]
 
-        def call
-          tokens.each do |token|
-
+        def call(src, catalog)
+          tokens(src).each do |token|
+            catalog.each do |component, _|
               return component if component.matches?(/\b#{Regexp.escape(token)}\b/)
             end
           end
           nil
         end
 
-
-
-        def tokens
-          (@src.split(' ') - UNMATCHABLE).reject { |t| t.size < TOLERANCE }.sort_by!(&:size).reverse!
+        def tokens(src)
+          (src.split(' ') - UNMATCHABLE).reject { |t| t.size < TOLERANCE }.sort_by!(&:size).reverse!
         end
       end
     end
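Taken together, the recognizer hunks turn each rule from a class instantiated per lookup (`rule.new(@src, @catalog)`) into a stateless module answering `call(src, catalog)`, chained with `reduce` so that the first rule to return a component short-circuits the rest. The sketch below illustrates that chain with two deliberately simplified rules. `ExactRule`, `PrefixRule` and `recognize` are hypothetical names, and the prefix match merely stands in for the gem's fuzzier rules.

```ruby
# Hypothetical rules mirroring the stateless `call(src, catalog)` shape.
module ExactRule
  extend self

  def call(src, catalog)
    src if catalog.key?(src)
  end
end

module PrefixRule
  extend self

  # Simplified stand-in for fuzzy matching: compare the first four characters.
  def call(src, catalog)
    catalog.keys.find { |name| name.start_with?(src[0, 4]) }
  end
end

# The first rule returning a non-nil component wins; `break` skips the rest.
def recognize(src, catalog, rules = [ExactRule, PrefixRule])
  rules.reduce(nil) do |found, rule|
    break found if found
    rule.call(src, catalog)
  end
end

catalog = { "aqua" => 0, "dimethicone" => 4 }
p recognize("aqua", catalog)
p recognize("dimeticone", catalog)
p recognize("noent", catalog)
```

Because the modules hold no per-call state, no rule object has to be allocated for each ingredient, which is presumably the motivation for the refactoring.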
data/lib/inci_score/server.rb
ADDED
@@ -0,0 +1,35 @@
+require 'etc'
+require 'puma'
+
+module InciScore
+  class Server
+    DEFAULT_HOST = "0.0.0.0"
+
+    def initialize(port: 9292, threads: "1:2", workers: Etc.nprocessors, preload: false,
+                   config_klass: Puma::Configuration, launcher_klass: Puma::Launcher)
+      @port = port
+      @workers = workers
+      @threads = threads.split(":")
+      @preload = preload
+      @config_klass = config_klass
+      @launcher_klass = launcher_klass
+    end
+
+    def run
+      launcher.run
+    end
+
+    private def launcher
+      @launcher_klass.new(config)
+    end
+
+    private def config
+      @config_klass.new do |c|
+        c.bind "tcp://#{DEFAULT_HOST}:#{@port}"
+        c.workers @workers if @workers > 1
+        c.threads *@threads
+        c.preload_app! if @preload
+      end
+    end
+  end
+end
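`Server` receives its configuration and launcher classes as keyword arguments, which suggests they are meant to be replaceable, for example in tests. The sketch below assumes that intent and exercises a trimmed copy of the class with recording fakes, so it runs without Puma. `SketchServer`, `FakeConfig` and `FakeLauncher` are all hypothetical and mirror only the calls visible in the diff.

```ruby
require 'etc'

# Trimmed, hypothetical copy of the Server shape from the diff. It does not
# require Puma: both collaborators are injected by the caller.
class SketchServer
  def initialize(port:, config_klass:, launcher_klass:, threads: "1:2", workers: Etc.nprocessors)
    @port = port
    @workers = workers
    @threads = threads.split(":")
    @config_klass = config_klass
    @launcher_klass = launcher_klass
  end

  def run
    @launcher_klass.new(config).run
  end

  private def config
    @config_klass.new do |c|
      c.bind "tcp://0.0.0.0:#{@port}"
      c.workers @workers if @workers > 1
      c.threads(*@threads)
    end
  end
end

# Fake configuration that records the DSL calls it receives.
class FakeConfig
  attr_reader :calls

  def initialize
    @calls = []
    yield self
  end

  def bind(uri)
    @calls << [:bind, uri]
  end

  def workers(count)
    @calls << [:workers, count]
  end

  def threads(min, max)
    @calls << [:threads, min, max]
  end
end

# Fake launcher that returns the recorded calls instead of booting a server.
class FakeLauncher
  def initialize(config)
    @config = config
  end

  def run
    @config.calls
  end
end

server = SketchServer.new(port: "9292", workers: 1,
                          config_klass: FakeConfig, launcher_klass: FakeLauncher)
p server.run
```

Two details of the diff show up in the recorded calls: the `workers` setting is skipped when only one worker is requested, and the thread bounds reach the configuration as strings because they come straight from `split(":")`.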
data/lib/inci_score/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: inci_score
 version: !ruby/object:Gem::Version
-  version: 2.
+  version: 2.1.1
 platform: ruby
 authors:
 - costajob
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2017-01-
+date: 2017-01-04 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nokogiri
@@ -143,7 +143,7 @@ files:
 - ext/levenshtein.c
 - inci_score.gemspec
 - lib/inci_score.rb
-- lib/inci_score/
+- lib/inci_score/app.rb
 - lib/inci_score/catalog.rb
 - lib/inci_score/cli.rb
 - lib/inci_score/computer.rb
@@ -157,6 +157,7 @@ files:
 - lib/inci_score/response.rb
 - lib/inci_score/score.rb
 - lib/inci_score/scorer.rb
+- lib/inci_score/server.rb
 - lib/inci_score/version.rb
 - log/.gitignore
 homepage: https://github.com/costajob/inci_score.git
data/lib/inci_score/api/app.rb
DELETED
@@ -1,21 +0,0 @@
-require 'rack'
-require 'inci_score'
-
-module InciScore
-  module API
-    module App
-      extend self
-
-      def catalog
-        @catalog ||= Catalog.fetch
-      end
-
-      def call(env)
-        req = Rack::Request.new(env)
-        src = req.params["src"]
-        json = src ? Computer.new(src: src, catalog: catalog).call.to_json : %q({"error": "no valid source"})
-        ['200', {'Content-Type' => 'application/json'}, [json]]
-      end
-    end
-  end
-end