inci_score 1.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 7e4bae1a80f16b3684a4d38be2455c737d0fa58a
4
+ data.tar.gz: e1b1bc6d078e85953e14c071bdd82f678a87cf99
5
+ SHA512:
6
+ metadata.gz: a1234e75bf300c6de9de417f133d858876cebcaf90abac76b803ac242b5a88f8a36eea1e99badbb56ca7ef317d269de2dadaaa6502703cce05afa66ff829bdbc
7
+ data.tar.gz: e566482ae9a45829b5bbbe1c2431f35cfb72a68ba6e9eafbb2a32a75e9559a74c8d25819c520d05708f6c82691de1f139bf8628acf665f0b30a52730c4e7552a
data/.gitignore ADDED
@@ -0,0 +1,11 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /Gemfile.lock
4
+ /_yardoc/
5
+ /coverage/
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
10
+ /**/.DS_Store
11
+ *.gem
data/.travis.yml ADDED
@@ -0,0 +1,7 @@
1
+ language: ruby
2
+ rvm:
3
+ - 2.0.0
4
+ - 2.1.8
5
+ - 2.2.1
6
+ - 2.3.0
7
+ before_install: gem install bundler -v 1.11.2
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in inci_score.gemspec
4
+ gemspec
data/README.md ADDED
@@ -0,0 +1,117 @@
1
+ ## Table of Contents
2
+
3
+ * [Scope](#scope)
4
+ * [INCI catalog](#inci-catalog)
5
+ * [Computation](#computation)
6
+ * [Component matching](#component-matching)
7
+ * [Sources](#sources)
8
+ * [API](#api)
9
+ * [Unrecognized components](#unrecognized-components)
10
+ * [Web API](#web-api)
11
+ * [Starting Puma](#starting-puma)
12
+ * [Triggering a request](#triggering-a-request)
13
+ * [CLI API](#cli-api)
14
+ * [Performance](#performance)
15
+ * [Levenshtein in C](#levenshtein-in-c)
16
+ * [Numbers](#numbers)
17
+
18
+ ## Scope
19
+ This gem computes a score for cosmetic components based on the information provided by the [Biodizionario site](http://www.biodizionario.it/) by Fabrizio Zago.
20
+
21
+ ## INCI catalog
22
+ The [INCI](https://en.wikipedia.org/wiki/International_Nomenclature_of_Cosmetic_Ingredients) catalog is fetched directly from the Biodizionario site and kept in memory.
23
+ Currently there are more than 5000 components with a hazard score that ranges from 0 (safe) to 4 (dangerous).
24
+
25
+ ## Computation
26
+ The computation scores each component of the cosmetic based on:
27
+ * its hazard, according to the Biodizionario score
28
+ * its position on the list of ingredients
29
+
30
+ The total score is then calculated on a percent basis.
31
+
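As a rough illustration of how hazard and position can combine into a percentage, here is a minimal sketch. The weighting (`1 / (index + 1)`) and the `position_weighted_score` name are assumptions made for this example, not the gem's actual formula, so it will not reproduce the scores shown elsewhere in this README.

```ruby
MAX_HAZARD = 4.0 # Biodizionario hazards range from 0 (safe) to 4 (dangerous)

# Hypothetical scoring: ingredients listed earlier weigh more than later ones.
# `hazards` is the list of hazard values in ingredient order.
def position_weighted_score(hazards)
  return 100.0 if hazards.empty?

  # Assumed weights: 1, 1/2, 1/3, ... by position in the list.
  weights = hazards.each_index.map { |i| 1.0 / (i + 1) }
  penalty = hazards.zip(weights)
                   .map { |hazard, weight| (hazard / MAX_HAZARD) * weight }
                   .inject(:+)
  # Normalize the penalty and express the result on a percent basis.
  100.0 * (1 - penalty / weights.inject(:+))
end
```

With these assumed weights, a hazardous component listed first lowers the score more than the same component listed last.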
32
+ ### Component matching
33
+ Since the ingredients list could come from an unreliable source (e.g. data scanned from a captured image), the gem tries to fuzzy match the ingredients by using different algorithms:
34
+ * exact matching
35
+ * [edit distance](https://en.wikipedia.org/wiki/Levenshtein_distance) within a specified tolerance
36
+ * first relevant matching digits
37
+ * matching split tokens
38
+
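The edit-distance step can be sketched in plain Ruby as follows. This is an illustrative version only: the `levenshtein` and `closest_match` names, the default tolerance of 2 and the tiny `CATALOG` hash are assumptions for the example, not the gem's internals.

```ruby
# Classic dynamic-programming Levenshtein distance, keeping one row at a time.
def levenshtein(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      # Minimum of deletion, insertion and substitution.
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical helper: the closest candidate, or nil when nothing is
# within the given tolerance.
def closest_match(name, candidates, tolerance: 2)
  best, distance = candidates.map { |c| [c, levenshtein(name, c)] }
                             .min_by { |_, d| d }
  best if distance && distance <= tolerance
end

# Sample data for the example only.
CATALOG = { 'aqua' => 0, 'dimethicone' => 4, 'peg-10' => 3 }.freeze
```

For instance, `closest_match('pej-10', CATALOG.keys)` resolves to `'peg-10'` (one substitution away), while `closest_match('noent', CATALOG.keys)` returns `nil`.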
39
+ ### Sources
40
+ The library accepts the list of ingredients as a single string of text. Since this source could come from an OCR program, the library performs a normalization by stripping invalid characters and removing the unimportant parts.
41
+ The ingredients are typically separated by commas, although the normalizer detects the most appropriate separator:
42
+
43
+ ```
44
+ "Ingredients: Aqua, Disodium Laureth Sulfosuccinate, Cocamidopropiyl\nBetaine"
45
+ ```
46
+
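A minimal sketch of this kind of normalization is shown below. The `normalize` method, the `SEPARATORS` list and the "most frequent candidate wins" rule are assumptions chosen for illustration; the gem's own normalizer may apply different rules.

```ruby
# Hypothetical candidate separators.
SEPARATORS = [',', ';', '|'].freeze

def normalize(src)
  text = src.gsub(/\s+/, ' ')                    # collapse newlines and repeated spaces
  text = text.sub(/\A\s*ingredients\s*:/i, '')   # drop a leading "Ingredients:" label
  sep = SEPARATORS.max_by { |s| text.count(s) }  # the most frequent candidate wins
  text.split(sep).map { |token| token.strip.downcase }.reject(&:empty?)
end

tokens = normalize("Ingredients: Aqua, Disodium Laureth Sulfosuccinate, Cocamidopropiyl\nBetaine")
# tokens => ["aqua", "disodium laureth sulfosuccinate", "cocamidopropiyl betaine"]
```

The misspelled "cocamidopropiyl" is left as-is here; correcting it is the job of the fuzzy matching step.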
47
+ ## API
48
+ The API of the gem is pretty simple: open irb via *bundle console* and start computing the INCI score:
49
+
50
+ ```ruby
51
+ inci = InciScore::Computer.new(src: 'aqua, dimethicone').call
52
+ => #<InciScore::Response:0x000000029f8100 @components={"aqua"=>0, "dimethicone"=>4}, @score=53.762874945799766, @unrecognized=[], @valid=true>
53
+ inci.score
54
+ => 53.762874945799766
55
+ ```
56
+
57
+ As you can see, the results are wrapped in an *InciScore::Response* object; this is useful when dealing with the Web API (see below) and when printing them to standard output.
58
+
59
+ ### Unrecognized components
60
+ The API treats unrecognized components as a common case: the object is marked as non-valid and a warning is raised when more than 30% of the ingredients are not found.
61
+ In that case the score is computed anyway, considering only the recognized components.
62
+ It is still possible to query the object for its state:
63
+
64
+ ```ruby
65
+ inci = InciScore::Computer.new(src: 'ingredients:aqua,noent1,noent2').call
66
+ => #<InciScore::Response:0x000000030c16d0 @components={"aqua"=>0}, @score=100.0, @unrecognized=["noent1", "noent2"], @valid=false>
67
+ inci.valid
68
+ => false
69
+ inci.unrecognized
70
+ => ["noent1", "noent2"]
71
+ ```
72
+
73
+ ## Web API
74
+ The Web API exposes the *InciScore* library over HTTP via the [Puma](http://puma.io/) application server.
75
+
76
+ ### Starting Puma
77
+ Simply start Puma via the *config.ru* file included in the repository, spawning as many workers as your workstation supports:
78
+ ```
79
+ bundle exec puma -w 8 -t 16:32 --preload
80
+ ```
81
+
82
+ ### Triggering a request
83
+ The Web API responds with a JSON object representing the original *InciScore::Response* one.
84
+
85
+ You can pass the source string directly as an HTTP parameter:
86
+
87
+ ```
88
+ curl "http://127.0.0.1:9292?src=aqua,dimethicone"
89
+ => {"components":{"aqua":0,"dimethicone":4},"unrecognized":[],"score":53.762874945799766,"valid":true}
90
+ ```
91
+
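The same request can be issued from Ruby with the standard library. This is a sketch under the assumption that Puma is listening on `127.0.0.1:9292` as in the curl example; `build_uri`, `parse_score` and `fetch_score` are hypothetical helper names, not part of the gem.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build the request URI, URL-encoding the ingredients string.
def build_uri(src, host: '127.0.0.1', port: 9292)
  uri = URI("http://#{host}:#{port}/")
  uri.query = URI.encode_www_form(src: src)
  uri
end

# Turn the JSON body returned by the Web API into a plain Hash.
def parse_score(body)
  JSON.parse(body)
end

# Perform the GET request; this requires the server to be running.
def fetch_score(src)
  parse_score(Net::HTTP.get(build_uri(src)))
end
```

With the server up, `fetch_score('aqua,dimethicone')['score']` returns the score field of the JSON response.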
92
+ ## CLI API
93
+ You can collect INCI data by using the available binary:
94
+
95
+ ```
96
+ bin/inci_score "aqua,dimethicone,pej-10,noent"
97
+
98
+ TOTAL SCORE:
99
+ 47.18034913243358
100
+ VALID STATE:
101
+ true
102
+ COMPONENTS (hazard - name):
103
+ 0 - aqua
104
+ 4 - dimethicone
105
+ 3 - peg-10
106
+ UNRECOGNIZED:
107
+ noent
108
+ ```
109
+
110
+ ## Performance
111
+ I noticed the APIs slow down dramatically when dealing with unrecognized components that need fuzzy matching.
112
+ I profiled the code using the [benchmark-ips](https://github.com/evanphx/benchmark-ips) gem and found that the bottleneck was the pure Ruby implementation of the Levenshtein distance algorithm.
113
+ After some fruitless optimization, I replaced this routine with a C implementation: I opted for the straightforward [Ruby Inline](https://github.com/seattlerb/rubyinline) library to call the C code straight from Ruby.
114
+ As a result, I got a 10x increase in throughput, all without sacrificing code readability.
115
+
116
+ ### Numbers
117
+ I moved the benchmark numbers to the [Crystal port](https://github.com/costajob/inci_score.cr) of the InciScore library; please look there.
data/Rakefile ADDED
@@ -0,0 +1,18 @@
1
+ require 'bundler/gem_tasks'
2
+ require 'rake/testtask'
3
+
4
+ namespace :spec do
5
+ Rake::TestTask.new(:unit) do |t|
6
+ t.libs << 'spec'
7
+ t.libs << 'lib'
8
+ t.test_files = FileList['spec/unit/*_spec.rb']
9
+ end
10
+
11
+ Rake::TestTask.new(:integration) do |t|
12
+ t.libs << 'spec'
13
+ t.libs << 'lib'
14
+ t.test_files = FileList['spec/integration/*_spec.rb']
15
+ end
16
+ end
17
+
18
+ task :default => :"spec:unit"
data/bin/console ADDED
@@ -0,0 +1,7 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require 'bundler/setup'
4
+ require 'inci_score'
5
+ require 'irb'
6
+ require 'irb/completion'
7
+ IRB.start
data/bin/inci_score ADDED
@@ -0,0 +1,7 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require 'bundler/setup'
4
+ require 'inci_score'
5
+
6
+ fail ArgumentError, "please specify at least a src argument" if ARGV.empty?
7
+ puts InciScore::Computer.new(ARGV[0], InciScore::Catalog.fetch).call
data/bin/setup ADDED
@@ -0,0 +1,6 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install