inci_score 1.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 7e4bae1a80f16b3684a4d38be2455c737d0fa58a
4
+ data.tar.gz: e1b1bc6d078e85953e14c071bdd82f678a87cf99
5
+ SHA512:
6
+ metadata.gz: a1234e75bf300c6de9de417f133d858876cebcaf90abac76b803ac242b5a88f8a36eea1e99badbb56ca7ef317d269de2dadaaa6502703cce05afa66ff829bdbc
7
+ data.tar.gz: e566482ae9a45829b5bbbe1c2431f35cfb72a68ba6e9eafbb2a32a75e9559a74c8d25819c520d05708f6c82691de1f139bf8628acf665f0b30a52730c4e7552a
data/.gitignore ADDED
@@ -0,0 +1,11 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /Gemfile.lock
4
+ /_yardoc/
5
+ /coverage/
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
10
+ /**/.DS_Store
11
+ *.gem
data/.travis.yml ADDED
@@ -0,0 +1,7 @@
1
+ language: ruby
2
+ rvm:
3
+ - 2.0.0
4
+ - 2.1.8
5
+ - 2.2.1
6
+ - 2.3.0
7
+ before_install: gem install bundler -v 1.11.2
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in inci_score.gemspec
4
+ gemspec
data/README.md ADDED
@@ -0,0 +1,117 @@
1
+ ## Table of Contents
2
+
3
+ * [Scope](#scope)
4
+ * [INCI catalog](#inci-catalog)
5
+ * [Computation](#computation)
6
+ * [Component matching](#component-matching)
7
+ * [Sources](#sources)
8
+ * [API](#api)
9
+ * [Unrecognized components](#unrecognized-components)
10
+ * [Web API](#web-api)
11
+ * [Starting Puma](#starting-puma)
12
+ * [Triggering a request](#triggering-a-request)
13
+ * [CLI API](#cli-api)
14
+ * [Performance](#performance)
15
+ * [Levenshtein in C](#levenshtein-in-c)
16
+   * [Numbers](#numbers)
17
+
18
+ ## Scope
19
+ This gem computes the score of cosmetic components based on the information provided by the [Biodizionario site](http://www.biodizionario.it/) by Fabrizio Zago.
20
+
21
+ ## INCI catalog
22
+ The [INCI](https://en.wikipedia.org/wiki/International_Nomenclature_of_Cosmetic_Ingredients) catalog is fetched directly from the Biodizionario site and kept in memory.
23
+ Currently there are more than 5000 components with a hazard score that ranges from 0 (safe) to 4 (dangerous).
24
+
25
+ ## Computation
26
+ The computation scores each component of the cosmetic based on:
27
+ * its hazard, according to the Biodizionario score
28
+ * its position on the list of ingredients
29
+
30
+ The total score is then calculated on a percent basis.
31
+
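The README does not spell out the formula, so the following is only a minimal sketch of a position-weighted percent score. The `1 / position` weighting, the `MAX_HAZARD` constant, and the method name `position_weighted_score` are assumptions made for illustration; this sketch is not expected to reproduce the numbers printed by the gem.

```ruby
# Hypothetical sketch: NOT the gem's actual formula.
# Hazards range from 0 (safe) to 4 (dangerous), as stated in the README.
MAX_HAZARD = 4.0

# Returns a score between 0.0 (worst) and 100.0 (best).
# Ingredients listed first weigh more: the weight of position i is 1 / (i + 1).
def position_weighted_score(hazards)
  raise ArgumentError, 'no hazards given' if hazards.empty?

  weights = hazards.each_index.map { |i| 1.0 / (i + 1) }
  penalty = hazards.zip(weights).sum { |hazard, weight| weight * hazard / MAX_HAZARD }
  100.0 * (1 - penalty / weights.sum)
end

puts position_weighted_score([0, 4]) # hazardous component listed last
puts position_weighted_score([4, 0]) # hazardous component listed first
```

With this weighting the same hazardous ingredient costs more points the earlier it appears in the list, which is the behavior the two criteria above describe.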
32
+ ### Component matching
33
+ Since the ingredients list could come from an unreliable source (e.g. data scanned from a captured image), the gem tries to fuzzy match the ingredients by using different algorithms:
34
+ * exact matching
35
+ * [edit distance](https://en.wikipedia.org/wiki/Levenshtein_distance) within a specified tolerance
36
+ * first relevant matching digits
37
+ * matching split tokens
38
+
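The first two strategies in the list above can be sketched as a small cascade: try an exact lookup, then fall back to the closest catalog entry within an edit-distance tolerance. The `CATALOG` hash, the `match_component` method, and the default tolerance of 1 are hypothetical names and values chosen for this example; the remaining strategies are left out.

```ruby
# Textbook dynamic-programming Levenshtein distance, keeping two rows in memory.
def levenshtein(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost].min
    end
    previous = current
  end
  previous.last
end

# Hypothetical in-memory catalog (component name => hazard).
CATALOG = { 'aqua' => 0, 'dimethicone' => 4, 'peg-10' => 3 }.freeze

# Returns the matched catalog key, or nil when nothing is close enough.
def match_component(name, catalog: CATALOG, tolerance: 1)
  return name if catalog.key?(name) # exact matching

  candidate = catalog.keys.min_by { |key| levenshtein(name, key) }
  candidate if candidate && levenshtein(name, candidate) <= tolerance
end

p match_component('pej-10') # one substitution away from "peg-10"
p match_component('noent')  # too far from every entry
```

Scanning the whole catalog for every unknown name is the expensive part of this approach, since each lookup costs one distance computation per catalog entry.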
39
+ ### Sources
40
+ The library accepts the list of ingredients as a single string of text. Since this source could come from an OCR program, the library performs a normalization by stripping invalid characters and removing the unimportant parts.
41
+ The ingredients are typically separated by commas, although the normalizer will detect the most appropriate separator:
42
+
43
+ ```
44
+ "Ingredients: Aqua, Disodium Laureth Sulfosuccinate, Cocamidopropiyl\nBetaine"
45
+ ```
46
+
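A normalization step along these lines could look like the sketch below. The candidate separators, the allowed character set, and the `normalize` method name are assumptions for illustration, not the gem's actual rules.

```ruby
# Hypothetical candidate separators; the most frequent one wins.
SEPARATORS = [',', ';', '/'].freeze

def normalize(src, separators: SEPARATORS)
  text = src.downcase.gsub(/\s+/, ' ')                 # collapse newlines and extra spaces
  text = text.sub(/\A\s*ingredients\s*:\s*/, '')       # drop the leading label
  text = text.gsub(/[^a-z0-9\s,;\/()-]/, '')           # strip characters assumed to be OCR noise
  separator = separators.max_by { |sep| text.count(sep) }
  text.split(separator).map(&:strip).reject(&:empty?)
end

p normalize("Ingredients: Aqua, Disodium Laureth Sulfosuccinate, Cocamidopropiyl\nBetaine")
```

Picking the most frequent candidate keeps the sketch simple; a list that mixes separators would need a more careful heuristic.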
47
+ ## API
48
+ The gem's API is simple. Open irb via *bundle console* and start computing the INCI score:
49
+
50
+ ```ruby
51
+ inci = InciScore::Computer.new(src: 'aqua, dimethicone').call
52
+ => #<InciScore::Response:0x000000029f8100 @components={"aqua"=>0, "dimethicone"=>4}, @score=53.762874945799766, @unrecognized=[], @valid=true>
53
+ inci.score
54
+ => 53.762874945799766
55
+ ```
56
+
57
+ As you can see, the results are wrapped in an *InciScore::Response* object. This is useful when dealing with the Web API (see below) and when printing them to standard output.
58
+
59
+ ### Unrecognized components
60
+ The API treats unrecognized components as a common case: it just marks the object as not valid and raises a warning when more than 30% of the ingredients are not found.
61
+ In that case the score is still computed, considering only the recognized components.
62
+ It is still possible to query the object for its state:
63
+
64
+ ```ruby
65
+ inci = InciScore::Computer.new(src: 'ingredients:aqua,noent1,noent2').call
66
+ => #<InciScore::Response:0x000000030c16d0 @components={"aqua"=>0}, @score=100.0, @unrecognized=["noent1", "noent2"], @valid=false>
67
+ inci.valid
68
+ => false
69
+ inci.unrecognized
70
+ => ["noent1", "noent2"]
71
+ ```
72
+
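The validity rule can be sketched as a simple percentage check. The constant and method names are hypothetical, and the sketch assumes that the 30% threshold is what decides the valid state, which is consistent with the examples in this README but not confirmed by the gem's source.

```ruby
# Threshold taken from the README text; the name is an assumption.
UNRECOGNIZED_TOLERANCE = 30.0 # percent

# True when at most 30% of the components could not be matched.
def valid_state?(recognized, unrecognized)
  total = recognized.size + unrecognized.size
  return false if total.zero?

  unrecognized.size * 100.0 / total <= UNRECOGNIZED_TOLERANCE
end

p valid_state?(['aqua'], %w[noent1 noent2])                # 2 of 3 unrecognized
p valid_state?(%w[aqua dimethicone peg-10], ['noent'])     # 1 of 4 unrecognized
```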
73
+ ## Web API
74
+ The Web API exposes the *InciScore* library over HTTP via the [Puma](http://puma.io/) application server.
75
+
76
+ ### Starting Puma
77
+ Simply start Puma via the *config.ru* file included in the repository, spawning as many workers as your workstation supports:
78
+ ```
79
+ bundle exec puma -w 8 -t 16:32 --preload
80
+ ```
81
+
82
+ ### Triggering a request
83
+ The Web API responds with a JSON object representing the original *InciScore::Response* one.
84
+
85
+ You can pass the source string directly as an HTTP parameter:
86
+
87
+ ```
88
+ curl "http://127.0.0.1:9292?src=aqua,dimethicone"
89
+ => {"components":{"aqua":0,"dimethicone":4},"unrecognized":[],"score":53.762874945799766,"valid":true}
90
+ ```
91
+
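From Ruby, the same request can be prepared and its response decoded with the standard library alone. This is a sketch: the helper names `request_uri` and `parse_response` are made up for the example, and the response body is the one shown in the curl output above rather than a live call.

```ruby
require 'json'
require 'uri'

# Builds the request URI, escaping the source string so that spaces survive.
def request_uri(src, host: '127.0.0.1', port: 9292)
  URI::HTTP.build(host: host, port: port, path: '/', query: URI.encode_www_form(src: src))
end

# Decodes the JSON payload into a Hash with symbol keys.
def parse_response(body)
  JSON.parse(body, symbolize_names: true)
end

uri = request_uri('aqua, dimethicone')
body = '{"components":{"aqua":0,"dimethicone":4},"unrecognized":[],"score":53.762874945799766,"valid":true}'
result = parse_response(body)

puts uri
puts result[:score]
puts result[:valid]
```

With a running server, `Net::HTTP.get(uri)` (after `require 'net/http'`) would supply the body instead of the hard-coded string.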
92
+ ## CLI API
93
+ You can collect INCI data by using the available binary:
94
+
95
+ ```
96
+ bin/inci_score "aqua,dimethicone,pej-10,noent"
97
+
98
+ TOTAL SCORE:
99
+ 47.18034913243358
100
+ VALID STATE:
101
+ true
102
+ COMPONENTS (hazard - name):
103
+ 0 - aqua
104
+ 4 - dimethicone
105
+ 3 - peg-10
106
+ UNRECOGNIZED:
107
+ noent
108
+ ```
109
+
110
+ ## Performance
111
+ I noticed the APIs slow down dramatically when they have to fuzzy match unrecognized components.
112
+ I profiled the code using the [benchmark-ips](https://github.com/evanphx/benchmark-ips) gem and found that the bottleneck was the pure Ruby implementation of the Levenshtein distance algorithm.
113
+ After some pointless optimization, I replaced this routine with a C implementation: I opted for the straightforward [Ruby Inline](https://github.com/seattlerb/rubyinline) library to call the C code straight from Ruby.
114
+ As a result I got a 10x increase in throughput, all without sacrificing code readability.
115
+
116
+ ### Numbers
117
+ I moved the benchmark numbers to the [Crystal port](https://github.com/costajob/inci_score.cr) of the InciScore library; please look there.
data/Rakefile ADDED
@@ -0,0 +1,18 @@
1
+ require 'bundler/gem_tasks'
2
+ require 'rake/testtask'
3
+
4
+ namespace :spec do
5
+ Rake::TestTask.new(:unit) do |t|
6
+ t.libs << 'spec'
7
+ t.libs << 'lib'
8
+ t.test_files = FileList['spec/unit/*_spec.rb']
9
+ end
10
+
11
+ Rake::TestTask.new(:integration) do |t|
12
+ t.libs << 'spec'
13
+ t.libs << 'lib'
14
+ t.test_files = FileList['spec/integration/*_spec.rb']
15
+ end
16
+ end
17
+
18
+ task :default => :"spec:unit"
data/bin/console ADDED
@@ -0,0 +1,7 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require 'bundler/setup'
4
+ require 'inci_score'
5
+ require 'irb'
6
+ require 'irb/completion'
7
+ IRB.start
data/bin/inci_score ADDED
@@ -0,0 +1,7 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require 'bundler/setup'
4
+ require 'inci_score'
5
+
6
+ fail ArgumentError, "please specify at least a src argument" if ARGV.empty?
7
+ puts InciScore::Computer.new(ARGV[0], InciScore::Catalog.fetch).call
data/bin/setup ADDED
@@ -0,0 +1,6 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install