RubyGems - inci_score - Version diff 4.3.0 → 4.5.0 - Mend

inci_score 4.3.0 → 4.5.0

Files changed (10)

checksums.yaml +4 -4
data/README.md +10 -9
data/config/catalog.yml +0 -1
data/config/hazards.yml +29 -31
data/lib/inci_score/recognizer.rb +1 -5
data/lib/inci_score/recognizer_rules.rb +22 -12
data/lib/inci_score/response.rb +24 -5
data/lib/inci_score/scorer.rb +4 -6
data/lib/inci_score/version.rb +1 -1
metadata +1 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 0affbfe591b6551bc8761ccfc245e1b233f311e5804da9761661ebe959e5be4a
-  data.tar.gz: 772501375864af4b28617351a060a839048e97794c1b596062771d3e6d807e8a
+  metadata.gz: 462ec33d1c493272235feaef061ac62822c4dfd6ad6c339e858da8fdfa491894
+  data.tar.gz: 42f1e47b971185e92d4af19f2fbdee0f363dc9f241041b8ed800230ae9bd0e22
 SHA512:
-  metadata.gz: 0367e213bff98076b13ed908f36380f1e452dc4a8c1c85c3e16c0b4a69acac66fdcf783787040b2004603175fd8476def74deea86b41c05064baddbea4e5563a
-  data.tar.gz: 931e2a90a874d865418481a3beae838f614537ababf9386edc0d186d0921786972a54eb6cfa73ab235c260412241bc557298814cad07bc493b06aaf04651ae37
+  metadata.gz: cc4f049d56ea9fc60ce92943d7da50c15b697f8712733b409e3327ab73b3c4c0e60d0687f55fcbbefb9f4fd2fdc4c05fe5aa86435f5b75fb558bf456744c9cc4
+  data.tar.gz: a706360921a1cc36b1b5f1fef53932b859de69a27b02fd8b662b76dc8fe9808e1409779297bfa159c0d57bc6bb1dfb6e5287d4455def84500ce194fa0102706a
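`checksums.yaml` records SHA256 and SHA512 digests of the two archives packed inside the `.gem` file, `metadata.gz` and `data.tar.gz`, so any change to the gem's contents produces new values. A minimal sketch of checking such a file with Ruby's standard `digest` and `yaml` libraries, assuming the archive entries have already been read into memory; the `checksums_match?` helper is hypothetical and is not part of RubyGems or inci_score:

```ruby
require "digest"
require "yaml"

# Algorithms listed in checksums.yaml, mapped to the stdlib digest classes
ALGORITHMS = { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }.freeze

# Hypothetical helper: true only if every entry named in the YAML exists
# and its hex digest matches for every listed algorithm.
def checksums_match?(checksums_yaml, entries)
  expected = YAML.safe_load(checksums_yaml)
  ALGORITHMS.all? do |algo, digest|
    expected.fetch(algo).all? do |name, hex|
      entries.key?(name) && digest.hexdigest(entries[name]) == hex
    end
  end
end

# Usage with stand-in archive contents
entries = { "metadata.gz" => "fake metadata", "data.tar.gz" => "fake data" }
expected = ALGORITHMS.to_h do |algo, digest|
  [algo, entries.transform_values { |blob| digest.hexdigest(blob) }]
end
checksums = expected.to_yaml

puts checksums_match?(checksums, entries)
puts checksums_match?(checksums, entries.merge("data.tar.gz" => "tampered"))
```

Because a single changed byte in either archive changes both digests, the four hashes above all differ between 4.3.0 and 4.5.0.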

data/README.md CHANGED

@@ -9,8 +9,9 @@
 * [Usage](#usage)
   * [Library](#library)
   * [CLI](#cli)
-* [Benchmark](#benchmark)
+* [Benchmarks](#benchmark)
   * [Levenshtein in C](#levenshtein-in-c)
+  * [Run benchmarks](#run-benchmarks)
 ## Scope
 This gem computes the score of cosmetic components basing on the information provided by the [Biodizionario site](http://www.biodizionario.it/) by Fabrizio Zago.
@@ -56,7 +57,7 @@ You can include this gem into your own library and start computing the INCI scor
 require "inci_score"
 inci = InciScore::Computer.new(src: 'aqua, dimethicone').call
-inci.score # 53.7629
+inci.score # 53.76
 ```
 As you see the results are wrapped by an *InciScore::Response* object, this is useful when dealing with the CLI and HTTP interfaces (read below).
@@ -80,12 +81,10 @@ inci_score --src="ingredients: aqua, dimethicone, pej-10, noent"
 TOTAL SCORE:
       	47.18
-VALID STATE:
-      	true
 PRECISION:
       	75.0
 COMPONENTS:
-      	aqua\n	dimethicone\n	peg-10
+      	aqua (0), dimethicone (4), peg-10 (3)
 UNRECOGNIZED:
       	noent
 ```
@@ -98,15 +97,17 @@ Usage: inci_score --src="aqua, parfum, etc"
     -h, --help                       Prints this help
 ```
-## Benchmark
+## Benchmarks
 ### Levenshtein in C
 I noticed the APIs slows down dramatically when dealing with unrecognized components to fuzzy match on.
 I profiled the code by using the [benchmark-ips](https://github.com/evanphx/benchmark-ips) gem, finding the bottleneck was the pure Ruby implementation of the Levenshtein distance algorithm.
-After some pointless optimization, i replaced this routine with a C implementation: i opted for the straightforward [Ruby Inline](https://github.com/seattlerb/rubyinline) library to call the C code straight from Ruby.
-Once downloaded source code, run the bench specs by:
+After some pointless optimization, i replaced this routine with a C implementation: i opted for the straightforward [Ruby Inline](https://github.com/seattlerb/rubyinline) library to call the C code straight from Ruby, gaining an order of magnitude in speed (x30).
+### Run benchmarks
+Once downloaded source code, run the benchmarks by:
 ```shell
-bundle exec rake spec:bench
+bundle exec rake bench
 ```

data/config/catalog.yml CHANGED

@@ -1,5 +1,4 @@
 ---
-generic-hazard: 3
 aqua: 0
 water: 0
 parfum: 3

data/config/hazards.yml CHANGED

@@ -1,31 +1,29 @@
-- peg-
-- ppg-
-- dea-
-- mipa-
-- edta-
-- thicone
-- siloxane
-- chlorexidine
-- petrolatum
-- paraffinum liquidum
-- carbomer
-- crosspolymer
-- acrylate
-- styrene
-- copolymer
-- triethanolamine
-- triclosan
-- dmdm
-- hydantoin
-- imidazolidinyl urea
-- diazolidinyl urea
-- formaldheyde
-- methylchloroisothiazolinone
-- methylisothiazolinone
-- sodium hydroxymethylglycinate
-- nonoxynol
-- poloxamer
-- trimonium
-- dimonium
-- glycol
-- glicol
+---
+peg-: 3
+ppg-: 3
+dea-: 3
+mipa-: 3
+edta-: 4
+thicone: 4
+siloxane: 4
+chlorexidine: 4
+petrolatum: 3
+paraffinum: 3
+carbomer: 3
+crosspolymer: 3
+acrylate: 3
+styrene: 3
+copolymer: 3
+triethanolamine: 3
+triclosan: 4
+dmdm: 3
+hydantoin: 3
+imidazolidinyl: 4
+diazolidinyl: 3
+methylchloroisothiazolinone: 3
+methylisothiazolinone: 3
+nonoxynol: 4
+poloxamer: 3
+trimonium: 3
+dimonium: 3
+glycol: 3
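In 4.3.0, `hazards.yml` was a flat list of substrings, and any match made the `Hazard` rule return the single `generic-hazard` catalog key (scored 3, and removed from `catalog.yml` in this release). In 4.5.0 the file maps each substring to its own score. A minimal sketch of how such a mapping loads and is searched, using an excerpt of the new entries and plain `YAML.safe_load`; how the gem actually builds `Config::HAZARDS` is not shown in this diff:

```ruby
require "yaml"

# Excerpt of the 4.5.0 hazards.yml format: substring => hazard score
HAZARDS = YAML.safe_load(<<~YML)
  ---
  peg-: 3
  edta-: 4
  thicone: 4
  glycol: 3
YML

# Hash#detect yields [key, value] pairs and returns the first matching pair
# (or nil), which is why the new Hazard rule reads the score with `hazard.last`.
def hazard_score(src)
  pair = HAZARDS.detect { |name, _| src.include?(name) }
  pair&.last
end

hazard_score("dimethicone") # 4, via the "thicone" substring
hazard_score("peg-10")      # 3
hazard_score("aqua")        # nil, no hazardous substring
```

Since `detect` stops at the first hit, the order of keys in the file decides which score wins when an ingredient contains more than one listed substring.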

data/lib/inci_score/recognizer.rb CHANGED

@@ -4,8 +4,6 @@ module InciScore
   class Recognizer
     DEFAULT_RULES = [Rules::Key, Rules::Levenshtein, Rules::Hazard, Rules::Prefix, Rules::Tokens].freeze
-    Component = Struct.new(:name, :hazard)
     attr_reader :ingredient, :rules, :applied
     def initialize(ingredient, rules = DEFAULT_RULES)
@@ -17,9 +15,7 @@ module InciScore
     def call
       return if ingredient.to_s.empty?
-      component = find_component
-      return unless component
-      Component.new(component, Config::CATALOG[component])
+      find_component
     end
     private
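With this change every rule returns a ready-made `Component` (or `nil`), so `Recognizer#call` can return whatever `find_component` yields instead of looking the score up in the catalog itself. `find_component` is private and not part of this diff; the following is a minimal sketch of a first-match rule chain under that assumption, with a tiny hypothetical catalog, a `Key`-style rule and a made-up fallback rule:

```ruby
Component = Struct.new(:name, :hazard)

CATALOG = { "aqua" => 0, "parfum" => 3, "dimethicone" => 4 }.freeze

# Exact lookup, shaped like the new Rules::Key lambda
KEY = ->(src) do
  score = CATALOG[src]
  Component.new(src, score) if score # a score of 0 is still truthy in Ruby
end

# Hypothetical fallback rule: strips trailing non-letters before the lookup
STRIPPED = ->(src) { KEY.call(src.sub(/[^a-z]+\z/, "")) }

# Tries each rule in order and returns the first Component produced, or nil
def find_component(src, rules)
  rules.each do |rule|
    component = rule.call(src)
    return component if component
  end
  nil
end

rules = [KEY, STRIPPED]
find_component("aqua", rules)    # Component name "aqua", hazard 0
find_component("parfum.", rules) # Component name "parfum", hazard 3 (via STRIPPED)
find_component("noent", rules)   # nil
```

Putting the score inside the rule's return value keeps each rule responsible for where its score comes from, which matters once the `Hazard` rule reads scores from `hazards.yml` instead of the catalog.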

data/lib/inci_score/recognizer_rules.rb CHANGED

@@ -7,14 +7,23 @@ module InciScore
     module Rules
       TOLERANCE = 3
-      Key = ->(src) { src if Config::CATALOG.has_key?(src) }
+      Component = Struct.new(:name, :hazard)
-      Hazard = ->(src) { 'generic-hazard' if Config::HAZARDS.any? { |h| src.include?(h) } }
+      Key = ->(src) do
+        score = Config::CATALOG[src]
+        Component.new(src, score) if score
+      end
+      Hazard = ->(src) do
+        if hazard = Config::HAZARDS.detect { |name, _| src.include?(name) }
+          Component.new(src, hazard.last)
+        end
+      end
       module Levenshtein
         extend self
-        Result = Struct.new(:name, :distance) do
+        Result = Struct.new(:name, :distance, :score) do
           def tolerable?(size)
             distance < TOLERANCE && distance <= (size-1)
           end
@@ -25,14 +34,14 @@ module InciScore
           size = src.size
           farthest = Result.new(nil, size)
           initial = src[0]
-          result = Config::CATALOG.reduce(farthest) do |nearest, (component, _)|
-            next nearest unless component.start_with?(initial)
-            next nearest if component.size > (size + TOLERANCE)
-            d = src.distance(component)
-            nearest = Result.new(component, d) if d < nearest.distance
+          result = Config::CATALOG.reduce(farthest) do |nearest, (name, score)|
+            next nearest unless name.start_with?(initial)
+            next nearest if name.size > (size + TOLERANCE)
+            d = src.distance(name)
+            nearest = Result.new(name, d, score) if d < nearest.distance
             nearest
           end
-          result.name if result.tolerable?(size)
+          Component.new(result.name, result.score) if result.tolerable?(size)
         end
       end
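The Levenshtein rule scans catalog entries that share the input's first character and are at most `TOLERANCE` characters longer, keeps the nearest one, and accepts it only if the distance is below `TOLERANCE` and smaller than the input length. In 4.5.0 `Result` also carries the entry's score, so a `Component` can be built directly. A self-contained sketch of that search follows; it uses a plain-Ruby distance function as a stand-in for the `String#distance` the gem calls (the README says the gem's routine is written in C via RubyInline), and the catalog is a hypothetical excerpt:

```ruby
TOLERANCE = 3
Component = Struct.new(:name, :hazard)
CATALOG = { "aqua" => 0, "dimethicone" => 4, "peg-10" => 3, "parfum" => 3 }.freeze

Result = Struct.new(:name, :distance, :score) do
  def tolerable?(size)
    distance < TOLERANCE && distance <= (size - 1)
  end
end

# Classic two-row dynamic-programming edit distance (pure Ruby stand-in)
def levenshtein(a, b)
  prev = (0..b.size).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

def nearest_component(src)
  return if src.empty? # src[0] would be nil below
  size = src.size
  farthest = Result.new(nil, size)
  initial = src[0]
  result = CATALOG.reduce(farthest) do |nearest, (name, score)|
    next nearest unless name.start_with?(initial)
    next nearest if name.size > (size + TOLERANCE)
    d = levenshtein(src, name)
    d < nearest.distance ? Result.new(name, d, score) : nearest
  end
  Component.new(result.name, result.score) if result.tolerable?(size)
end

nearest_component("pej-10") # Component "peg-10", hazard 3 (distance 1)
nearest_component("aqau")   # Component "aqua", hazard 0 (distance 2)
nearest_component("zzz")    # nil, no candidate within tolerance
```

The first-character and length filters skip most catalog entries before the comparatively expensive distance computation runs, which is the same reason the README reports the distance routine itself as the bottleneck.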
@@ -44,7 +53,8 @@ module InciScore
         def call(src)
           return if src.size < TOLERANCE
           digits = src[0, MIN_MEANINGFUL]
-          Config::CATALOG.detect { |component, _| component.start_with?(digits) }.to_a.first
+          pairs = Config::CATALOG.detect { |name, _| name.start_with?(digits) }.to_a.first
+          Component.new(*pairs) if pairs
         end
       end
@@ -56,8 +66,8 @@ module InciScore
         def call(src)
           return if src.size <= TOLERANCE
           tokens(src).each do |token|
-            Config::CATALOG.each do |component, _|
-              return component if component.include?(token)
+            Config::CATALOG.each do |name, score|
+              return Component.new(name, score) if name.include?(token)
             end
           end
           nil
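`Hash#detect` on the catalog yields a two-element `[name, score]` array, and `Component.new(*pair)` spreads it across the two Struct members. The sketch below keeps the whole pair in the `Prefix`-style lookup so both fields are filled; by contrast, calling `.to_a.first` on the detected pair, as the `pairs = ...` line above does, leaves only the name, and splatting a bare String sets `hazard` to `nil`. The catalog excerpt is hypothetical, `MIN_MEANINGFUL` is assumed to be 4 (its value is not shown in this diff), and the whitespace/comma/hyphen tokenizer is a hypothetical stand-in for the gem's private `tokens` method:

```ruby
Component = Struct.new(:name, :hazard)
CATALOG = { "aqua" => 0, "dimethicone" => 4, "parfum" => 3 }.freeze
TOLERANCE = 3
MIN_MEANINGFUL = 4 # assumption: the real value is not part of this diff

# Prefix-style lookup that keeps the full [name, score] pair
def prefix_component(src)
  return if src.size < TOLERANCE
  digits = src[0, MIN_MEANINGFUL]
  pair = CATALOG.detect { |name, _| name.start_with?(digits) }
  Component.new(*pair) if pair
end

# Tokens-style lookup: first catalog name containing any token of the input
def tokens_component(src)
  return if src.size <= TOLERANCE
  src.split(/[\s,-]+/).each do |token| # hypothetical tokenizer
    CATALOG.each do |name, score|
      return Component.new(name, score) if name.include?(token)
    end
  end
  nil
end

prefix_component("dimeth")    # Component "dimethicone", hazard 4
tokens_component("pure aqua") # Component "aqua", hazard 0, via token "aqua"

# Taking only the first element of the pair drops the score:
name_only = CATALOG.detect { |name, _| name.start_with?("parf") }.to_a.first
Component.new(*name_only)     # Component "parfum", hazard nil
```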

data/lib/inci_score/response.rb CHANGED

@@ -20,16 +20,35 @@ module InciScore
     end
     def to_s
+      [score_str, precision_str, components_str, unrecognized_str].join
+    end
+    private
+    def score_str
       %Q{
 TOTAL SCORE:
-      \t#{score}
+      \t#{score}}
+    end
+    def precision_str
+      %Q{
 PRECISION:
-      \t#{precision}
+      \t#{precision}}
+    end
+    def components_str
+      return '' if components.empty?
+      %Q{
 COMPONENTS:
-      \t#{components.map { |c| "#{c.name} (#{c.hazard})" }.join(', ')}
+      \t#{components.map { |c| "#{c.name} (#{c.hazard})" }.join(', ')}}
+    end
+    def unrecognized_str
+      return '' if unrecognized.empty?
+      %Q{
 UNRECOGNIZED:
-      \t#{unrecognized.join(', ')}
-      }
+      \t#{unrecognized.join(', ')}}
     end
   end
 end
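`Response#to_s` is now assembled from one private helper per section, and the `COMPONENTS` and `UNRECOGNIZED` helpers return an empty string when their list is empty, so those headings no longer appear with blank values in the CLI output. A self-contained sketch of the same pattern, using a hypothetical `Report` struct in place of `InciScore::Response` (whose constructor is not shown in this diff) and a hypothetical `section` helper to avoid repeating the heading layout:

```ruby
Component = Struct.new(:name, :hazard)

# Hypothetical stand-in for InciScore::Response, limited to what to_s needs
Report = Struct.new(:score, :precision, :components, :unrecognized) do
  def to_s
    [score_str, precision_str, components_str, unrecognized_str].join
  end

  private

  # Each section: leading newline, heading, then the value on an indented line
  def section(title, value)
    "\n#{title}:\n      \t#{value}"
  end

  def score_str
    section("TOTAL SCORE", score)
  end

  def precision_str
    section("PRECISION", precision)
  end

  def components_str
    return "" if components.empty?
    section("COMPONENTS", components.map { |c| "#{c.name} (#{c.hazard})" }.join(", "))
  end

  def unrecognized_str
    return "" if unrecognized.empty?
    section("UNRECOGNIZED", unrecognized.join(", "))
  end
end

full = Report.new(47.18, 75.0,
                  [Component.new("aqua", 0), Component.new("dimethicone", 4), Component.new("peg-10", 3)],
                  ["noent"])
puts full  # all four sections, laid out like the README's CLI example

clean = Report.new(53.76, 100.0, [Component.new("aqua", 0)], [])
puts clean # no UNRECOGNIZED heading at all
```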

data/lib/inci_score/scorer.rb CHANGED

@@ -2,7 +2,7 @@
 module InciScore
   class Scorer
-    HAZARD_PERCENT = 25
+    HAZARD_RATIO = 25
     WEIGHT_FACTOR = 5
     attr_reader :hazards, :size
@@ -15,7 +15,7 @@ module InciScore
     def call
       return 0 if hazards.empty?
-      (100 - avg * HAZARD_PERCENT).round(4)
+      (100 - avg * HAZARD_RATIO).round(4)
     end
     private
@@ -25,10 +25,8 @@ module InciScore
     end
     def avg_weighted
-      return hazards.reduce(&:+) if same_hazard?
-      weighted.reduce(0.0) do |acc,score|
-        acc += score.value
-      end
+      return hazards.sum if same_hazard?
+      weighted.sum(&:value)
     end
     def same_hazard?
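The refactor swaps `reduce` for `Array#sum`. The two forms agree on non-empty integer lists but differ in two edge cases: `reduce(&:+)` returns `nil` for an empty array where `sum` returns `0` (though `Scorer#call` already returns 0 early when `hazards` is empty), and the old `reduce(0.0)` always produced a `Float`, whereas `sum` starts from the integer `0`, so its result type follows the elements unless an initial `0.0` is passed. A quick illustration, with a hypothetical `Weighted` struct standing in for the objects that expose `value`:

```ruby
Weighted = Struct.new(:value) # hypothetical stand-in for the weighted scores

hazards  = [3, 4, 0]
weighted = [Weighted.new(2), Weighted.new(5)]

hazards.reduce(&:+) # 7
hazards.sum         # 7
[].reduce(&:+)      # nil
[].sum              # 0

old_total   = weighted.reduce(0.0) { |acc, w| acc + w.value } # 7.0, always a Float
new_total   = weighted.sum(&:value)                           # 7, Integer like the elements
float_total = weighted.sum(0.0, &:value)                      # 7.0, keeps the old Float result
```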

data/lib/inci_score/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module InciScore
-  VERSION = '4.3.0'
+  VERSION = '4.5.0'
 end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: inci_score
 version: !ruby/object:Gem::Version
-  version: 4.3.0
+  version: 4.5.0
 platform: ruby
 authors:
 - costajob