RubyGems - fselector - Versions diffs - 1.0.0 → 1.0.1 - Mend

fselector 1.0.0 → 1.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6)

data/ChangeLog +5 -0
data/README.md +14 -10
data/lib/fselector.rb +1 -1
data/lib/fselector/algo_base/base.rb +37 -20
data/lib/fselector/ensemble.rb +97 -43
metadata +4 -4

data/ChangeLog CHANGED

@@ -1,3 +1,8 @@
+2012-05-08	version 1.0.1
+  * modify Ensemble module so that ensemble\_by\_score() and ensemble\_by\_rank() now take Symbol, instead of Method, as argument. This allows easier and clearer function call
+  * enable select_feature! interface in Ensemble module for the type of subset selection algorithms
 2012-05-04	version 1.0.0
   * add new algorithm INTERACT for discrete feature
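The first 1.0.1 entry replaces `Method` arguments with `Symbol` arguments. In Ruby this kind of change usually means the callee dispatches by name with `Object#send`, and the `ensemble.rb` changes in this diff do exactly that. The sketch below is not FSelector code. `Combiner`, `combine` and `combine_with_method` are hypothetical names, and only the `by_min`/`by_max`/`by_ave` helper names come from the gem. It contrasts the two calling styles. The gem calls `abort` when it receives an unsupported symbol; this sketch raises `ArgumentError` instead.

```ruby
# Hypothetical stand-in for the combining helpers in FSelector::Ensemble
class Combiner
  ALLOWED = [:by_min, :by_max, :by_ave].freeze

  # 1.0.1 style: caller passes a Symbol, callee dispatches with send
  def combine(values, ensem_method = :by_max)
    unless ALLOWED.include?(ensem_method)
      raise ArgumentError, "unsupported ensemble method: #{ensem_method}"
    end
    send(ensem_method, values) # send can also reach private methods
  end

  # 1.0.0 style: caller builds a Method object, callee just calls it
  def combine_with_method(values, by_what)
    by_what.call(values)
  end

  private

  def by_min(arr)
    arr.min
  end

  def by_max(arr)
    arr.max
  end

  def by_ave(arr)
    arr.sum.to_f / arr.size
  end
end

c = Combiner.new
c.combine([2.0, 8.0, 5.0])                                # => 8.0
c.combine([2.0, 8.0, 5.0], :by_ave)                       # => 5.0
c.combine_with_method([2.0, 8.0, 5.0], c.method(:by_min)) # => 2.0
```

With the Symbol style, the caller never has to obtain a `Method` object from the ensemble, so the helpers can stay private.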

data/README.md CHANGED

@@ -8,8 +8,8 @@ FSelector: a Ruby gem for feature selection and ranking
 **Email**: [need47@gmail.com](mailto:need47@gmail.com)
 **Copyright**: 2012
 **License**: MIT License
-**Latest Version**: 1.0.0
-**Release Date**: 2012-05-04
+**Latest Version**: 1.0.1
+**Release Date**: 2012-05-08
 Synopsis
 --------
@@ -85,8 +85,8 @@ Feature List
     TScore                            TS          weighting   continuous    two-class
     WilcoxonRankSum                   WRS         weighting   continuous    two-class
-  **note for feature selection interace:**
-  there are two types of filter methods, i.e., feature weighting algorithms and feature subset selection algorithms
+  **note for feature selection interface:**
+  there are two types of filter methods, i.e., weighting algorithms and subset selection algorithms
   - for weighting type: use either **select\_feature\_by\_rank!** or **select\_feature\_by\_score!**
   - for subset type: use **select\_feature!**
@@ -96,7 +96,7 @@ Feature List
  - by a single algorithm
  - by multiple algorithms in a tandem manner
- - by multiple algorithms in a consensus manner
+ - by multiple algorithms in an ensemble manner
 **4. availabe normalization and discretization algorithms for continuous feature**
@@ -183,9 +183,9 @@ Usage
     require 'fselector'
-	# use both Information and ChiSquaredTest
+	# use both InformationGain and Relief_d
     r1 = FSelector::InformationGain.new
-    r2 = FSelector::ChiSquaredTest.new
+    r2 = FSelector::Relief_d.new
     # ensemble ranker
     re = FSelector::Ensemble.new(r1, r2)
@@ -193,12 +193,16 @@ Usage
     # read random data
     re.data_from_random(100, 2, 15, 3, true)
+    # replace missing value because Relief_d
+    # does not allow missing value
+    re.replace_by_most_seen_value!
     # number of features before feature selection
     puts '# features (before): ' + re.get_features.size.to_s
-    # based on the min feature rank among
-    # ensemble feature selection algorithms
-    re.ensemble_by_rank(re.method(:by_min))
+    # based on the max feature score (z-score standardized) among
+    # an ensemble of feature selection algorithms
+    re.ensemble_by_score(:by_max, :by_zscore)
     # select the top-ranked 3 features
     re.select_feature_by_rank!('<=3')
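The following dependency-free sketch shows what the README's `re.ensemble_by_score(:by_max, :by_zscore)` call computes, based on the `ensemble.rb` changes in this diff. Each algorithm's `:BEST` scores are z-score standardized. A feature's ensemble score is the maximum across algorithms, and features are then ranked by descending score. The score hashes are invented inputs. The sketch also assumes population standard deviation, because the gem's own `Array#sd` helper is not shown here. Nothing below calls FSelector itself.

```ruby
# Z-score standardize one algorithm's scores: (x - mean) / sd.
# Assumption: population standard deviation; the gem's Array#sd may differ.
def zscore(scores)
  vals = scores.values
  mean = vals.sum.to_f / vals.size
  sd = Math.sqrt(vals.sum { |v| (v - mean)**2 } / vals.size)
  scores.transform_values { |v| sd.zero? ? 0.0 : (v - mean) / sd }
end

# Hypothetical per-algorithm scores (larger is better)
info_gain = { f1: 0.9, f2: 0.1, f3: 0.5, f4: 0.3 }
relief    = { f1: 2.0, f2: 8.0, f3: 4.0, f4: 6.0 }

# Normalize first, since raw scores from different algorithms are not comparable
normalized = [info_gain, relief].map { |s| zscore(s) }

# Ensemble score = max of the normalized scores (the :by_max idea)
ensemble = info_gain.keys.to_h do |f|
  [f, normalized.map { |s| s[f] }.max]
end

# Rank 1 = highest ensemble score; keep features ranked <= 3
top3 = ensemble.sort_by { |_f, sc| -sc }.first(3).map(&:first)
# top3 == [:f1, :f2, :f4]
```

Without the normalization step, Relief-style scores in the 2 to 8 range would swamp information-gain scores below 1 under `:by_max`.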

data/lib/fselector.rb CHANGED

@@ -6,7 +6,7 @@ require 'rinruby'
 #
 module FSelector
   # module version
-  VERSION = '1.0.0'
+  VERSION = '1.0.1'
 end
 # the root dir of FSelector

data/lib/fselector/algo_base/base.rb CHANGED

@@ -3,7 +3,7 @@
 #
 module FSelector
   #
-  # base class
+  # base class for a single feature selection algorithm
   #
   class Base
     # include FileIO
@@ -271,19 +271,8 @@ module FSelector
     def get_feature_ranks
       return @ranks if @ranks # already done
-      scores = get_feature_scores
-      # get the ranked features
-      @ranks = {} # feature => rank
-      # the larger, the better
-      sorted_features = scores.keys.sort do |x,y|
-        scores[y][:BEST] <=> scores[x][:BEST]
-      end
-      sorted_features.each_with_index do |sf, si|
-        @ranks[sf] = si+1
-      end
+      # make feature ranks from feature scores
+      set_ranks_from_scores
       @ranks
     end
@@ -292,11 +281,12 @@ module FSelector
     #
     # reconstruct data with selected features
     #
-    # @note data structure will be altered. Dderived class must
-    #   implement its own get\_subset(). This is only available for
-    #   the feature subset selection type of algorithms
+    # @note data structure will be altered. Derived class must
+    #   implement its own get\_feature_subset(). This is only available for
+    #   the subset selection type of algorithms, see {file:README.md}
     #
     def select_feature!
+      # derived class must implement its own one
       subset = get_feature_subset
       return if subset.empty?
@@ -320,7 +310,7 @@ module FSelector
     # @param [Hash] my_scores
     #   user customized feature scores
     # @note data structure will be altered. This is only available for
-    #   the feature weighting type of algorithms
+    #   the weighting type of algorithms, see {file:README.md}
     #
     def select_feature_by_score!(criterion, my_scores=nil)
       # user scores or internal scores
@@ -346,7 +336,7 @@ module FSelector
     # @param [Hash] my_ranks
     #   user customized feature ranks
     # @note data structure will be altered. This is only available for
-    #   the feature weighting type of algorithms
+    #   the weighting type of algorithms, see {file:README.md}
     #
     def select_feature_by_rank!(criterion, my_ranks=nil)
       # user ranks or internal ranks
@@ -382,7 +372,34 @@ module FSelector
     end
-    # get subset of feature
+    #
+    # set feature ranks from feature scores
+    #
+    # @param [Hash] scores feature scores
+    # @return [Hash] feature scores
+    # @note  the larger the score, the smaller (better) its rank
+    #
+    def set_ranks_from_scores
+      # get feature scores
+      scores = get_feature_scores
+      # get the ranked features
+      @ranks = {} # feature => rank
+      # the larger the score, the smaller (better) its rank
+      sorted_features = scores.keys.sort do |x,y|
+        scores[y][:BEST] <=> scores[x][:BEST] # use :BEST feature score
+      end
+      sorted_features.each_with_index do |sf, si|
+        @ranks[sf] = si+1
+      end
+      @ranks
+    end
+    # get subset of feature, for the type of subset selection algorithms
     def get_feature_subset
       abort "[#{__FILE__}@#{__LINE__}]: "+
               "derived class must implement its own get_feature_subset()"
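The refactored `set_ranks_from_scores` takes no argument and returns the rank hash, although its `@param` and `@return` tags describe scores. The standalone sketch below mirrors its logic on a plain hash in the gem's `feature => { :BEST => score }` layout. Features are sorted by descending `:BEST` score and receive ranks 1, 2, 3 and so on, with rank 1 as the best. `ranks_from_scores` is a hypothetical name used only for illustration.

```ruby
# Mirrors Base#set_ranks_from_scores: the larger the :BEST score,
# the smaller (better) the rank.
def ranks_from_scores(scores)
  sorted = scores.keys.sort { |x, y| scores[y][:BEST] <=> scores[x][:BEST] }
  ranks = {} # feature => rank
  sorted.each_with_index { |f, i| ranks[f] = i + 1 }
  ranks
end

# Hypothetical scores
scores = {
  f1: { BEST: 0.42 },
  f2: { BEST: 0.91 },
  f3: { BEST: 0.07 }
}
ranks = ranks_from_scores(scores)
# f2 gets rank 1, f1 rank 2, f3 rank 3
```

Ruby's `sort` is not stable, so features with equal scores receive distinct but arbitrarily ordered ranks.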

data/lib/fselector/ensemble.rb CHANGED

@@ -2,12 +2,26 @@
 # FSelector: a Ruby gem for feature selection and ranking
 #
 module FSelector
-  # select feature by an ensemble of ranking algorithms
+  #
+  # feature selection by an ensemble of algorithms,
+  # sharing the same interface as single algo
+  #
+  # for the type of weighting algorithms,  you must call one of
+  # the following two functions before calling select\_feature\_by\_score! or
+  # select\_feature\_by\_rank! for feature selection:
+  # - ensemble\_by\_score()  if ensemble scores are based on those of individual algos
+  # - ensemble\_by\_rank()   if ensemble ranks are based on those of individual algos
+  #
+  # for the type of subset selection algorithm, use
+  # select\_feature! for feature selection (based on consensus features)
+  #
   class Ensemble < Base
     #
     # initialize from multiple algorithms
     #
     # @param [Array] algos multiple feature selection algorithms
+    # @note different algorithms must be of the same type,
+    #   either weighting or subset selection (see {file:README.md})
     #
     def initialize(*algos)
       super(nil)
@@ -20,8 +34,9 @@ module FSelector
     #
-    # reload set\_data
+    # reload set\_data() for Ensemble
     #
+    # @param [Hash] data source data structure
     # @note all algos share the same data structure
     #
     def set_data(data)
@@ -34,7 +49,7 @@ module FSelector
     #
-    # reload get\_feature\_scores
+    # reload get\_feature\_scores() for Ensemble
     #
     def get_feature_scores
       return @scores if @scores
@@ -45,70 +60,88 @@ module FSelector
     #
-    # reload get\_feature\_ranks
+    # reload get\_feature\_ranks() for Ensemble
     #
     def get_feature_ranks
       return @ranks if @ranks
-      abort "[#{__FILE__}@#{__LINE__}]: "+
+      if @scores # calc ranks based on scores
+        set_ranks_from_scores
+        return @ranks
+      else
+        abort "[#{__FILE__}@#{__LINE__}]: "+
               "please call one consensus ranking method first!"
+      end
     end
-    # ensemble based on score
     #
-    # @param [Method] by_what by what criterion that ensemble
-    #   score should be obtained from those of individual algorithms
+    # ensemble scores are made from those of individual algorithms
+    #
+    # @param [Symbol] ensem_method how the ensemble score should
+    #   be derived from those of individual algorithms
     #   allowed values are:
-    #   - method(:by\_min) # by min score
-    #   - method(:by\_max) # by max score
-    #   - method(:by\_ave) # by ave score
-    # @param [Integer] norm normalization
-    #   :min\_max, score scaled to [0, 1]
-    #   :zscore, score converted to zscore
+    #   - :by\_min # use min score
+    #   - :by\_max # use max score
+    #   - :by\_ave # use ave score
+    # @param [Symbol] norm_method score normalization method
+    #   :by\_min\_max, score scaled to [0, 1]
+    #   :by\_zscore, score converted to zscore
     #
     # @note scores from different algos are usually incompatible with
-    #   each other, we have to normalize it first
+    #   each other, so we need to normalize it first
     #
-    def ensemble_by_score(by_what=method(:by_max), norm=:min_max)
+    def ensemble_by_score(ensem_method=:by_max, norm_method=:by_zscore)
+      if not [:by_min, :by_max, :by_ave].include? ensem_method
+        abort "[#{__FILE__}@#{__LINE__}]: "+
+              "only :by_min, :by_max and :by_ave are supported ensemble methods!"
+      end
+      if not [:by_min_max, :by_zscore].include? norm_method
+        abort "[#{__FILE__}@#{__LINE__}]: "+
+              "only :by_min_max and :by_zscore are supported normalization methods!"
+      end
+      # normalization
       @algos.each do |r|
-        if norm == :min_max
-          normalize_min_max!(r)
-        elsif norm == :zscore
-          normalize_zscore!(r)
-        else
-          abort "[#{__FILE__}@#{__LINE__}]: "+
-              "invalid normalizer, only :min_max and :zscore supported!"
-        end
+        self.send(norm_method, r)
       end
       @scores = {}
       each_feature do |f|
         @scores[f] = {}
-        @scores[f][:BEST] = by_what.call(
-          @algos.collect { |r| r.get_feature_scores[f][:BEST] }
-        )
-      end
+        # score from individual algo
+        score_arr = @algos.collect { |r| r.get_feature_scores[f][:BEST] }
+        # ensemble score
+        @scores[f][:BEST] = self.send(ensem_method, score_arr)
+      end
     end
-    # ensemble based on rank
     #
-    # @param [Method] by_what by what criterion that ensemble
-    #   rank should be obtained from those of individual algorithms
-    #   allowed values are:
-    #   - method(:by\_min) # by min rank
-    #   - method(:by\_max) # by max rank
-    #   - method(:by\_ave) # by ave rank
+    # ensemble ranks are made from those of individual algorithms
     #
-    def ensemble_by_rank(by_what=method(:by_min))
+    # @param [Symbol] ensem_method how the ensemble rank should
+    #   be derived from those of individual algorithms
+    #   allowed values are:
+    #   - :by\_min # use min rank
+    #   - :by\_max # use max rank
+    #   - :by\_ave # use ave rank
+    #
+    def ensemble_by_rank(ensem_method=:by_min)
+      if not [:by_min, :by_max, :by_ave].include? ensem_method
+        abort "[#{__FILE__}@#{__LINE__}]: "+
+              "only :by_min, :by_max and :by_ave are supported ensemble methods!"
+      end
       ranks = {}
       each_feature do |f|
-        ranks[f] = by_what.call(
-          @algos.collect { |r| r.get_feature_ranks[f] }
-        )
+        # score from individual algo
+        rank_arr = @algos.collect { |r| r.get_feature_ranks[f] }
+        # ensemble rank
+        ranks[f] = self.send(ensem_method, rank_arr)
       end
       new_ranks = {}
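The hunk boundary above hides how `new_ranks` is filled. The sketch below covers only the visible step, which combines each feature's ranks across algorithms with `:by_min` semantics. It then converts the combined values back to consecutive ranks by sorting in ascending order. That second step is an assumption for illustration and is not taken from the gem. The rank hashes and the name-based tie-break are also hypothetical.

```ruby
# Hypothetical ranks from two algorithms (1 = best)
ranks_a = { f1: 1, f2: 3, f3: 2, f4: 4 }
ranks_b = { f1: 4, f2: 1, f3: 3, f4: 2 }

# Visible step: combined rank per feature using the :by_min idea
combined = ranks_a.keys.to_h { |f| [f, [ranks_a[f], ranks_b[f]].min] }
# combined: f1 => 1, f2 => 1, f3 => 2, f4 => 2

# Assumed step (not shown in this diff): re-rank by ascending combined
# value so ranks are consecutive; ties broken by feature name here
new_ranks = {}
combined.sort_by { |f, r| [r, f.to_s] }.each_with_index do |(f, _r), i|
  new_ranks[f] = i + 1
end
```

With `:by_min`, a feature counts as strong if any one algorithm ranks it highly, so ties at the top are common. That is why some re-ranking step is needed before `select_feature_by_rank!` can use the result.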
@@ -123,6 +156,29 @@ module FSelector
       @ranks = new_ranks
     end
+    private
+    #
+    # reload get\_feature\_subset() for Ensemble
+    #
+    # select a subset of consensus features selected by multiple algos
+    #
+    # @note the subset of features are based on the consensus features
+    #   selected by multiple algos. This is suitable only for the type
+    #   of subset selection algorithms
+    #
+    def get_feature_subset
+      subset = get_features.dup
+      @algos.each do |r|
+        # note we call a private method here
+        r_subset = r.send(:get_feature_subset)
+        subset = subset & r_subset
+      end
+      subset
+    end
     # by average value of an array
     def by_ave(arr)
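The new `Ensemble#get_feature_subset` keeps only consensus features. It starts from all features and intersects them with each algorithm's subset using `Array#&`, calling each algorithm's private `get_feature_subset` through `send`. The sketch below reproduces the intersection on hypothetical subsets. If the intersection comes out empty, the `return if subset.empty?` guard in `Base#select_feature!` leaves the data unchanged.

```ruby
# Hypothetical feature list and subsets from three subset-selection algorithms
all_features = [:f1, :f2, :f3, :f4, :f5]
subsets = [
  [:f1, :f2, :f4],
  [:f2, :f4, :f5],
  [:f4, :f2, :f3]
]

# Keep only features selected by every algorithm; Array#& preserves
# the order of the receiver, so the result follows all_features' order
consensus = subsets.reduce(all_features.dup) { |acc, s| acc & s }
# consensus == [:f2, :f4]
```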
@@ -141,15 +197,13 @@ module FSelector
       arr.max if arr.class == Array
     end
-    private
     #
     # normalize feature scores of each individual alogrithm (r)
     # by scaling to [0, 1]
     #
     # @note original scores will be altered in place
     #
-    def normalize_min_max!(r)
+    def by_min_max(r)
       scores = r.get_feature_scores
       scores_best = scores.collect { |f, ks|  ks[:BEST] }
       min, max = scores_best.min, scores_best.max
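This hunk shows only the opening lines of `by_min_max`, formerly `normalize_min_max!`. The rename lets `ensemble_by_score` dispatch to it with `send(norm_method, r)`. The sketch below is hedged: it implements the documented behavior, "scaling to [0, 1]" with scores "altered in place", using standard min-max scaling on a plain score hash. The zero-range guard is an assumption, because the gem's handling of that case is not visible here.

```ruby
# Scale :BEST scores to [0, 1] in place: (x - min) / (max - min).
# Assumption: all-equal scores map to 0.0 (gem behavior not shown).
def min_max!(scores)
  best = scores.collect { |_f, ks| ks[:BEST] }
  min, max = best.min, best.max
  range = max - min
  scores.each_value do |ks|
    ks[:BEST] = range.zero? ? 0.0 : (ks[:BEST] - min) / range
  end
  scores
end

# Hypothetical scores in feature => { :BEST => score } form
scores = { f1: { BEST: 2.0 }, f2: { BEST: 8.0 }, f3: { BEST: 5.0 } }
min_max!(scores)
# f1 => 0.0, f2 => 1.0, f3 => 0.5
```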
@@ -166,7 +220,7 @@ module FSelector
     #
     # @note original scores will be altered in place
     #
-    def normalize_zscore!(r)
+    def by_zscore(r)
       scores = r.get_feature_scores
       scores_best = scores.collect { |f, ks|  ks[:BEST] }
       ave, sd = scores_best.ave, scores_best.sd

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: fselector
 version: !ruby/object:Gem::Version
-  version: 1.0.0
+  version: 1.0.1
   prerelease:
 platform: ruby
 authors:
@@ -9,11 +9,11 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-05-04 00:00:00.000000000 Z
+date: 2012-05-08 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rinruby
-  requirement: &25438824 !ruby/object:Gem::Requirement
+  requirement: &25797132 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -21,7 +21,7 @@ dependencies:
         version: 2.0.2
   type: :runtime
   prerelease: false
-  version_requirements: *25438824
+  version_requirements: *25797132
 description: FSelector is a Ruby gem that aims to integrate various feature selection/ranking
   algorithms and related functions into one single package. Welcome to contact me
   (need47@gmail.com) if you'd like to contribute your own algorithms or report a bug.