fselector 0.2.0 → 0.3.0
- data/README.md +46 -33
- data/lib/fselector.rb +6 -1
- data/lib/fselector/algo_base/base.rb +14 -3
- data/lib/fselector/algo_base/base_CFS.rb +12 -0
- data/lib/fselector/algo_base/base_continuous.rb +2 -2
- data/lib/fselector/algo_continuous/CFS_c.rb +10 -0
- data/lib/fselector/algo_continuous/ReliefF_c.rb +3 -0
- data/lib/fselector/algo_continuous/Relief_c.rb +3 -0
- data/lib/fselector/algo_continuous/discretizer.rb +161 -7
- data/lib/fselector/algo_continuous/normalizer.rb +3 -3
- data/lib/fselector/algo_discrete/CFS_d.rb +6 -0
- data/lib/fselector/entropy.rb +31 -31
- data/lib/fselector/fileio.rb +15 -3
- data/lib/fselector/replace_missing_values.rb +78 -0
- metadata +13 -10
data/README.md
CHANGED
@@ -8,30 +8,41 @@ FSelector: a Ruby gem for feature selection and ranking
 **Email**: [need47@gmail.com](mailto:need47@gmail.com)
 **Copyright**: 2012
 **License**: MIT License
-**Latest Version**: 0.2.0
-**Release Date**: April
+**Latest Version**: 0.3.0
+**Release Date**: April 3rd 2012
 
 Synopsis
 --------
 
-FSelector is a Ruby gem that aims to integrate various feature
-algorithms into one single
-
-
-
-
-
-
-
+FSelector is a Ruby gem that aims to integrate various feature
+selection/ranking algorithms and related functions into one single
+package. Welcome to contact me (need47@gmail.com) if you'd like to
+contribute your own algorithms or report a bug. FSelector allows users
+to perform feature selection by using either a single algorithm or an
+ensemble of multiple algorithms, and supports other common tasks including
+normalization and discretization on continuous data, as well as replacing
+missing feature values with a certain criterion. FSelector acts on a
+full-feature data set in either CSV, LibSVM or WEKA file format and
+outputs a reduced data set with only the selected subset of features, which
+can later be used as the input for various machine learning software
+including LibSVM and WEKA. FSelector, itself, does not implement
+any of the machine learning algorithms such as support vector machines
+and random forest. See below for a list of FSelector's features.
 
 Feature List
 ------------
 
-**1.
+**1. supported input/output file types**
+
+ - csv
+ - libsvm
+ - weka ARFF
+ - random data (for test purposes)
+
+**2. available feature selection/ranking algorithms**
 
-algorithm alias
-
+    algorithm                  alias        feature type
+    --------------------------------------------------------
     Accuracy                   Acc          discrete
     AccuracyBalanced           Acc2         discrete
     BiNormalSeparation         BNS          discrete
@@ -67,29 +78,31 @@ Feature List
     ReliefF_c                  ReliefF_c    continuous
     TScore                     TS           continuous
 
-**
+**3. feature selection approaches**
 
 - by a single algorithm
 - by multiple algorithms in a tandem manner
 - by multiple algorithms in a consensus manner
 
-**
+**4. available normalization and discretization algorithms for continuous features**
 
     algorithm          note
-
-    log
-    min_max
-    zscore
-    equal_width
-    equal_frequency
-    ChiMerge
-
-
-
-
-
-
-
+    -----------------------------------------------------------------
+    log                normalize by logarithmic transformation
+    min_max            normalize by scaling into [min, max]
+    zscore             normalize by converting into zscore
+    equal_width        discretize by equal width among intervals
+    equal_frequency    discretize by equal frequency among intervals
+    ChiMerge           discretize by ChiMerge method
+    MID                discretize by Multi-Interval Discretization
+
+**5. available algorithms for replacing missing feature values**
+
+    algorithm          note                                    feature type
+    --------------------------------------------------------------------------------------
+    fixed_value        replace with a fixed value              discrete, continuous
+    mean_value         replace with the mean feature value     continuous
+    most_seen_value    replace with most seen feature value    discrete
 
 Installing
 ----------
@@ -187,11 +200,11 @@ Usage
 r1.data_from_csv('test/iris.csv')
 
 # normalization by log2 (optional)
-# r1.
+# r1.normalize_by_log!(2)
 
 # discretization by ChiMerge algorithm
 # chi-squared value = 4.60 for a three-class problem at alpha=0.10
-r1.
+r1.discretize_by_ChiMerge!(4.60)
 
 # apply Fast Correlation-Based Filter (FCBF) algorithm for discrete feature
 # initialize with discretized data from r1
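Taken together, the README changes above describe the new 0.3.0 workflow. As a quick orientation, here is a minimal end-to-end sketch assembled from the snippets in this diff; the `BaseContinuous` entry point is an assumption, since the excerpt does not show how `r1` was constructed:

    require 'fselector'

    # assumption: r1 holds continuous data; the README excerpt does not
    # show the class used, BaseContinuous is a plausible choice
    r1 = FSelector::BaseContinuous.new
    r1.data_from_csv('test/iris.csv')

    # optional log2 normalization, per the README comment
    # r1.normalize_by_log!(2)

    # ChiMerge discretization; chi-squared = 4.60 corresponds to a
    # three-class problem at alpha = 0.10
    r1.discretize_by_ChiMerge!(4.60)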
data/lib/fselector.rb
CHANGED
@@ -3,7 +3,7 @@
 #
 module FSelector
   # module version
-  VERSION = '0.2.0'
+  VERSION = '0.3.0'
 end
 
 ROOT = File.expand_path(File.dirname(__FILE__))
@@ -11,9 +11,14 @@ ROOT = File.expand_path(File.dirname(__FILE__))
 #
 # include necessary files
 #
+# read and write file, supported formats include CSV, LibSVM and WEKA files
 require "#{ROOT}/fselector/fileio.rb"
+# extend Array and String class
 require "#{ROOT}/fselector/util.rb"
+# entropy-related functions
 require "#{ROOT}/fselector/entropy.rb"
+# replace missing values
+require "#{ROOT}/fselector/replace_missing_values.rb"
 
 #
 # base class
data/lib/fselector/algo_base/base.rb
CHANGED
@@ -8,6 +8,8 @@ module FSelector
   class Base
     # include FileIO
     include FileIO
+    # include ReplaceMissingValues
+    include ReplaceMissingValues
 
     # initialize from an existing data structure
     def initialize(data=nil)
@@ -167,13 +169,13 @@ module FSelector
     def set_data(data)
       if data and data.class == Hash
         @data = data
-        # clear
-
-        @scores, @ranks, @sz = nil, nil, nil
+        # clear variables
+        clear_vars
       else
         abort "[#{__FILE__}@#{__LINE__}]: "+
               "data must be a Hash object!"
       end
+
       data
     end
 
@@ -335,6 +337,14 @@ module FSelector
 
     private
 
+    # clear variables when data structure is altered
+    def clear_vars
+      @classes, @features, @fvs = nil, nil, nil
+      @scores, @ranks, @sz = nil, nil, nil
+      @cv, @fvs = nil, nil
+    end
+
+
     # set feature (f) score (s) for class (k)
     def set_feature_score(f, k, s)
       @scores ||= {}
@@ -342,6 +352,7 @@ module FSelector
       @scores[f][k] = s
     end
 
+
     # get subset of feature
     def get_feature_subset
       abort "[#{__FILE__}@#{__LINE__}]: "+
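The new `clear_vars` hook exists because `Base` memoizes derived values (`@classes`, `@features`, `@scores`, and so on), and those caches go stale whenever `@data` is replaced or altered in place. A standalone illustration of the pitfall, using a hypothetical class rather than the gem's code:

    # hypothetical Dataset class, illustrating the stale-cache problem
    # that clear_vars guards against in FSelector's Base
    class Dataset
      def initialize(data)
        @data = data
      end

      # memoized derived value, like @classes/@features/@scores in Base
      def classes
        @classes ||= @data.keys
      end

      def set_data(data)
        @data = data
        @classes = nil # the moral equivalent of clear_vars
      end
    end

    d = Dataset.new('a' => [1], 'b' => [2])
    p d.classes             # => ["a", "b"]
    d.set_data('c' => [3])
    p d.classes             # => ["c"]; without the reset it would still be ["a", "b"]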
data/lib/fselector/algo_base/base_CFS.rb
CHANGED
@@ -21,6 +21,9 @@ module FSelector
 
     # use sequential forward search
     def get_feature_subset
+      # handle missing values
+      handle_missing_values
+
       subset = []
       feats = get_features.dup
 
@@ -58,6 +61,15 @@ module FSelector
     end # get_feature_subset
 
 
+    # handle missing values
+    # CFS replaces missing values with the mean for continuous features and
+    # the most seen value for discrete features
+    def handle_missing_values
+      abort "[#{__FILE__}@#{__LINE__}]: "+
+            "derived CFS algo must implement its own handle_missing_values()"
+    end
+
+
     # calc new merit of subset when adding feature (f)
     def calc_merit(subset, f)
       k = subset.size.to_f + 1
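The aborting `handle_missing_values` above is a template-method hook: `get_feature_subset` calls it, and each concrete CFS variant supplies the behavior (see `CFS_c` and `CFS_d` later in this diff). A generic sketch of the pattern, with illustrative names only:

    # illustrative names only; this mirrors the structure of BaseCFS
    class BaseAlgo
      def run
        handle_missing_values # hook, supplied by subclasses
        puts 'selecting features...'
      end

      private

      def handle_missing_values
        abort 'derived algo must implement handle_missing_values()'
      end
    end

    class ContinuousAlgo < BaseAlgo
      private

      # continuous features: fill with the mean, as CFS_c does
      def handle_missing_values
        puts 'replacing missing values with feature means'
      end
    end

    ContinuousAlgo.new.run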
data/lib/fselector/algo_base/base_continuous.rb
CHANGED
@@ -10,8 +10,8 @@ module FSelector
   class BaseContinuous < Base
     # include normalizer
     include Normalizer
-    # include discretilizer
-    include Discretilizer
+    # include discretizer
+    include Discretizer
 
     # initialize from an existing data structure
     def initialize(data=nil)
data/lib/fselector/algo_continuous/CFS_c.rb
CHANGED
@@ -8,8 +8,18 @@ module FSelector
   # ref: [Feature Selection for Discrete and Numeric Class Machine Learning](http://www.cs.waikato.ac.nz/ml/publications/1999/99MH-Feature-Select.pdf)
   #
   class CFS_c < BaseCFS
+    # include normalizer and discretizer
+    include Normalizer
+    include Discretizer
 
     private
+
+
+    # replace missing values with mean feature value
+    def handle_missing_values
+      replace_with_mean_value!
+    end
+
 
     # calc the feature-class correlation of two vectors
     def do_rcf(cv, fv)
data/lib/fselector/algo_continuous/ReliefF_c.rb
CHANGED
@@ -10,6 +10,9 @@ module FSelector
   # ref: [Estimating Attributes: Analysis and Extensions of RELIEF](http://www.springerlink.com/content/fp23jh2h0426ww45/)
   #
   class ReliefF_c < BaseReliefF
+    # include normalizer and discretizer
+    include Normalizer
+    include Discretizer
 
     private
 
data/lib/fselector/algo_continuous/Relief_c.rb
CHANGED
@@ -10,6 +10,9 @@ module FSelector
   # ref: [The Feature Selection Problem: Traditional Methods and a New Algorithm](http://www.aaai.org/Papers/AAAI/1992/AAAI92-020.pdf)
   #
   class Relief_c < BaseRelief
+    # include normalizer and discretizer
+    include Normalizer
+    include Discretizer
 
     private
 
data/lib/fselector/algo_continuous/discretizer.rb
CHANGED
@@ -1,7 +1,10 @@
 #
-#
+# discretize continuous features
 #
-module Discretilizer
+module Discretizer
+  # include Entropy module
+  include Entropy
+
   # discretize by equal-width intervals
   #
   # @param [Integer] n_interval
@@ -84,7 +87,7 @@ module Discretilizer
   # 2     4.60   5.99   9.21  13.82
   # 3     6.35   7.82  11.34  16.27
   #
-  def
+  def discretize_by_ChiMerge!(chisq)
     # chisq = 4.60 # for iris::Sepal.Length
     # for initialization
     hzero = {}
@@ -177,19 +180,71 @@ module Discretilizer
       end
     end
 
-  end #
+  end # discretize_by_ChiMerge!
+
+
+  #
+  # discretize by Multi-Interval Discretization (MID) algorithm
+  # @note no missing feature values allowed and data structure will be altered
+  #
+  # ref: [Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning](http://www.ijcai.org/Past%20Proceedings/IJCAI-93-VOL2/PDF/022.pdf)
+  #
+  def discretize_by_MID!
+    # determine the final boundaries
+    f2cp = {} # cut points for each feature
+    each_feature do |f|
+      cv = get_class_labels
+      # we assume no missing feature values
+      fv = get_feature_values(f)
+
+      n = cv.size
+      # sort cv and fv according to the ascending order of fv
+      sis = (0...n).to_a.sort { |i,j| fv[i] <=> fv[j] }
+      cv = cv.values_at(*sis)
+      fv = fv.values_at(*sis)
+
+      # get initial boundaries
+      bs = []
+      fv.each_with_index do |v, i|
+        # cut point (Ta) for feature A must always be a value between
+        # two examples of different classes in the sequence of sorted examples
+        # see original reference
+        if i < n-1 and cv[i] != cv[i+1]
+          bs << (v+fv[i+1])/2.0
+        end
+      end
+      bs.uniq! # remove duplicates
+
+      # main algorithm, iteratively determine cut point
+      cp = []
+      partition(cv, fv, bs, cp)
+
+      # add the rightmost boundary for convenience
+      cp << fv.max+1.0
+      # record cut points for feature (f)
+      f2cp[f] = cp
+    end
+
+    # discretize based on cut points
+    each_sample do |k, s|
+      s.keys.each do |f|
+        s[f] = get_index(s[f], f2cp[f])
+      end
+    end
+
+  end # discretize_by_MID!
 
   private
 
   # get index from sorted boundaries
   #
   # min -- | -- | -- | ... max |
-  #
-  #
+  #        b1   b2   b3  ...   bn(=max+1)
+  #      1    2    3    ...  n
   #
   def get_index(v, boundaries)
     boundaries.each_with_index do |b, i|
-      return i if v < b
+      return i+1 if v < b
     end
   end # get_index
 
@@ -215,4 +270,103 @@ module Discretilizer
   end # calc_chisq
 
 
+  #
+  # Multi-Interval Discretization main algorithm,
+  # recursively selecting the best cut point
+  #
+  # @param [Array] cv class labels
+  # @param [Array] fv feature values
+  # @param [Array] bs potential cut points
+  # @param [Array] cp resultant cut points
+  def partition(cv, fv, bs, cp)
+    # best cut point
+    cp_best = nil
+
+    # binary subset at the best cut point
+    cv1_best, cv2_best = nil, nil
+    fv1_best, fv2_best = nil, nil
+    bs1_best, bs2_best = nil, nil
+
+    # best information gain
+    gain_best = -100.0
+    ent_best = -100.0
+    ent1_best = -100.0
+    ent2_best = -100.0
+
+    # try each potential cut point
+    bs.each do |b|
+      # binary split
+      cv1_try, cv2_try, fv1_try, fv2_try, bs1_try, bs2_try =
+        binary_split(cv, fv, bs, b)
+
+      # gain for this cut point
+      ent_try = get_marginal_entropy(cv)
+      ent1_try = get_marginal_entropy(cv1_try)
+      ent2_try = get_marginal_entropy(cv2_try)
+      gain_try = ent_try -
+                 (cv1_try.size.to_f/cv.size) * ent1_try -
+                 (cv2_try.size.to_f/cv.size) * ent2_try
+
+      #pp gain_try
+      if gain_try > gain_best
+        cp_best = b
+        cv1_best, cv2_best = cv1_try, cv2_try
+        fv1_best, fv2_best = fv1_try, fv2_try
+        bs1_best, bs2_best = bs1_try, bs2_try
+
+        gain_best = gain_try
+        ent_best = ent_try
+        ent1_best, ent2_best = ent1_try, ent2_try
+      end
+    end
+
+    # to cut or not to cut?
+    #
+    # Gain(A,T;S) > 1/N * log2(N-1) + 1/N * delta(A,T;S)
+    if cp_best
+      n = cv.size.to_f
+      k = cv.uniq.size.to_f
+      k1 = cv1_best.uniq.size.to_f
+      k2 = cv2_best.uniq.size.to_f
+      delta = Math.log2(3**k-2)-(k*ent_best - k1*ent1_best - k2*ent2_best)
+
+      # accept cut point
+      if gain_best > (Math.log2(n-1)/n + delta/n)
+        # a: record cut point
+        cp << cp_best
+
+        # b: recursively call on subset
+        partition(cv1_best, fv1_best, bs1_best, cp)
+        partition(cv2_best, fv2_best, bs2_best, cp)
+      end
+    end
+  end
+
+
+  # binarily split based on a cut point
+  def binary_split(cv, fv, bs, cut_point)
+    cv1, cv2, fv1, fv2, bs1, bs2 = [], [], [], [], [], []
+    fv.each_with_index do |v, i|
+      if v < cut_point
+        cv1 << cv[i]
+        fv1 << v
+      else
+        cv2 << cv[i]
+        fv2 << v
+      end
+    end
+
+    bs.each do |b|
+      if b < cut_point
+        bs1 << b
+      else
+        bs2 << b
+      end
+    end
+
+    # return subset
+    [cv1, cv2, fv1, fv2, bs1, bs2]
+  end
+
+
 end # module
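The acceptance test in `partition` is the MDL criterion from the referenced paper: a cut point T on feature A over sample set S is kept only if Gain(A,T;S) > log2(N-1)/N + delta(A,T;S)/N, where delta(A,T;S) = log2(3^k - 2) - [k*Ent(S) - k1*Ent(S1) - k2*Ent(S2)]. A self-contained numeric check with made-up labels, mirroring the arithmetic above:

    # toy numbers, not from the gem: verify the MDL cut-point test by hand
    def entropy(labels)
      n = labels.size.to_f
      labels.uniq.sum { |l| p = labels.count(l) / n; -p * Math.log2(p) }
    end

    s  = %w[a a a b b b] # parent set, N = 6, k = 2 classes
    s1 = %w[a a a]       # left of the candidate cut point
    s2 = %w[b b b]       # right of the candidate cut point

    n = s.size.to_f
    k, k1, k2 = s.uniq.size.to_f, s1.uniq.size.to_f, s2.uniq.size.to_f

    gain  = entropy(s) - (s1.size / n) * entropy(s1) - (s2.size / n) * entropy(s2)
    delta = Math.log2(3**k - 2) -
            (k * entropy(s) - k1 * entropy(s1) - k2 * entropy(s2))

    threshold = Math.log2(n - 1) / n + delta / n
    puts "gain=#{gain.round(3)}, threshold=#{threshold.round(3)}"
    # => gain=1.0, threshold=0.522 -- a perfectly class-separating cut is accepted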
data/lib/fselector/algo_continuous/normalizer.rb
CHANGED
@@ -3,7 +3,7 @@
 #
 module Normalizer
   # log transformation, requires positive feature values
-  def
+  def normalize_by_log!(base=10)
     each_sample do |k, s|
       s.keys.each do |f|
         s[f] = Math.log(s[f], base) if s[f] > 0.0
@@ -13,7 +13,7 @@ module Normalizer
 
 
   # scale to [min,max], max > min
-  def
+  def normalize_by_min_max!(min=0.0, max=1.0)
     # first determine min and max for each feature
     f2min_max = {}
 
@@ -33,7 +33,7 @@ module Normalizer
 
 
   # by z-score
-  def
+  def normalize_by_zscore!
     # first determine mean and sd for each feature
     f2mean_sd = {}
 
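For reference, the per-feature scaling that `normalize_by_min_max!` performs can be sketched standalone (illustrative code, not the gem's):

    # standalone sketch of per-feature min-max scaling
    def min_max_scale(values, new_min = 0.0, new_max = 1.0)
      lo, hi = values.minmax
      values.map { |v| new_min + (v - lo) * (new_max - new_min) / (hi - lo) }
    end

    p min_max_scale([4.3, 5.8, 7.9]) # => [0.0, 0.41666..., 1.0]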
data/lib/fselector/algo_discrete/CFS_d.rb
CHANGED
@@ -12,6 +12,12 @@ module FSelector
     include Entropy
 
     private
+
+    # replace missing values with most seen feature value
+    def handle_missing_values
+      replace_with_most_seen_value!
+    end
+
 
     # calc the feature-class correlation of two vectors
     def do_rcf(cv, fv)
data/lib/fselector/entropy.rb
CHANGED
@@ -7,16 +7,16 @@ module Entropy
   #
   # H(X) = -1 * sigma_i (P(x_i) logP(x_i))
   #
-
+  def get_marginal_entropy(arrX)
     h = 0.0
     n = arrX.size.to_f
-
-
-
-
-
-
-
+
+    arrX.uniq.each do |x_i|
+      p = arrX.count(x_i)/n
+      h += -1.0 * (p * Math.log2(p))
+    end
+
+    h
   end # get_marginal_entropy
 
 
@@ -27,28 +27,28 @@ module Entropy
   #
   # where H(X|y_j) = -1 * sigma_i (P(x_i|y_j) logP(x_i|y_j))
   #
-
-
-
-
+  def get_conditional_entropy(arrX, arrY)
+    abort "[#{__FILE__}@#{__LINE__}]: "+
+          "array must be of same length" if not arrX.size == arrY.size
+
     hxy = 0.0
-
-
-
-
-
-
-
-
-
-
-
-
-
+    n = arrX.size.to_f
+
+    arrY.uniq.each do |y_j|
+      p1 = arrY.count(y_j)/n
+
+      indices = (0...n).to_a.select { |k| arrY[k] == y_j }
+      xvs = arrX.values_at(*indices)
+      m = xvs.size.to_f
+
+      xvs.uniq.each do |x_i|
+        p2 = xvs.count(x_i)/m
+
+        hxy += -1.0 * p1 * (p2 * Math.log2(p2))
+      end
     end
-
-
-    hxy
+
+    hxy
   end # get_conditional_entropy
 
 
@@ -60,11 +60,11 @@ module Entropy
   #
   # i.e. H(X,Y) == H(Y,X)
   #
-
+  def get_joint_entropy(arrX, arrY)
     abort "[#{__FILE__}@#{__LINE__}]: "+
           "array must be of same length" if not arrX.size == arrY.size
-
-    get_marginal_entropy(arrY) + get_conditional_entropy(arrX, arrY)
+
+    get_marginal_entropy(arrY) + get_conditional_entropy(arrX, arrY)
   end # get_joint_entropy
 
 
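The three methods satisfy the chain rule H(X,Y) = H(Y) + H(X|Y), which is exactly how `get_joint_entropy` is implemented. A toy numeric check using standalone re-implementations of the same formulas:

    # standalone re-implementations for a quick sanity check of the chain rule
    def h(xs)
      n = xs.size.to_f
      xs.uniq.sum { |x| p = xs.count(x) / n; -p * Math.log2(p) }
    end

    def h_cond(xs, ys) # H(X|Y)
      n = ys.size.to_f
      ys.uniq.sum do |y|
        idx = (0...ys.size).select { |i| ys[i] == y }
        (idx.size / n) * h(xs.values_at(*idx))
      end
    end

    x = [0, 0, 1, 1, 1, 0]
    y = [0, 1, 0, 1, 1, 1]
    lhs = h(y) + h_cond(x, y)  # H(Y) + H(X|Y)
    rhs = h(x.zip(y))          # H(X,Y) computed directly over joint outcomes
    p lhs.round(6) == rhs.round(6) # => true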
data/lib/fselector/fileio.rb
CHANGED
@@ -110,10 +110,22 @@ module FileIO
       ofs = File.open(fname, 'w')
     end
 
+    # convert class label to integer type
+    k2idx = {}
+    get_classes.each_with_index do |k, i|
+      k2idx[k] = i+1
+    end
+
+    # convert feature to integer type
+    f2idx = {}
+    get_features.each_with_index do |f, i|
+      f2idx[f] = i+1
+    end
+
     each_sample do |k, s|
-      ofs.print "#{k} "
+      ofs.print "#{k2idx[k]} "
       s.keys.sort { |x, y| x.to_s.to_i <=> y.to_s.to_i }.each do |i|
-        ofs.print " #{i}:#{s[i]}" if not s[i].zero?
+        ofs.print " #{f2idx[i]}:#{s[i]}" if not s[i].zero? # implicit mode
       end
       ofs.puts
     end
@@ -171,7 +183,7 @@ module FileIO
         end
       else
         abort "[#{__FILE__}@#{__LINE__}]: "+
-              "
+              "the first two rows must have same number of fields"
       end
     else # data rows
       label, *fvs = ln.chomp.split(/,/)
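The `k2idx`/`f2idx` maps above exist because the LibSVM format requires integer class labels and 1-based integer feature indices, while FSelector may hold string labels and arbitrary feature names. A sketch of what one output line looks like, with made-up class and feature names:

    # made-up stand-ins for get_classes/get_features and one sample
    classes  = %w[iris-setosa iris-versicolor]
    features = %w[sepal_len sepal_wid]

    k2idx = {}; classes.each_with_index  { |k, i| k2idx[k] = i + 1 }
    f2idx = {}; features.each_with_index { |f, i| f2idx[f] = i + 1 }

    sample = { 'sepal_len' => 5.1, 'sepal_wid' => 0.0 }
    line = "#{k2idx['iris-setosa']}"
    sample.each { |f, v| line << " #{f2idx[f]}:#{v}" unless v.zero? }
    puts line # => "1 1:5.1" (zeros omitted: LibSVM's implicit, sparse mode)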
data/lib/fselector/replace_missing_values.rb
ADDED
@@ -0,0 +1,78 @@
+#
+# replace missing feature values
+#
+module ReplaceMissingValues
+  #
+  # replace missing feature value with a fixed value,
+  # applicable to both discrete and continuous features
+  # @note data structure will be altered
+  #
+  def replace_with_fixed_value!(val)
+    each_sample do |k, s|
+      each_feature do |f|
+        if not s.has_key? f
+          s[f] = val
+        end
+      end
+    end
+
+    # clear variables
+    clear_vars
+  end # replace_with_fixed_value!
+
+
+  #
+  # replace missing feature value with mean feature value,
+  # applicable only to continuous features
+  # @note data structure will be altered
+  #
+  def replace_with_mean_value!
+    each_sample do |k, s|
+      each_feature do |f|
+        fv = get_feature_values(f)
+        next if fv.size == get_sample_size # no missing values
+
+        mean = fv.ave
+        if not s.has_key? f
+          s[f] = mean
+        end
+      end
+    end
+
+    # clear variables
+    clear_vars
+  end # replace_with_mean_value!
+
+
+  #
+  # replace missing feature value with most seen feature value,
+  # applicable only to discrete features
+  # @note data structure will be altered
+  #
+  def replace_with_most_seen_value!
+    each_sample do |k, s|
+      each_feature do |f|
+        fv = get_feature_values(f)
+        next if fv.size == get_sample_size # no missing values
+
+        seen_count, seen_value = 0, nil
+        fv.uniq.each do |v|
+          count = fv.count(v)
+          if count > seen_count
+            seen_count = count
+            seen_value = v
+          end
+        end
+
+        if not s.has_key? f
+          s[f] = seen_value
+        end
+      end
+    end
+
+    # clear variables
+    clear_vars
+  end # replace_with_most_seen_value!
+
+
+end # ReplaceMissingValues
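Since `ReplaceMissingValues` is mixed into `Base` (see the base.rb change above), every algorithm object exposes these methods once data is loaded. A hedged usage sketch; the `BaseDiscrete` container and the input file name are assumptions:

    # assumptions: BaseDiscrete as the container, test/data.csv as input
    r = FSelector::BaseDiscrete.new
    r.data_from_csv('test/data.csv')

    r.replace_with_most_seen_value!  # discrete features
    # for continuous features, use instead:
    # r.replace_with_mean_value!
    # or a constant fill for either type:
    # r.replace_with_fixed_value!(0)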
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: fselector
 version: !ruby/object:Gem::Version
-  version: 0.2.0
+  version: 0.3.0
 prerelease:
 platform: ruby
 authors:
@@ -9,17 +9,19 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-04-
+date: 2012-04-03 00:00:00.000000000 Z
 dependencies: []
 description: FSelector is a Ruby gem that aims to integrate various feature selection/ranking
-  algorithms into one single package. Welcome to contact me
-  you
-  user to perform feature selection by using either a single algorithm
-
-
-
-  LibSVM
-
+  algorithms and related functions into one single package. Welcome to contact me
+  (need47@gmail.com) if you'd like to contribute your own algorithms or report a bug.
+  FSelector allows users to perform feature selection by using either a single algorithm
+  or an ensemble of multiple algorithms, and supports other common tasks including normalization
+  and discretization on continuous data, as well as replacing missing feature values
+  with a certain criterion. FSelector acts on a full-feature data set in either CSV,
+  LibSVM or WEKA file format and outputs a reduced data set with only the selected subset
+  of features, which can later be used as the input for various machine learning software
+  including LibSVM and WEKA. FSelector, itself, does not implement any of the machine
+  learning algorithms such as support vector machines and random forest.
 email: need47@gmail.com
 executables: []
 extensions: []
@@ -73,6 +75,7 @@ files:
 - lib/fselector/ensemble.rb
 - lib/fselector/entropy.rb
 - lib/fselector/fileio.rb
+- lib/fselector/replace_missing_values.rb
 - lib/fselector/util.rb
 - lib/fselector.rb
 homepage: http://github.com/need47/fselector