fselector 0.2.0 → 0.3.0
This diff shows the changes between two publicly released versions of this package, as they appear in their respective public registries, and is provided for informational purposes only.
- data/README.md +46 -33
- data/lib/fselector.rb +6 -1
- data/lib/fselector/algo_base/base.rb +14 -3
- data/lib/fselector/algo_base/base_CFS.rb +12 -0
- data/lib/fselector/algo_base/base_continuous.rb +2 -2
- data/lib/fselector/algo_continuous/CFS_c.rb +10 -0
- data/lib/fselector/algo_continuous/ReliefF_c.rb +3 -0
- data/lib/fselector/algo_continuous/Relief_c.rb +3 -0
- data/lib/fselector/algo_continuous/discretizer.rb +161 -7
- data/lib/fselector/algo_continuous/normalizer.rb +3 -3
- data/lib/fselector/algo_discrete/CFS_d.rb +6 -0
- data/lib/fselector/entropy.rb +31 -31
- data/lib/fselector/fileio.rb +15 -3
- data/lib/fselector/replace_missing_values.rb +78 -0
- metadata +13 -10
data/README.md
CHANGED
@@ -8,30 +8,41 @@ FSelector: a Ruby gem for feature selection and ranking
 **Email**: [need47@gmail.com](mailto:need47@gmail.com)
 **Copyright**: 2012
 **License**: MIT License
-**Latest Version**: 0.2.0
-**Release Date**: April
+**Latest Version**: 0.3.0
+**Release Date**: April 3rd 2012
 
 Synopsis
 --------
 
-FSelector is a Ruby gem that aims to integrate various feature
-algorithms into one single
-
-
-
-
-
-
-
+FSelector is a Ruby gem that aims to integrate various feature
+selection/ranking algorithms and related functions into one single
+package. Welcome to contact me (need47@gmail.com) if you'd like to
+contribute your own algorithms or report a bug. FSelector allows user
+to perform feature selection by using either a single algorithm or an
+ensemble of multiple algorithms, and other common tasks including
+normalization and discretization on continuous data, as well as replace
+missing feature values with certain criterion. FSelector acts on a
+full-feature data set in either CSV, LibSVM or WEKA file format and
+outputs a reduced data set with only selected subset of features, which
+can later be used as the input for various machine learning softwares
+including LibSVM and WEKA. FSelector, itself, does not implement
+any of the machine learning algorithms such as support vector machines
+and random forest. See below for a list of FSelector's features.
 
 Feature List
 ------------
 
-**1.
+**1. supported input/output file types**
+
+ - csv
+ - libsvm
+ - weka ARFF
+ - random data (for test purpose)
+
+**2. available feature selection/ranking algorithms**
 
-algorithm alias
-
+algorithm              alias      feature type
+--------------------------------------------------------
 Accuracy               Acc        discrete
 AccuracyBalanced       Acc2       discrete
 BiNormalSeparation     BNS        discrete
@@ -67,29 +78,31 @@ Feature List
 ReliefF_c              ReliefF_c  continuous
 TScore                 TS         continuous
 
-**
+**3. feature selection approaches**
 
 - by a single algorithm
 - by multiple algorithms in a tandem manner
 - by multiple algorithms in a consensus manner
 
-**
+**4. availabe normalization and discretization algorithms for continuous feature**
 
 algorithm          note
-
-log
-min_max
-zscore
-equal_width
-equal_frequency
-ChiMerge
-
-
-
-
-
-
-
+-----------------------------------------------------------------
+log                normalize by logarithmic transformation
+min_max            normalize by scaling into [min, max]
+zscore             normalize by converting into zscore
+equal_width        discretize by equal width among intervals
+equal_frequency    discretize by equal frequency among intervals
+ChiMerge           discretize by ChiMerge method
+MID                discretize by Multi-Interval Discretization
+
+**5. availabe algorithms for replacing missing feature values**
+
+algorithm          note                                   feature type
+--------------------------------------------------------------------------------------
+fixed_value        replace with a fixed value             discrete, continuous
+mean_value         replace with the mean feature value    continuous
+most_seen_value    replace with most seen feature value   discrete
 
 Installing
 ----------
@@ -187,11 +200,11 @@ Usage
 r1.data_from_csv('test/iris.csv')
 
 # normalization by log2 (optional)
-# r1.
+# r1.normalize_by_log!(2)
 
 # discretization by ChiMerge algorithm
 # chi-squared value = 4.60 for a three-class problem at alpha=0.10
-r1.
+r1.discretize_by_ChiMerge!(4.60)
 
 # apply Fast Correlation-Based Filter (FCBF) algorithm for discrete feature
 # initialize with discretized data from r1
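The Usage hunk above renames the normalization and discretization calls. A minimal sketch of the updated flow follows; it is illustrative only, and the class used to build `r1` is not shown in this hunk, so `FSelector::BaseContinuous` and the `test/iris.csv` path are assumptions taken from the surrounding README example.

```ruby
require 'fselector'

# assumed setup: a continuous-feature selector loaded from the iris CSV
r1 = FSelector::BaseContinuous.new
r1.data_from_csv('test/iris.csv')

# method names as renamed in 0.3.0 (the old names are truncated in this diff)
r1.normalize_by_log!(2)           # optional log2 normalization
r1.discretize_by_ChiMerge!(4.60)  # chi-squared = 4.60, three classes, alpha = 0.10
```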
data/lib/fselector.rb
CHANGED
@@ -3,7 +3,7 @@
 #
 module FSelector
   # module version
-  VERSION = '0.2.0'
+  VERSION = '0.3.0'
 end
 
 ROOT = File.expand_path(File.dirname(__FILE__))
@@ -11,9 +11,14 @@ ROOT = File.expand_path(File.dirname(__FILE__))
 #
 # include necessary files
 #
+# read and write file, supported formats include CSV, LibSVM and WEKA files
 require "#{ROOT}/fselector/fileio.rb"
+# extend Array and String class
 require "#{ROOT}/fselector/util.rb"
+# entropy-related functions
 require "#{ROOT}/fselector/entropy.rb"
+# replace missing values
+require "#{ROOT}/fselector/replace_missing_values.rb"
 
 #
 # base class
data/lib/fselector/algo_base/base.rb
CHANGED
@@ -8,6 +8,8 @@ module FSelector
   class Base
     # include FileIO
     include FileIO
+    # include ReplaceMissingValues
+    include ReplaceMissingValues
 
     # initialize from an existing data structure
     def initialize(data=nil)
@@ -167,13 +169,13 @@ module FSelector
     def set_data(data)
       if data and data.class == Hash
         @data = data
-        # clear
-
-        @scores, @ranks, @sz = nil, nil, nil
+        # clear variables
+        clear_vars
       else
         abort "[#{__FILE__}@#{__LINE__}]: "+
               "data must be a Hash object!"
       end
+
       data
     end
 
@@ -335,6 +337,14 @@ module FSelector
 
     private
 
+    # clear variables when data structure is altered
+    def clear_vars
+      @classes, @features, @fvs = nil, nil, nil
+      @scores, @ranks, @sz = nil, nil, nil
+      @cv, @fvs = nil, nil
+    end
+
+
     # set feature (f) score (s) for class (k)
     def set_feature_score(f, k, s)
       @scores ||= {}
@@ -342,6 +352,7 @@ module FSelector
       @scores[f][k] = s
     end
 
+
     # get subset of feature
     def get_feature_subset
       abort "[#{__FILE__}@#{__LINE__}]: "+
data/lib/fselector/algo_base/base_CFS.rb
CHANGED
@@ -21,6 +21,9 @@ module FSelector
 
     # use sequential forward search
     def get_feature_subset
+      # handle missing values
+      handle_missing_values
+
       subset = []
       feats = get_features.dup
 
@@ -58,6 +61,15 @@ module FSelector
     end # get_feature_subset
 
 
+    # handle missing values
+    # CFS replaces missing values with the mean for continous features and
+    # the most seen value for discrete features
+    def handle_missing_values
+      abort "[#{__FILE__}@#{__LINE__}]: "+
+            "derived CFS algo must implement its own handle_missing_values()"
+    end
+
+
     # calc new merit of subset when adding feature (f)
     def calc_merit(subset, f)
       k = subset.size.to_f + 1
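The new `handle_missing_values` hook in `BaseCFS` is a template method: the base class calls it before the forward search and aborts unless a subclass supplies an implementation (as `CFS_c` and `CFS_d` do in later hunks). A standalone sketch of that pattern with illustrative class names, not gem code:

```ruby
# illustrative only: mirrors the hook-and-override structure used by BaseCFS
class BaseSearch
  def search
    handle_missing_values   # hook; subclasses decide how to fill the gaps
    puts 'running forward search...'
  end

  private

  def handle_missing_values
    abort 'derived class must implement handle_missing_values()'
  end
end

class ContinuousSearch < BaseSearch
  private

  # continuous data: fill with the mean, as CFS_c does via replace_with_mean_value!
  def handle_missing_values
    puts 'replacing missing values with feature means'
  end
end

ContinuousSearch.new.search
```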
data/lib/fselector/algo_base/base_continuous.rb
CHANGED
@@ -10,8 +10,8 @@ module FSelector
   class BaseContinuous < Base
     # include normalizer
     include Normalizer
-    # include
-    include
+    # include discretizer
+    include Discretizer
 
     # initialize from an existing data structure
     def initialize(data=nil)
data/lib/fselector/algo_continuous/CFS_c.rb
CHANGED
@@ -8,8 +8,18 @@ module FSelector
   # ref: [Feature Selection for Discrete and Numeric Class Machine Learning](http://www.cs.waikato.ac.nz/ml/publications/1999/99MH-Feature-Select.pdf)
   #
   class CFS_c < BaseCFS
+    # include normalizer and discretizer
+    include Normalizer
+    include Discretizer
 
     private
+
+
+    # replace missing values with mean feature value
+    def handle_missing_values
+      replace_with_mean_value!
+    end
+
 
     # calc the feature-class correlation of two vectors
     def do_rcf(cv, fv)
data/lib/fselector/algo_continuous/ReliefF_c.rb
CHANGED
@@ -10,6 +10,9 @@ module FSelector
   # ref: [Estimating Attributes: Analysis and Extensions of RELIEF](http://www.springerlink.com/content/fp23jh2h0426ww45/)
   #
   class ReliefF_c < BaseReliefF
+    # include normalizer and discretizer
+    include Normalizer
+    include Discretizer
 
     private
 
data/lib/fselector/algo_continuous/Relief_c.rb
CHANGED
@@ -10,6 +10,9 @@ module FSelector
   # ref: [The Feature Selection Problem: Traditional Methods and a New Algorithm](http://www.aaai.org/Papers/AAAI/1992/AAAI92-020.pdf)
   #
   class Relief_c < BaseRelief
+    # include normalizer and discretizer
+    include Normalizer
+    include Discretizer
 
     private
 
data/lib/fselector/algo_continuous/discretizer.rb
CHANGED
@@ -1,7 +1,10 @@
 #
-#
+# discretize continous feature
 #
-module Discretilizer
+module Discretizer
+  # include Entropy module
+  include Entropy
+
   # discretize by equal-width intervals
   #
   # @param [Integer] n_interval
@@ -84,7 +87,7 @@ module Discretilizer
   # 2     4.60    5.99    9.21   13.82
   # 3     6.35    7.82   11.34   16.27
   #
-  def
+  def discretize_by_ChiMerge!(chisq)
     # chisq = 4.60 # for iris::Sepal.Length
     # for intialization
     hzero = {}
@@ -177,19 +180,71 @@ module Discretilizer
       end
     end
 
-  end #
+  end # discretize_ChiMerge!
+
+
+  #
+  # discretize by Multi-Interval Discretization (MID) algorithm
+  # @note no missing feature values allowed and data structure will be altered
+  #
+  # ref: [Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning](http://www.ijcai.org/Past%20Proceedings/IJCAI-93-VOL2/PDF/022.pdf)
+  #
+  def discretize_by_MID!
+    # determine the final boundaries
+    f2cp = {} # cut points for each feature
+    each_feature do |f|
+      cv = get_class_labels
+      # we assume no missing feature values
+      fv = get_feature_values(f)
+
+      n = cv.size
+      # sort cv and fv according ascending order of fv
+      sis = (0...n).to_a.sort { |i,j| fv[i] <=> fv[j] }
+      cv = cv.values_at(*sis)
+      fv = fv.values_at(*sis)
+
+      # get initial boundaries
+      bs = []
+      fv.each_with_index do |v, i|
+        # cut point (Ta) for feature A must always be a value between
+        # two examples of different classes in the sequence of sorted examples
+        # see orginal reference
+        if i < n-1 and cv[i] != cv[i+1]
+          bs << (v+fv[i+1])/2.0
+        end
+      end
+      bs.uniq! # remove duplicates
+
+      # main algorithm, iteratively determine cut point
+      cp = []
+      partition(cv, fv, bs, cp)
+
+      # add the rightmost boundary for convenience
+      cp << fv.max+1.0
+      # record cut points for feature (f)
+      f2cp[f] = cp
+    end
+
+    # discretize based on cut points
+    each_sample do |k, s|
+      s.keys.each do |f|
+        s[f] = get_index(s[f], f2cp[f])
+      end
+    end
+
+  end # discretize_by_MID!
 
   private
 
   # get index from sorted boundaries
   #
   # min -- | -- | -- | ... max |
-  #
-  #
+  #        b1   b2   b3  ...   bn(=max+1)
+  #    1      2    3     ...      n
   #
   def get_index(v, boundaries)
     boundaries.each_with_index do |b, i|
-      return i if v < b
+      return i+1 if v < b
     end
   end # get_index
 
@@ -215,4 +270,103 @@ module Discretilizer
   end # calc_chisq
 
 
+  #
+  # Multi-Interval Discretization main algorithm
+  # recursively always selecting the best cut point
+  #
+  # @param [Array] cv class labels
+  # @param [Array] fv feature values
+  # @param [Array] bs potential cut points
+  # @param [Array] cp resultant cut points
+  def partition(cv, fv, bs, cp)
+    # best cut point
+    cp_best = nil
+
+    # binary subset at the best cut point
+    cv1_best, cv2_best = nil, nil
+    fv1_best, fv2_best = nil, nil
+    bs1_best, bs2_best = nil, nil
+
+    # best information gain
+    gain_best = -100.0
+    ent_best = -100.0
+    ent1_best = -100.0
+    ent2_best = -100.0
+
+    # try each potential cut point
+    bs.each do |b|
+      # binary split
+      cv1_try, cv2_try, fv1_try, fv2_try, bs1_try, bs2_try =
+        binary_split(cv, fv, bs, b)
+
+      # gain for this cut point
+      ent_try = get_marginal_entropy(cv)
+      ent1_try = get_marginal_entropy(cv1_try)
+      ent2_try = get_marginal_entropy(cv2_try)
+      gain_try = ent_try -
+                 (cv1_try.size.to_f/cv.size) * ent1_try -
+                 (cv2_try.size.to_f/cv.size) * ent2_try
+
+      #pp gain_try
+      if gain_try > gain_best
+        cp_best = b
+        cv1_best, cv2_best = cv1_try, cv2_try
+        fv1_best, fv2_best = fv1_try, fv2_try
+        bs1_best, bs2_best = bs1_try, bs2_try
+
+        gain_best = gain_try
+        ent_best = ent_try
+        ent1_best, ent2_best = ent1_try, ent2_try
+      end
+    end
+
+    # to cut or not to cut?
+    #
+    # Gain(A,T;S) > 1/N * log2(N-1) + 1/N * delta(A,T;S)
+    if cp_best
+      n = cv.size.to_f
+      k = cv.uniq.size.to_f
+      k1 = cv1_best.uniq.size.to_f
+      k2 = cv2_best.uniq.size.to_f
+      delta = Math.log2(3**k-2)-(k*ent_best - k1*ent1_best - k2*ent2_best)
+
+      # accept cut point
+      if gain_best > (Math.log2(n-1)/n + delta/n)
+        # a: record cut point
+        cp << cp_best
+
+        # b: recursively call on subset
+        partition(cv1_best, fv1_best, bs1_best, cp)
+        partition(cv2_best, fv2_best, bs2_best, cp)
+      end
+    end
+  end
+
+
+  # binarily split based on a cut point
+  def binary_split(cv, fv, bs, cut_point)
+    cv1, cv2, fv1, fv2, bs1, bs2 = [], [], [], [], [], []
+    fv.each_with_index do |v, i|
+      if v < cut_point
+        cv1 << cv[i]
+        fv1 << v
+      else
+        cv2 << cv[i]
+        fv2 << v
+      end
+    end
+
+    bs.each do |b|
+      if b < cut_point
+        bs1 << b
+      else
+        bs2 << b
+      end
+    end
+
+    # return subset
+    [cv1, cv2, fv1, fv2, bs1, bs2]
+  end
+
+
 end # module
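The new `discretize_by_MID!` needs no tuning parameter: candidate cut points are midpoints between adjacent samples of different classes, and a cut is accepted only when its information gain passes the MDL test Gain(A,T;S) > log2(N-1)/N + delta(A,T;S)/N, with delta(A,T;S) = log2(3^k - 2) - [k*Ent(S) - k1*Ent(S1) - k2*Ent(S2)], exactly as coded in `partition` above. A hedged usage sketch; the selector class and CSV path are assumptions for illustration, not taken from this diff:

```ruby
require 'fselector'

# assumed: any continuous-feature selector mixes in Discretizer
r1 = FSelector::BaseContinuous.new
r1.data_from_csv('test/iris.csv')

# no threshold to choose, unlike discretize_by_ChiMerge!(chisq);
# the MDL criterion decides when to stop splitting each feature
r1.discretize_by_MID!
```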
data/lib/fselector/algo_continuous/normalizer.rb
CHANGED
@@ -3,7 +3,7 @@
 #
 module Normalizer
   # log transformation, requires positive feature values
-  def
+  def normalize_by_log!(base=10)
    each_sample do |k, s|
      s.keys.each do |f|
        s[f] = Math.log(s[f], base) if s[f] > 0.0
@@ -13,7 +13,7 @@ module Normalizer
 
 
  # scale to [min,max], max > min
-  def
+  def normalize_by_min_max!(min=0.0, max=1.0)
    # first determine min and max for each feature
    f2min_max = {}
 
@@ -33,7 +33,7 @@ module Normalizer
 
 
  # by z-score
-  def
+  def normalize_by_zscore!
    # first determine mean and sd for each feature
    f2mean_sd = {}
 
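For completeness, the renamed normalizers side by side; a hedged sketch assuming `r1` is a continuous-feature selector with data already loaded (pick one strategy per run):

```ruby
# illustrative calls only; r1 is assumed to hold continuous feature data
r1.normalize_by_log!(10)              # log transform, positive values required
# r1.normalize_by_min_max!(0.0, 1.0)  # rescale each feature into [min, max]
# r1.normalize_by_zscore!             # center by mean, scale by standard deviation
```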
data/lib/fselector/algo_discrete/CFS_d.rb
CHANGED
@@ -12,6 +12,12 @@ module FSelector
     include Entropy
 
     private
+
+    # replace missing values with most seen feature value
+    def handle_missing_values
+      replace_with_most_seen_value!
+    end
+
 
     # calc the feature-class correlation of two vectors
     def do_rcf(cv, fv)
data/lib/fselector/entropy.rb
CHANGED
@@ -7,16 +7,16 @@ module Entropy
   #
   # H(X) = -1 * sigma_i (P(x_i) logP(x_i))
   #
-
+  def get_marginal_entropy(arrX)
     h = 0.0
     n = arrX.size.to_f
-
-
-
-
-
-
-
+
+    arrX.uniq.each do |x_i|
+      p = arrX.count(x_i)/n
+      h += -1.0 * (p * Math.log2(p))
+    end
+
+    h
   end # get_marginal_entropy
 
 
@@ -27,28 +27,28 @@ module Entropy
   #
   # where H(X|y_j) = -1 * sigma_i (P(x_i|y_j) logP(x_i|y_j))
   #
-
-
-
-
+  def get_conditional_entropy(arrX, arrY)
+    abort "[#{__FILE__}@#{__LINE__}]: "+
+          "array must be of same length" if not arrX.size == arrY.size
+
     hxy = 0.0
-
-
-
-
-
-
-
-
-
-
-
-
-
+    n = arrX.size.to_f
+
+    arrY.uniq.each do |y_j|
+      p1 = arrY.count(y_j)/n
+
+      indices = (0...n).to_a.select { |k| arrY[k] == y_j }
+      xvs = arrX.values_at(*indices)
+      m = xvs.size.to_f
+
+      xvs.uniq.each do |x_i|
+        p2 = xvs.count(x_i)/m
+
+        hxy += -1.0 * p1 * (p2 * Math.log2(p2))
+      end
     end
-
-    hxy
+
+    hxy
   end # get_conditional_entropy
 
 
@@ -60,11 +60,11 @@ module Entropy
   #
   # i.e. H(X,Y) == H(Y,X)
   #
-
+  def get_joint_entropy(arrX, arrY)
     abort "[#{__FILE__}@#{__LINE__}]: "+
           "array must be of same length" if not arrX.size == arrY.size
-
-    get_marginal_entropy(arrY) + get_conditional_entropy(arrX, arrY)
+
+    get_marginal_entropy(arrY) + get_conditional_entropy(arrX, arrY)
   end # get_joint_entropy
 
 
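The rewritten helpers act on plain arrays, so the formulas are easy to verify by hand. A small self-contained check (not gem code) that mirrors H(X) = -sigma_i P(x_i) log2 P(x_i) and the decomposition H(X,Y) = H(Y) + H(X|Y) used by `get_joint_entropy`:

```ruby
# standalone hand-check of the entropy formulas above, independent of the gem
def marginal_entropy(arr)
  n = arr.size.to_f
  arr.uniq.reduce(0.0) { |h, v| p = arr.count(v) / n; h - p * Math.log2(p) }
end

x = [1, 1, 2, 2]   # two classes, evenly split
y = [1, 1, 1, 2]

puts marginal_entropy(x)   # => 1.0   (one bit)
puts marginal_entropy(y)   # => ~0.811
# H(X|Y) works out to ~0.689 for these arrays, so
# H(X,Y) = H(Y) + H(X|Y) ~ 1.5, matching get_joint_entropy's decomposition
```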
data/lib/fselector/fileio.rb
CHANGED
@@ -110,10 +110,22 @@ module FileIO
     ofs = File.open(fname, 'w')
   end
 
+  # convert class label to integer type
+  k2idx = {}
+  get_classes.each_with_index do |k, i|
+    k2idx[k] = i+1
+  end
+
+  # convert feature to integer type
+  f2idx = {}
+  get_features.each_with_index do |f, i|
+    f2idx[f] = i+1
+  end
+
   each_sample do |k, s|
-    ofs.print "#{k} "
+    ofs.print "#{k2idx[k]} "
     s.keys.sort { |x, y| x.to_s.to_i <=> y.to_s.to_i }.each do |i|
-      ofs.print " #{i}:#{s[i]}" if not s[i].zero?
+      ofs.print " #{f2idx[i]}:#{s[i]}" if not s[i].zero? # implicit mode
     end
     ofs.puts
   end
@@ -171,7 +183,7 @@ module FileIO
       end
     else
       abort "[#{__FILE__}@#{__LINE__}]: "+
-            "
+            "the first two rows must have same number of fields"
     end
   else # data rows
     label, *fvs = ln.chomp.split(/,/)
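The hunk above makes the LibSVM writer map string class labels and feature names to 1-based integer indices before writing. A standalone illustration of the resulting line format (invented sample data, not gem code):

```ruby
# illustrative reimplementation of the k2idx/f2idx mapping shown above
classes  = ['setosa', 'versicolor']
features = ['sepal_len', 'sepal_wid']

k2idx = {}
classes.each_with_index  { |k, i| k2idx[k] = i + 1 }
f2idx = {}
features.each_with_index { |f, i| f2idx[f] = i + 1 }

sample = { 'sepal_len' => 5.1, 'sepal_wid' => 3.5 }
line = "#{k2idx['setosa']} " +
       features.map { |f| " #{f2idx[f]}:#{sample[f]}" unless sample[f].zero? }.compact.join
puts line   # => "1  1:5.1 2:3.5"  (zero-valued features are skipped, "implicit mode")
```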
data/lib/fselector/replace_missing_values.rb
ADDED
@@ -0,0 +1,78 @@
+#
+# replace missing feature values
+#
+module ReplaceMissingValues
+  #
+  # replace missing feature value with a fixed value
+  # applicable for both discrete and continuous feature
+  # @note data structure will be altered
+  #
+  def replace_with_fixed_value!(val)
+    each_sample do |k, s|
+      each_feature do |f|
+        if not s.has_key? f
+          s[f] = val
+        end
+      end
+    end
+
+    # clear variables
+    clear_vars
+  end # replace_with_fixed_value!
+
+
+  #
+  # replace missing feature value with mean feature value
+  # applicable only to continuous feature
+  # @note data structure will be altered
+  #
+  def replace_with_mean_value!
+    each_sample do |k, s|
+      each_feature do |f|
+        fv = get_feature_values(f)
+        next if fv.size == get_sample_size # no missing values
+
+        mean = fv.ave
+        if not s.has_key? f
+          s[f] = mean
+        end
+      end
+    end
+
+    # clear variables
+    clear_vars
+  end # replace_with_mean_value!
+
+
+  #
+  # replace missing feature value with most seen feature value
+  # applicable only to discrete feature
+  # @note data structure will be altered
+  #
+  def replace_with_most_seen_value!
+    each_sample do |k, s|
+      each_feature do |f|
+        fv = get_feature_values(f)
+        next if fv.size == get_sample_size # no missing values
+
+        seen_count, seen_value = 0, nil
+        fv.uniq.each do |v|
+          count = fv.count(v)
+          if count > seen_count
+            seen_count = count
+            seen_value = v
+          end
+        end
+
+        if not s.has_key? f
+          s[f] = seen_value
+        end
+      end
+    end
+
+    # clear variables
+    clear_vars
+  end # replace_with_most_seen_value!
+
+
+end # ReplaceMissingValues
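A hedged usage sketch of the new module; every selector inherits these methods through `Base`. The class name and input file below are assumptions for illustration only:

```ruby
require 'fselector'

# assumed: a discrete-feature selector whose samples have missing features
r = FSelector::CFS_d.new
r.data_from_csv('my_training_data.csv')   # hypothetical input file

# choose the strategy that matches the feature type
r.replace_with_fixed_value!(0)       # discrete or continuous
# r.replace_with_mean_value!         # continuous features only
# r.replace_with_most_seen_value!    # discrete features only
```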
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: fselector
 version: !ruby/object:Gem::Version
-  version: 0.2.0
+  version: 0.3.0
 prerelease:
 platform: ruby
 authors:
@@ -9,17 +9,19 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-04-
+date: 2012-04-03 00:00:00.000000000 Z
 dependencies: []
 description: FSelector is a Ruby gem that aims to integrate various feature selection/ranking
-  algorithms into one single package. Welcome to contact me
-  you
-  user to perform feature selection by using either a single algorithm
-
-
-
-  LibSVM
-
+  algorithms and related functions into one single package. Welcome to contact me
+  (need47@gmail.com) if you'd like to contribute your own algorithms or report a bug.
+  FSelector allows user to perform feature selection by using either a single algorithm
+  or an ensemble of multiple algorithms, and other common tasks including normalization
+  and discretization on continuous data, as well as replace missing feature values
+  with certain criterion. FSelector acts on a full-feature data set in either CSV,
+  LibSVM or WEKA file format and outputs a reduced data set with only selected subset
+  of features, which can later be used as the input for various machine learning softwares
+  including LibSVM and WEKA. FSelector, itself, does not implement any of the machine
+  learning algorithms such as support vector machines and random forest.
 email: need47@gmail.com
 executables: []
 extensions: []
@@ -73,6 +75,7 @@ files:
 - lib/fselector/ensemble.rb
 - lib/fselector/entropy.rb
 - lib/fselector/fileio.rb
+- lib/fselector/replace_missing_values.rb
 - lib/fselector/util.rb
 - lib/fselector.rb
 homepage: http://github.com/need47/fselector