fselector 1.3.0 → 1.3.1

data/.yardopts CHANGED
@@ -3,4 +3,4 @@
  --default-return ""
  --title "FSelector Documentation"
  --no-private
- --files README.md,ChangeLog,LICENSE lib
+ --files README.md,ChangeLog,LICENSE,HowToContribute lib
data/ChangeLog CHANGED
@@ -1,11 +1,20 @@
- 2012-05-24 version 1.3.0
+ 2012-05-31 v1.3.1
+ ------------------
+
+ * add HowToContribute section with test examples in README to show how to write one's own feature selection algorithms and/or contribute them to FSelector on GitHub.com
+ * all discretization algorithms now set feature type as CATEGORICAL, this will allow correct file format conversion after discretization
+ * add RandomSubset algorithm (primarily for test purpose)
+
+ 2012-05-24 v1.3.0
+ ------------------
 
  * update clear\_vars() in Base by use of Ruby metaprogramming, this trick avoids repetitive overriding it in each derived subclass
  * re-organize LasVegasFilter, LasVegasIncremental and Random into algo_both/, since they are applicable to dataset with either discrete or continuous features, even with mixed type
  * update data\_from\_csv() so that it can read CSV file more flexibly. note by default, the last column is class label
  * add data\_from\_url() to read on-line dataset (in CSV, LibSVM or Weka ARFF file format) specified by a url
 
- 2012-05-20 version 1.2.0
+ 2012-05-20 v1.2.0
+ ------------------
 
  * add KS-Test algorithm for continuous feature
  * add KS-CCBF algorithm for continuous feature
@@ -13,7 +22,8 @@
  * add KL-Divergence algorithm for discrete feature
  * include the Discretizer module for algorithms requiring data with discrete feature, which allows to deal with continuous feature after discretization. Those algorithms requiring data with continuous feature now do not include the Discretizer module
 
- 2012-05-15 version 1.1.0
+ 2012-05-15 v1.1.0
+ ------------------
 
  * add replace\_by\_median\_value! for replacing missing value with feature median value
  * add replace\_by\_knn\_value! for replacing missing value with weighted feature value from k-nearest neighbors
@@ -22,43 +32,51 @@
  * rename Ensemble to EnsembleMultiple for ensemble feature selection by creating an ensemble of feature selectors using multiple feature selection algorithms of the same type
  * bug fix in FileIO module
 
- 2012-05-08 version 1.0.1
+ 2012-05-08 v1.0.1
+ ------------------
 
  * modify Ensemble module so that ensemble\_by\_score() and ensemble\_by\_rank() now take Symbol, instead of Method, as argument. This allows easier and clearer function call
  * enable select_feature! interface in Ensemble module for the type of subset selection algorithms
 
- 2012-05-04 version 1.0.0
+ 2012-05-04 v1.0.0
+ ------------------
 
  * add new algorithm INTERACT for discrete feature
  * add Consistency module to deal with data inconsistency calculation, which bases on a Hash table and is efficient in both storage and speed
  * update the Chi2 algorithm to try to reproduce the results of the original Chi2 algorithm
  * update documentation whenever necessary
 
- 2012-04-25 version 0.9.0
+ 2012-04-25 v0.9.0
+ ------------------
 
  * add new discretization algorithm (Three-Interval Discretization, TID)
  * add new algorithm Las Vegas Filter (LVF) for discrete feature
  * add new algorithm Las Vegas Incremental (LVI) for discrete feature
 
- 2012-04-23 version 0.8.1
+ 2012-04-23 v0.8.1
+ ------------------
 
  * correct a bug in the example in the README file because discretize\_by\_ChiMerge!() now takes confidence alpha value as argument instead of chi-square value
 
- 2012-04-23 version 0.8.0
+ 2012-04-23 v0.8.0
+ ------------------
 
  * add new algorithm FTest (FT) for continuous feature
  * add .yardoc_opts to gem to use the MarkDown documentation syntax
 
- 2012-04-20 Tiejun Cheng <need47@gmail.com>
+ 2012-04-20 v0.7.0
+ ------------------
 
- * update to version 0.7.0
+ * update to v0.7.0
 
- 2012-04-19 Tiejun Cheng <need47@gmail.com>
+ 2012-04-19 v0.6.0
+ ------------------
 
  * add new algorithm BetweenWithinClassesSumOfSquare (BSS_WSS) for continuous feature
  * add new algorithm WilcoxonRankSum (WRS) for continuous feature
 
- 2012-04-18 Tiejun Cheng <need47@gmail.com>
+ 2012-04-18 v0.5.0
+ ------------------
 
  * require the RinRuby gem (http://rinruby.ddahl.org) to access the
  statistical routines in the R package (http://www.r-project.org/)
data/HowToContribute ADDED
@@ -0,0 +1,116 @@
+ How to add your own feature selection algorithms
+ ------------------------------------------------
+
+ **Basic steps**
+
+ 1. Require the FSelector gem
+
+ 2. Derive a subclass from a base class (Base, BaseDiscrete, BaseContinuous, etc.)
+
+ 3. Set your own feature selection algorithm with one of the following two types:
+    :feature_weighting # if it outputs a weight for each feature
+    :feature_subset_selection # if it outputs a subset of the original feature set
+
+ 4. Depending on your algorithm type, override one of the following two interfaces:
+    calc_contribution() # if it belongs to the type of feature weighting algorithms
+    get_feature_subset() # if it belongs to the type of feature subset selection algorithms
+
+ **Example**
+
+ require 'fselector' # step 1
+
+ module FSelector
+   # create a new algorithm belonging to the type of feature weighting
+   # to this end, simply override the calc_contribution() in Base class
+   class NewAlgo_Weight < Base # step 2
+     # set the algorithm type
+     @algo_type = :feature_weighting # step 3
+
+     # add your own initialize() here if necessary
+
+     private
+
+     # the algorithm assigns feature weight randomly
+     def calc_contribution(f) # step 4
+       s = rand
+
+       # set the score (s) of feature (f) for class (:BEST is the best score among all classes)
+       set_feature_score(f, :BEST, s)
+     end
+   end # NewAlgo_Weight
+
+
+   # create a new algorithm belonging to the type of feature subset selection
+   # to this end, simply override the get_feature_subset() in Base class
+   class NewAlgo_Subset < Base # step 2
+     # set the algorithm type
+     @algo_type = :feature_subset_selection # step 3
+
+     # add your own initialize() here if necessary
+
+     private
+
+     # the algorithm returns a random half-size subset of the original one
+     def get_feature_subset # step 4
+       org_features = get_features
+       subset = org_features.sample(org_features.size/2)
+
+       subset
+     end
+
+   end # NewAlgo_Subset
+ end # module
+
+ **Test your algorithms**
+
+ require 'fselector'
+
+ # example 1
+
+ # use NewAlgo_Weight
+ r1 = FSelector::NewAlgo_Weight.new
+ r1.data_from_csv('test/iris.csv')
+ r1.select_feature_by_rank!('<=1')
+ puts r1.get_features
+
+ # example 2
+
+ # use NewAlgo_Subset
+ r2 = FSelector::NewAlgo_Subset.new
+ r2.data_from_csv('test/iris.csv')
+ r2.select_feature!
+ puts r2.get_features
+
+ How to become a contributor of FSelector on GitHub
+ --------------------------------------------------
+
+ **Set up your repository**
+
+ 1. Go to https://github.com/need47/fselector and click the "Fork" button
+
+ 2. Clone your fork to your local machine:
+    git clone git@github.com:yourGitUserName/fselector.git
+
+ 3. Assign the original repository to a remote called "upstream":
+    cd fselector
+    git remote add upstream git://github.com/need47/fselector.git
+
+ 4. Get updates from the "upstream" and merge them into your local repository:
+    git fetch upstream
+    git merge upstream/master
+
+ **Develop features**
+
+ 1. Create and check out a feature branch to house your edits:
+    git checkout -b branchName
+
+ 2. Add your own feature selection algorithm:
+    git add yourAlgorithm.rb
+    git commit -m "your commit message"
+
+ 3. Push your branch to GitHub:
+    git push origin branchName
+
+ 4. Visit your forked project on GitHub and switch to your branchName branch
+
+ 5. Click the "Pull Request" button to request that your features be merged into the "upstream" master
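The subclass-and-override pattern used in HowToContribute can be exercised without the gem itself. A minimal standalone sketch of the same template-method idea (MiniBase and HalfSubset are made-up names, not part of FSelector):

```ruby
# Minimal stand-in for the gem's Base class: it calls a private hook
# that a derived subclass must override -- the same template-method
# pattern FSelector uses for calc_contribution()/get_feature_subset().
class MiniBase
  attr_reader :features

  def initialize(features)
    @features = features
  end

  # keep only the features chosen by the subclass-supplied hook
  def select!
    @features = get_feature_subset
  end

  private

  # derived subclass must implement its own get_feature_subset
  def get_feature_subset
    raise NotImplementedError, "override get_feature_subset in a subclass"
  end
end

# a toy subset selector: keep a random half of the features
class HalfSubset < MiniBase
  private

  def get_feature_subset
    features.sample(features.size / 2)
  end
end

r = HalfSubset.new(%w[f1 f2 f3 f4])
r.select!
puts r.features.size   # => 2
```

Calling MiniBase#select! directly would raise NotImplementedError, which mirrors how the real Base class aborts when a subclass forgets to override its hook.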
data/README.md CHANGED
@@ -1,15 +1,15 @@
  FSelector: a Ruby gem for feature selection and ranking
  ===========================================================
 
- **Home** [https://rubygems.org/gems/fselector](https://rubygems.org/gems/fselector)
+ **Home**: [https://rubygems.org/gems/fselector](https://rubygems.org/gems/fselector)
  **Source Code**: [https://github.com/need47/fselector](https://github.com/need47/fselector)
  **Documentation** [http://rubydoc.info/gems/fselector/frames](http://rubydoc.info/gems/fselector/frames)
  **Author**: Tiejun Cheng
  **Email**: [need47@gmail.com](mailto:need47@gmail.com)
  **Copyright**: 2012
  **License**: MIT License
- **Latest Version**: 1.3.0
- **Release Date**: 2012-05-24
+ **Latest Version**: 1.3.1
+ **Release Date**: 2012-05-31
 
  Synopsis
  --------
@@ -27,8 +27,9 @@ outputs a reduced data set with only selected subset of features, which
  can later be used as the input for various machine learning softwares
  such as LibSVM and WEKA. FSelector, as a collection of filter methods,
  does not implement any classifier like support vector machines or
- random forest. See below for a list of FSelector's features and
- {file:ChangeLog} for updates.
+ random forest. Check below for a list of FSelector's features,
+ {file:ChangeLog} for updates, and {file:HowToContribute} if you want
+ to contribute.
 
  Feature List
  ------------
@@ -88,7 +89,8 @@ Feature List
  WilcoxonRankSum WRS weighting two-class continuous
  LasVegasFilter LVF subset multi-class discrete, continuous, mixed
  LasVegasIncremental LVI subset multi-class discrete, continuous, mixed
- Random Random weighting multi-class discrete, continuous, mixed
+ Random Rand weighting multi-class discrete, continuous, mixed
+ RandomSubset RandS subset multi-class discrete, continuous, mixed
 
  **note for feature selection interface:**
  there are two types of filter methods, i.e., feature weighting algorithms and feature subset selection algorithms
@@ -143,7 +145,7 @@ Usage
  -----
 
  **1. feature selection by a single algorithm**
-
+
  require 'fselector'
 
  # use InformationGain (IG) as a feature selection algorithm
@@ -186,7 +188,7 @@ Usage
 
 
  **2. feature selection by an ensemble of multiple feature selectors**
-
+
  require 'fselector'
 
  # example 1
@@ -253,7 +255,7 @@ Usage
  puts ' # features (after): ' + re.get_features.size.to_s
 
  **3. feature selection after discretization**
-
+
  require 'fselector'
 
  # the Information Gain (IG) algorithm requires data with discrete feature
@@ -277,13 +279,19 @@ Usage
 
  **4. see more examples test_*.rb under the test/ directory**
 
+ How to contribute
+ -----------------
+ Check {file:HowToContribute} to see how to write your own feature selection algorithms and/or contribute to FSelector.
+
  Change Log
  ----------
+
  A {file:ChangeLog} is available from version 0.5.0 upward to reflect
- what's new and what's changed
+ what's new and what's changed.
 
  Copyright
  ---------
+
  FSelector &copy; 2012 by [Tiejun Cheng](mailto:need47@gmail.com).
  FSelector is licensed under the MIT license. Please see the {file:LICENSE} for
  more information.
data/lib/fselector.rb CHANGED
@@ -7,7 +7,7 @@ R.eval 'options(warn = -1)' # suppress R warnings
  #
  module FSelector
  # module version
- VERSION = '1.3.0'
+ VERSION = '1.3.1'
  end
 
  # the root dir of FSelector
@@ -506,7 +506,16 @@ module FSelector
  end
 
 
- # get feature subset, for the type of subset selection algorithms
+ # calculate each feature's contribution
+ # override it in derived subclass for the type of feature weighting algorithms
+ def calc_contribution(f)
+   abort "[#{__FILE__}@#{__LINE__}]: \n"+
+         " derived subclass must implement its own calc_contribution()"
+ end
+
+
+ # get feature subset
+ # override it in derived subclass for the type of feature subset selection algorithms
  def get_feature_subset
    abort "[#{__FILE__}@#{__LINE__}]: \n"+
          " derived subclass must implement its own get_feature_subset()"
data/lib/fselector/algo_both/Random.rb CHANGED
@@ -4,9 +4,9 @@
  module FSelector
  #
  # Random (Rand) for discrete, continuous or mixed feature,
- # no pratical use but can be used as a baseline
+ # no practical use but can be used for test purpose
  #
- # Rand = rand numbers within [0..1)
+ # Rand assigns a random score of [0..1) to each feature
  #
  # ref: [An extensive empirical study of feature selection metrics for text classification](http://dl.acm.org/citation.cfm?id=944974)
  #
@@ -40,4 +40,8 @@ module FSelector
  end # class
 
 
+ # shortcut so that you can use FSelector::Rand instead of FSelector::Random
+ Rand = Random
+
+
  end # module
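The Rand = Random shortcut above is ordinary Ruby constant assignment: the alias and the original name refer to the same class object. A standalone sketch of the same trick (Demo and RandomScorer are illustrative names, not part of FSelector):

```ruby
module Demo
  # toy stand-in for a feature-weighting class
  class RandomScorer
    def score
      rand   # random weight in [0, 1), as Rand assigns to each feature
    end
  end

  # shortcut: Demo::Rand and Demo::RandomScorer are the same class
  Rand = RandomScorer
end

puts Demo::Rand.equal?(Demo::RandomScorer)   # => true
s = Demo::Rand.new.score
puts s >= 0.0 && s < 1.0                     # => true
```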
data/lib/fselector/algo_both/RandomSubset.rb ADDED
@@ -0,0 +1,52 @@
+ #
+ # FSelector: a Ruby gem for feature selection and ranking
+ #
+ module FSelector
+   #
+   # RandomSubset (RandS) for discrete, continuous or mixed feature,
+   # no practical use but can be used for test purpose
+   #
+   # RandS generates a random subset of the original feature set
+   #
+   class RandomSubset < Base
+     # this algo outputs a subset of features
+     @algo_type = :feature_subset_selection
+
+     #
+     # initialize from an existing data structure
+     #
+     # @param [Integer] nfeature number of features required
+     #   use a random number if nil
+     #
+     def initialize(nfeature=nil, data=nil)
+       super(data)
+
+       @nfeature = nfeature
+     end
+
+     private
+
+     # RandomSubset algorithm
+     def get_feature_subset
+       subset = []
+
+       if @nfeature and @nfeature > 0
+         subset = get_features.sample(@nfeature)
+       else
+         n = rand(get_features.size)
+         n += 1 if n == 0
+         subset = get_features.sample(n)
+       end
+
+       subset
+     end # get_feature_subset
+
+
+   end # class
+
+
+   # shortcut so that you can use FSelector::RandS instead of FSelector::RandomSubset
+   RandS = RandomSubset
+
+
+ end # module
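The subset-size logic in get_feature_subset above relies only on Ruby's built-in Array#sample and Kernel#rand, so it can be checked in isolation. A minimal sketch (the feature names are made up):

```ruby
features = %w[sepal_length sepal_width petal_length petal_width]

# fixed-size request, as with RandomSubset.new(2)
nfeature = 2
subset = features.sample(nfeature)
puts subset.size                         # => 2

# no size given (nfeature = nil): draw a random, non-empty subset
n = rand(features.size)                  # 0, 1, 2 or 3
n += 1 if n == 0                         # guarantee at least one feature
random_subset = features.sample(n)
puts (1..3).cover?(random_subset.size)   # => true
```

The `n += 1 if n == 0` guard matters because `rand(size)` can return 0, and `sample(0)` would yield an empty subset.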
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: fselector
  version: !ruby/object:Gem::Version
- version: 1.3.0
+ version: 1.3.1
  prerelease:
  platform: ruby
  authors:
@@ -9,11 +9,11 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2012-05-24 00:00:00.000000000 Z
+ date: 2012-05-31 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rinruby
- requirement: &24934116 !ruby/object:Gem::Requirement
+ requirement: &25065000 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ! '>='
@@ -21,7 +21,7 @@ dependencies:
  version: 2.0.2
  type: :runtime
  prerelease: false
- version_requirements: *24934116
+ version_requirements: *25065000
  description: FSelector is a Ruby gem that aims to integrate various feature selection/ranking
  algorithms and related functions into one single package. Welcome to contact me
  (need47@gmail.com) if you'd like to contribute your own algorithms or report a bug.
@@ -41,6 +41,7 @@ files:
  - README.md
  - ChangeLog
  - LICENSE
+ - HowToContribute
  - .yardopts
  - lib/fselector/algo_base/base.rb
  - lib/fselector/algo_base/base_CFS.rb
@@ -51,6 +52,7 @@ files:
  - lib/fselector/algo_both/LasVegasFilter.rb
  - lib/fselector/algo_both/LasVegasIncremental.rb
  - lib/fselector/algo_both/Random.rb
+ - lib/fselector/algo_both/RandomSubset.rb
  - lib/fselector/algo_continuous/BSS_WSS.rb
  - lib/fselector/algo_continuous/CFS_c.rb
  - lib/fselector/algo_continuous/F-Test.rb