multi_armed_bandit 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/.gitignore +9 -0
- data/.rspec +2 -0
- data/.travis.yml +4 -0
- data/Gemfile +4 -0
- data/README.md +61 -0
- data/Rakefile +6 -0
- data/bin/console +14 -0
- data/bin/setup +7 -0
- data/lib/multi_armed_bandit/epsilon_greedy.rb +89 -0
- data/lib/multi_armed_bandit/mp_ts.rb +46 -0
- data/lib/multi_armed_bandit/softmax.rb +80 -0
- data/lib/multi_armed_bandit/version.rb +3 -0
- data/lib/multi_armed_bandit.rb +9 -0
- data/multi_armed_bandit.gemspec +34 -0
- metadata +115 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: ad5425a07617c9d5751a037c73e3aa65cceb8bee
+  data.tar.gz: 1a06c8077da76fe19d9f03f2fcb2b686c416e66d
+SHA512:
+  metadata.gz: 7ec3c6c029b9838baf3456c00617cb9b536be95fe6324ab1a86b6760912e6b792ce7933ca1bddbad343f52119ae055353af7a7f015f1c3444be202e756365a9a
+  data.tar.gz: d3cba9d4509715242d1a8068701e46062661f30b30c54877be499595b6725178ddc2b723c47fad52c6331a39d988ba4d1ffbebb1a94a609a31f18a3306402a1b
data/.gitignore
ADDED
data/.rspec
ADDED
data/.travis.yml
ADDED
data/Gemfile
ADDED
data/README.md
ADDED
@@ -0,0 +1,61 @@
+# MultiArmedBandit
+
+This repo contains Ruby code for solving Multi-Armed Bandit problems. It includes the following algorithms:
+
+* Epsilon-Greedy
+* Softmax
+* Thompson Sampling with Multiple Play
+
+Other major algorithms, such as UCB and Bayesian Bandit, will be forthcoming.
+
+## Installation
+
+Run the following command to install the gem from the GitHub repo. It relies on the `specific_install` RubyGems plugin, which can be installed with `gem install specific_install`.
+
+    $ gem specific_install -l 'git://github.com/vasilyjp/multi_armed_bandit.git'
+
+
+## Usage
+
+Include the MultiArmedBandit module with the following code:
+```ruby
+require 'multi_armed_bandit'
+include MultiArmedBandit
+```
+
+Then create a Softmax object. The first parameter is the temperature. As the temperature approaches 0.0, the choice becomes deterministic: the arm with the highest value is always selected. Conversely, as the temperature approaches infinity, all arms are selected with nearly the same probability. In practice, the temperature is usually set between 0.01 and 1.0.
+
+The second parameter is the number of arms.
+```ruby
+sm = MultiArmedBandit::Softmax.new(0.01, 3)
+```
+
+Pass each arm's number of trials and rewards as lists to the `bulk_update` method, and it returns the predicted selection probabilities.
+```ruby
+# Trial 1
+probs = sm.bulk_update([1000,1000,1000], [72,57,49])
+counts = probs.map{|p| (p*3000).round }
+
+# Trial 2
+probs = sm.bulk_update(counts, [154,17,32])
+```
+
+## Development
+
+After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+
+To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+
+## Contributing
+
+Bug reports and pull requests are welcome on GitHub at https://github.com/vasilyjp/multi_armed_bandit. This project is intended to be a safe, welcoming space for collaboration.
+
+
+## License
+The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
+
+## Reference
+```
+[1] John Myles White: Bandit Algorithms for Website Optimization. O'Reilly Media
+[2] J. Komiyama, J. Honda, and H. Nakagawa: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays. ICML 2015
+```
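The standalone sketch below works through the README's Trial 1 by hand. It uses the same inputs: 1000 trials per arm with 72, 57 and 49 rewards, which give estimated values of 0.072, 0.057 and 0.049. The helper `softmax_probs` is hypothetical and not part of the gem. It subtracts the largest value before exponentiating, which leaves the probabilities unchanged. Running it at temperatures 0.01 and 1.0 shows the effect described under Usage: the low temperature gives most of the next 3000 trials to the best arm, while 1.0 is close to uniform.

```ruby
# Hypothetical helper (not part of the gem): Softmax probabilities
#   p_i = exp(v_i / t) / sum_j exp(v_j / t),
# computed after subtracting the largest value, which cancels in the ratio.
def softmax_probs(values, temperature)
  max = values.max
  weights = values.map { |v| Math.exp((v - max) / temperature) }
  total = weights.sum
  weights.map { |w| w / total }
end

counts  = [1000, 1000, 1000]
rewards = [72, 57, 49]

# Estimated value of each arm: rewards per trial.
values = counts.zip(rewards).map { |n, r| r / n.to_f }

cold = softmax_probs(values, 0.01) # temperature used in the README
warm = softmax_probs(values, 1.0)

# Allocate the next 3000 trials the same way as the README's Trial 1.
next_counts = cold.map { |p| (p * 3000).round }

puts "t = 0.01: #{cold.map { |p| p.round(3) }.inspect}"
puts "t = 1.0:  #{warm.map { |p| p.round(3) }.inspect}"
puts "next 3000 trials: #{next_counts.inspect}"
```

With these inputs, the Trial 1 allocation comes to about 2267, 506 and 227 trials for the three arms.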
data/Rakefile
ADDED
data/bin/console
ADDED
@@ -0,0 +1,14 @@
+#!/usr/bin/env ruby
+
+require "bundler/setup"
+require "multi_armed_bandit"
+
+# You can add fixtures and/or initialization code here to make experimenting
+# with your gem easier. You can also use a different console, if you like.
+
+# (If you use this, don't forget to add pry to your Gemfile!)
+# require "pry"
+# Pry.start
+
+require "irb"
+IRB.start
data/bin/setup
ADDED
data/lib/multi_armed_bandit/epsilon_greedy.rb
ADDED
@@ -0,0 +1,89 @@
+
+module MultiArmedBandit
+
+  class EpsilonGreedy
+
+    attr_accessor :epsilon, :counts, :values, :probs, :n_arms
+
+    # Initialize an object
+    def initialize(epsilon, n_arms)
+      @epsilon = epsilon
+      @n_arms = n_arms
+      reset()
+    end
+
+    # Reset instance variables
+    def reset()
+      @counts = Array.new(@n_arms, 0)
+      @values = Array.new(@n_arms, 0.0)
+      @probs = Array.new(@n_arms, 0.0)
+    end
+
+    # Update all arms at once. new_counts is a list of each arm's trial count and
+    # new_rewards is a list of each arm's total reward.
+    def bulk_update(new_counts, new_rewards)
+
+      # update the numbers of each arm's trial
+      @counts = new_counts
+
+      # update expectations of each arm
+      new_values = []
+      @counts.zip( new_rewards ).each do |n, r|
+        new_values << r / n.to_f
+      end
+      @values = new_values
+
+      # calculate probabilities
+      j = ind_max(@values)
+      for i in 0..@n_arms-1 do
+        if i == j
+          @probs[i] = 1-@epsilon
+        else
+          @probs[i] = (@epsilon)/(@n_arms-1)
+        end
+      end
+
+      return @probs
+    end
+
+    def update(chosen_arm, reward)
+      @counts[chosen_arm] = @counts[chosen_arm] + 1
+      n = @counts[chosen_arm]
+
+      value = @values[chosen_arm]
+      new_value = ((n - 1) / n.to_f) * value + (1 / n.to_f) * reward
+      @values[chosen_arm] = new_value
+      return
+    end
+
+
+    def select_arm
+      if rand > @epsilon
+        return ind_max(@values)
+      else
+        return rand(@values.size)
+      end
+    end
+
+    private
+    def ind_max(x)
+      m = x.max
+      return x.index(m)
+    end
+
+    def categorical_draw(probs)
+      z = rand()
+      cum_prob = 0.0
+
+      probs.size().times do |i|
+        prob = probs[i]
+        cum_prob += prob
+        if cum_prob > z
+          return i
+        end
+      end
+
+      return probs.size() - 1
+    end
+  end
+end
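`EpsilonGreedy#update` keeps each arm's value as a running mean. After the n-th reward r, the value becomes ((n - 1) / n) * value + (1 / n) * r. The standalone sketch below uses hypothetical helper names, not the gem's API. It checks that this recurrence matches the plain average of a reward sequence. It also rebuilds the vector returned by `bulk_update`, where the best arm gets 1 - epsilon and each other arm gets epsilon / (n_arms - 1). That vector differs from the rule in `select_arm`, which explores uniformly over all arms including the best one. Under `select_arm`, the best arm is therefore picked with probability 1 - epsilon + epsilon / n_arms, assuming a unique maximum.

```ruby
# Hypothetical helpers mirroring the arithmetic in EpsilonGreedy (not the gem's API).

# Running-mean recurrence from EpsilonGreedy#update.
def incremental_mean(rewards)
  value = 0.0
  rewards.each_with_index do |reward, i|
    n = i + 1
    value = ((n - 1) / n.to_f) * value + (1 / n.to_f) * reward
  end
  value
end

# Vector built by EpsilonGreedy#bulk_update: 1 - epsilon for the best arm,
# epsilon split evenly over the remaining arms.
def bulk_probs(values, epsilon)
  best = values.index(values.max)
  values.each_index.map { |i| i == best ? 1 - epsilon : epsilon / (values.size - 1) }
end

# Selection probabilities implied by EpsilonGreedy#select_arm: exploit with
# probability 1 - epsilon, otherwise pick uniformly among all arms.
def select_probs(values, epsilon)
  best = values.index(values.max)
  uniform = epsilon / values.size
  values.each_index.map { |i| i == best ? 1 - epsilon + uniform : uniform }
end

rewards = [1, 0, 0, 1, 1, 0, 1, 1]
running = incremental_mean(rewards)
batch = rewards.sum / rewards.size.to_f

values = [0.072, 0.057, 0.049]
puts "running mean #{running}, batch mean #{batch}"
puts "bulk_update: #{bulk_probs(values, 0.1).inspect}"
puts "select_arm:  #{select_probs(values, 0.1).inspect}"
```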
data/lib/multi_armed_bandit/mp_ts.rb
ADDED
@@ -0,0 +1,46 @@
+require 'simple-random'
+
+module MultiArmedBandit
+
+  class MultiplePlayTS
+
+    attr_accessor :k, :l, :alpha, :beta, :arm_ids
+
+    # k: number of arms
+    # l: number of arms to select
+    def initialize(k, l, setseed=true)
+      @k = k
+      @l = l
+      @r = SimpleRandom.new
+      # SimpleRandom uses the same seed by default, so reseed it
+      @r.set_seed if setseed
+      reset
+    end
+
+    def reset
+      @alpha = Array.new(@k, 1)
+      @beta = Array.new(@k, 1)
+      @arm_ids = Array.new(@k, '')
+    end
+
+    # Get the arm_ids of the l arms with the largest Beta(alpha, beta) samples
+    def get_selected_arms
+      selected_arms = @alpha.zip(@beta).zip(@arm_ids)
+        .map{|c,i| [i, @r.beta(c[0],c[1])]}
+        .sort_by{|v| -v[1]}
+        .map{|v| v[0]}[0..@l-1]
+    end
+
+    # selected_arms: List of selected drawn arms
+    def update_params_draw(selected_arms)
+      selected_arms.map{|i| @beta[i]+=1}
+    end
+
+    # idx: Index number of rewarded arm
+    def update_params_reward(idx)
+      @alpha[idx]+=1
+      @beta[idx]-=1
+    end
+
+  end
+end
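`MultiplePlayTS` keeps a Beta(alpha, beta) posterior per arm, takes one sample from each, and returns the `l` arms with the largest samples. `update_params_draw` adds 1 to `beta` for every shown arm, and `update_params_reward` moves that unit from `beta` to `alpha` when the arm pays off.

The sketch below replays this loop without the simple-random dependency, so two details are assumptions of the sketch rather than the gem:

* Beta(a, b) is sampled as the a-th smallest of a + b - 1 uniform draws. This is valid here because the parameters stay positive integers.
* Arms are identified by index.

In the class itself, `update_params_draw` indexes `@beta` with whatever `get_selected_arms` returns. Feeding one into the other therefore only works if `arm_ids` holds the integer indices 0 to k - 1. The reward rates below are made-up values for illustration.

```ruby
# Standalone sketch of multiple-play Thompson sampling (not the gem's code).
# Assumption: for integers a, b >= 1, Beta(a, b) is sampled as the a-th
# smallest of (a + b - 1) independent Uniform(0, 1) draws.
def sample_beta(a, b, rng)
  Array.new(a + b - 1) { rng.rand }.sort[a - 1]
end

# Indices of the l arms whose posterior samples are largest.
def select_arms(alpha, beta, l, rng)
  samples = alpha.zip(beta).map { |a, b| sample_beta(a, b, rng) }
  samples.each_index.sort_by { |i| -samples[i] }.first(l)
end

k, l = 5, 2
true_rates = [0.05, 0.10, 0.30, 0.02, 0.25] # hypothetical reward rates
alpha = Array.new(k, 1)
beta = Array.new(k, 1)
rng = Random.new(42) # fixed seed so the run is reproducible
rounds = 500

rounds.times do
  select_arms(alpha, beta, l, rng).each do |i|
    beta[i] += 1 # as in update_params_draw: count the play as a miss first
    next unless rng.rand < true_rates[i]
    alpha[i] += 1 # as in update_params_reward: turn the miss into a hit
    beta[i] -= 1
  end
end

plays = alpha.zip(beta).map { |a, b| a + b - 2 }
puts "plays per arm: #{plays.inspect}"
puts "posterior means: #{alpha.zip(beta).map { |a, b| (a / (a + b).to_f).round(3) }.inspect}"
```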
data/lib/multi_armed_bandit/softmax.rb
ADDED
@@ -0,0 +1,80 @@
+
+
+module MultiArmedBandit
+
+  class Softmax
+
+    attr_accessor :temperature, :counts, :values, :probs, :n_arms
+
+    # Initialize an object
+    def initialize(temperature, n_arms)
+      @n_arms = n_arms
+      @temperature = temperature
+      reset()
+    end
+
+    # Reset instance variables
+    def reset()
+      @counts = Array.new(@n_arms, 0)
+      @values = Array.new(@n_arms, 0.0)
+      @probs = Array.new(@n_arms, 0.0)
+    end
+
+    # Update all arms at once. new_counts is a list of each arm's trial count and
+    # new_rewards is a list of each arm's total reward.
+    # Each number in new_counts and new_rewards should be a cumulative total.
+    def bulk_update(new_counts, new_rewards)
+
+      # update the numbers of each arm's trial
+      @counts = new_counts
+
+      # update expectations of each arm
+      new_values = []
+      @counts.zip( new_rewards ).each do |n, reward|
+        new_values << reward / n.to_f
+      end
+      @values = new_values
+
+      # calculate probabilities
+      z = @values.collect{|i| Math.exp(i/@temperature)}.reduce(:+)
+      @probs = @values.collect{|i| Math.exp(i/@temperature)/z}
+
+      return @probs
+    end
+
+
+    def update(chosen_arm, reward)
+      @counts[chosen_arm] = @counts[chosen_arm] + 1
+      n = @counts[chosen_arm]
+
+      value = @values[chosen_arm]
+      new_value = ((n - 1) / n.to_f) * value + (1 / n.to_f) * reward
+      @values[chosen_arm] = new_value
+      return
+    end
+
+
+    def select_arm
+      z = @values.collect{|i| Math.exp(i/@temperature)}.reduce(:+)
+      @probs = @values.collect{|i| Math.exp(i/@temperature)/z}
+      return categorical_draw(@probs)
+    end
+
+    private
+    def categorical_draw(probs)
+      z = rand()
+      cum_prob = 0.0
+
+      probs.size().times do |i|
+        prob = probs[i]
+        cum_prob += prob
+        if cum_prob > z
+          return i
+        end
+      end
+
+      return probs.size() - 1
+    end
+
+  end
+end
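`Softmax#bulk_update` and `#select_arm` exponentiate value / temperature directly. The largest Float is about exp(709.78). At a temperature of 0.01, any arm value above roughly 7.1 pushes `Math.exp` past that limit. The sum then becomes Infinity, and the overflowing arms' probabilities become NaN.

The sketch below uses hypothetical helper names. It contrasts that form with the common max-subtraction variant, which gives mathematically identical probabilities. It then reuses the class's private `categorical_draw` scan with an explicit `Random`, so the frequency check is repeatable.

```ruby
# Hypothetical helpers (not the gem's API) contrasting two Softmax forms.

# Form used in Softmax#bulk_update and #select_arm: exp(v / t) / sum exp(v_j / t).
def naive_softmax(values, t)
  z = values.map { |v| Math.exp(v / t) }.sum
  values.map { |v| Math.exp(v / t) / z }
end

# Same probabilities, but exponents are shifted so the largest one is 0.
def shifted_softmax(values, t)
  max = values.max
  weights = values.map { |v| Math.exp((v - max) / t) }
  total = weights.sum
  weights.map { |w| w / total }
end

# Same cumulative scan as Softmax#categorical_draw, with an injected RNG.
def categorical_draw(probs, rng)
  z = rng.rand
  cum_prob = 0.0
  probs.each_with_index do |prob, i|
    cum_prob += prob
    return i if cum_prob > z
  end
  probs.size - 1
end

large = [10.0, 9.5, 9.0] # arm values of this size overflow exp at t = 0.01
naive = naive_softmax(large, 0.01)
shifted = shifted_softmax(large, 0.01)
puts "naive:   #{naive.inspect}"
puts "shifted: #{shifted.inspect}"

probs = shifted_softmax([0.072, 0.057, 0.049], 0.01)
rng = Random.new(1)
draws = 20_000
hits = Array.new(probs.size, 0)
draws.times { hits[categorical_draw(probs, rng)] += 1 }
freqs = hits.map { |h| h / draws.to_f }
puts "target: #{probs.map { |p| p.round(3) }.inspect}"
puts "drawn:  #{freqs.map { |f| f.round(3) }.inspect}"
```

Subtracting the maximum leaves the probabilities unchanged because the common factor exp(-max / t) cancels between numerator and denominator.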
data/multi_armed_bandit.gemspec
ADDED
@@ -0,0 +1,34 @@
+# coding: utf-8
+lib = File.expand_path('../lib', __FILE__)
+$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+require 'multi_armed_bandit/version'
+
+Gem::Specification.new do |spec|
+  spec.name = "multi_armed_bandit"
+  spec.version = MultiArmedBandit::VERSION
+  spec.authors = ["kndt84"]
+  spec.email = ["takashi.kaneda@vasily.jp"]
+
+  spec.summary = %q{multi-armed bandit algorithms}
+  # spec.description = %q{TODO: Write a longer description or delete this line.}
+  spec.homepage = "https://github.com/vasilyjp/multi_armed_bandit"
+  spec.license = "MIT"
+
+  # Prevent pushing this gem to RubyGems.org by setting 'allowed_push_host', or
+  # delete this section to allow pushing this gem to any host.
+  if spec.respond_to?(:metadata)
+    spec.metadata['allowed_push_host'] = "TODO: Set to 'http://mygemserver.com'"
+  else
+    raise "RubyGems 2.0 or newer is required to protect against public gem pushes."
+  end
+
+  spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
+  spec.bindir = "exe"
+  spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
+  spec.require_paths = ["lib"]
+
+  spec.add_development_dependency "bundler", "~> 1.10"
+  spec.add_development_dependency "rake", "~> 10.0"
+  spec.add_development_dependency "rspec"
+  spec.add_dependency "simple-random"
+end
metadata
ADDED
@@ -0,0 +1,115 @@
+--- !ruby/object:Gem::Specification
+name: multi_armed_bandit
+version: !ruby/object:Gem::Version
+  version: 0.2.0
+platform: ruby
+authors:
+- kndt84
+autorequire:
+bindir: exe
+cert_chain: []
+date: 2016-04-14 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: bundler
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ~>
+      - !ruby/object:Gem::Version
+        version: '1.10'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ~>
+      - !ruby/object:Gem::Version
+        version: '1.10'
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ~>
+      - !ruby/object:Gem::Version
+        version: '10.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ~>
+      - !ruby/object:Gem::Version
+        version: '10.0'
+- !ruby/object:Gem::Dependency
+  name: rspec
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: simple-random
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+description:
+email:
+- takashi.kaneda@vasily.jp
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- .gitignore
+- .rspec
+- .travis.yml
+- Gemfile
+- README.md
+- Rakefile
+- bin/console
+- bin/setup
+- lib/multi_armed_bandit.rb
+- lib/multi_armed_bandit/epsilon_greedy.rb
+- lib/multi_armed_bandit/mp_ts.rb
+- lib/multi_armed_bandit/softmax.rb
+- lib/multi_armed_bandit/version.rb
+- multi_armed_bandit.gemspec
+homepage: https://github.com/vasilyjp/multi_armed_bandit
+licenses:
+- MIT
+metadata:
+  allowed_push_host: 'TODO: Set to ''http://mygemserver.com'''
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 2.0.14
+signing_key:
+specification_version: 4
+summary: multi-armed bandit algorithms
+test_files: []