multi_armed_bandit 0.2.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: ad5425a07617c9d5751a037c73e3aa65cceb8bee
4
+ data.tar.gz: 1a06c8077da76fe19d9f03f2fcb2b686c416e66d
5
+ SHA512:
6
+ metadata.gz: 7ec3c6c029b9838baf3456c00617cb9b536be95fe6324ab1a86b6760912e6b792ce7933ca1bddbad343f52119ae055353af7a7f015f1c3444be202e756365a9a
7
+ data.tar.gz: d3cba9d4509715242d1a8068701e46062661f30b30c54877be499595b6725178ddc2b723c47fad52c6331a39d988ba4d1ffbebb1a94a609a31f18a3306402a1b
data/.gitignore ADDED
@@ -0,0 +1,9 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /Gemfile.lock
4
+ /_yardoc/
5
+ /coverage/
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
data/.rspec ADDED
@@ -0,0 +1,2 @@
1
+ --format documentation
2
+ --color
data/.travis.yml ADDED
@@ -0,0 +1,4 @@
1
+ language: ruby
2
+ rvm:
3
+ - 2.0.0
4
+ before_install: gem install bundler -v 1.10.6
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in multi_armed_bandit.gemspec
4
+ gemspec
data/README.md ADDED
@@ -0,0 +1,61 @@
1
+ # MultiArmedBandit
2
+
3
+ This repo contains Ruby code for solving Multi-Armed Bandit problems. This includes the following algorithms:
4
+
5
+ * Epsilon-Greedy
6
+ * Softmax
7
+ * Thompson Sampling with Multiple Plays
8
+
9
+ Other major algorithms, such as UCB and Bayesian Bandit, are forthcoming.
10
+
11
+ ## Installation
12
+
13
+ Run the following command to install the gem from the GitHub repo (this requires the `specific_install` RubyGems plugin).
14
+
15
+ $ gem specific_install -l 'git://github.com/vasilyjp/multi_armed_bandit.git'
16
+
17
+
18
+ ## Usage
19
+
20
+ Include the MultiArmedBandit module with the following code.
21
+ ```ruby
22
+ require 'multi_armed_bandit'
23
+ include MultiArmedBandit
24
+ ```
25
+
26
+ Then create an object of the Softmax class. The first parameter is the temperature. As the temperature approaches 0, the choice becomes deterministic: the arm with the highest value is always chosen. (A temperature of exactly 0.0 is not usable, because the values are divided by the temperature.) In contrast, as the temperature approaches ∞, all arms have nearly the same probability. In practice, the temperature tends to be between 0.01 and 1.0.
27
+
28
+ The second parameter is the number of arms.
29
+ ```ruby
30
+ sm = MultiArmedBandit::Softmax.new(0.01, 3)
31
+ ```
32
+
33
+ Passing the per-arm lists of trial counts and rewards to the `bulk_update` method returns the predicted probabilities for each arm.
34
+ ```ruby
35
+ # Trial 1
36
+ probs = sm.bulk_update([1000,1000,1000], [72,57,49])
37
+ counts = probs.map{|p| (p*3000).round }
38
+
39
+ # Trial 2
40
+ probs = sm.bulk_update(counts, [154,17,32])
41
+ ```
42
+
43
+ ## Development
44
+
45
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
46
+
47
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
48
+
49
+ ## Contributing
50
+
51
+ Bug reports and pull requests are welcome on GitHub at https://github.com/vasilyjp/multi_armed_bandit. This project is intended to be a safe, welcoming space for collaboration.
52
+
53
+
54
+ ## License
55
+ The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
56
+
57
+ ## Reference
58
+ ```
59
+ [1] John Myles White: Bandit Algorithms for Website Optimization. O'Reilly Media
60
+ [2] J. Komiyama, J. Honda, and H. Nakagawa: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays. ICML 2015
61
+ ```
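The README's description of the temperature parameter can be checked numerically. The following is a minimal sketch, not code from the gem: `softmax_probs` is a hypothetical helper that applies the same formula as `Softmax#bulk_update`, and the input values are the success rates from "Trial 1" in the README.

```ruby
# Hypothetical helper (not part of the gem): softmax probabilities for a list
# of estimated arm values, using the same formula as Softmax#bulk_update:
#   p_i = exp(v_i / t) / sum_j exp(v_j / t)
def softmax_probs(values, temperature)
  weights = values.map { |v| Math.exp(v / temperature) }
  total = weights.reduce(:+)
  weights.map { |w| w / total }
end

# Success rates from "Trial 1" in the README: rewards / trials.
values = [72, 57, 49].map { |r| r / 1000.0 }

# A low temperature concentrates probability on the best arm;
# a high temperature spreads it almost evenly.
[0.01, 0.1, 1.0, 100.0].each do |t|
  probs = softmax_probs(values, t)
  puts format('temperature=%-6s probs=%s', t, probs.map { |p| p.round(3) }.inspect)
end
```

With these inputs, the first arm receives most of the probability at temperature 0.01, while at temperature 100.0 the three probabilities are all close to 1/3.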
data/Rakefile ADDED
@@ -0,0 +1,6 @@
1
+ require "bundler/gem_tasks"
2
+ require "rspec/core/rake_task"
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task :default => :spec
data/bin/console ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "multi_armed_bandit"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start
data/bin/setup ADDED
@@ -0,0 +1,7 @@
1
+ #!/bin/bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+
5
+ bundle install
6
+
7
+ # Do any other automated setup that you need to do here
data/lib/multi_armed_bandit/epsilon_greedy.rb ADDED
@@ -0,0 +1,89 @@
1
+
2
+ module MultiArmedBandit
3
+
4
+ class EpsilonGreedy
5
+
6
+ attr_accessor :epsilon, :counts, :values, :probs, :n_arms
7
+
8
+ # Initialize an object
9
+ def initialize(epsilon, n_arms)
10
+ @epsilon = epsilon
11
+ @n_arms = n_arms
12
+ reset()
13
+ end
14
+
15
+ # Reset instance variables
16
+ def reset()
17
+ @counts = Array.new(@n_arms, 0)
18
+ @values = Array.new(@n_arms, 0.0)
19
+ @probs = Array.new(@n_arms, 0.0)
20
+ end
21
+
22
+ # Bulk update. new_counts is a list of the number of trials for each arm and
23
+ # new_rewards is a list of the rewards for each arm.
24
+ def bulk_update(new_counts, new_rewards)
25
+
26
+ # update the numbers of each arm's trial
27
+ @counts = new_counts
28
+
29
+ # update expectations of each arm
30
+ new_values = []
31
+ @counts.zip( new_rewards ).each do |n, r|
32
+ new_values << r / n.to_f
33
+ end
34
+ @values = new_values
35
+
36
+ # calculate probabilities
37
+ j = ind_max(@values)
38
+ for i in 0..@n_arms-1 do
39
+ if i == j
40
+ @probs[i] = 1-@epsilon
41
+ else
42
+ @probs[i] = (@epsilon)/(@n_arms-1)
43
+ end
44
+ end
45
+
46
+ return @probs
47
+ end
48
+
49
+ def update(chosen_arm, reward)
50
+ @counts[chosen_arm] = @counts[chosen_arm] + 1
51
+ n = @counts[chosen_arm]
52
+
53
+ value = @values[chosen_arm]
54
+ new_value = ((n - 1) / n.to_f) * value + (1 / n.to_f) * reward
55
+ @values[chosen_arm] = new_value
56
+ return
57
+ end
58
+
59
+
60
+ def select_arm
61
+ if rand > @epsilon
62
+ return ind_max(@values)
63
+ else
64
+ return rand(@values.size)
65
+ end
66
+ end
67
+
68
+ private
69
+ def ind_max(x)
70
+ m = x.max
71
+ return x.index(m)
72
+ end
73
+
74
+ def categorical_draw(probs)
75
+ z = rand()
76
+ cum_prob = 0.0
77
+
78
+ probs.size().times do |i|
79
+ prob = probs[i]
80
+ cum_prob += prob
81
+ if cum_prob > z
82
+ return i
83
+ end
84
+ end
85
+
86
+ return probs.size() - 1
87
+ end
88
+ end
89
+ end
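The two code paths in `EpsilonGreedy` imply slightly different distributions: `bulk_update` gives the best arm exactly `1 - epsilon`, whereas `select_arm` explores uniformly over all arms, including the current best one. The sketch below illustrates this under the assumption that ties do not matter; `bulk_update_probs` and `select_arm_probs` are hypothetical helper names, not methods of the gem.

```ruby
# Hypothetical helpers (not part of the gem) that spell out the selection
# probabilities implied by the two code paths in EpsilonGreedy.

# As reported by bulk_update: the best arm gets 1 - epsilon and the
# remaining arms share epsilon equally.
def bulk_update_probs(values, epsilon)
  best = values.index(values.max)
  n = values.size
  values.each_index.map do |i|
    i == best ? 1.0 - epsilon : epsilon / (n - 1).to_f
  end
end

# As implied by select_arm: with probability epsilon an arm is drawn
# uniformly from all arms, so the best arm also receives epsilon / n.
def select_arm_probs(values, epsilon)
  best = values.index(values.max)
  explore = epsilon / values.size.to_f
  values.each_index.map do |i|
    i == best ? (1.0 - epsilon) + explore : explore
  end
end

values = [0.072, 0.057, 0.049]
reported = bulk_update_probs(values, 0.1)
implied  = select_arm_probs(values, 0.1)
puts "bulk_update: #{reported.map { |p| p.round(4) }.inspect}"
puts "select_arm:  #{implied.map { |p| p.round(4) }.inspect}"
```

Both lists sum to 1. The difference only matters if the probabilities returned by `bulk_update` are expected to match the behaviour of `select_arm`.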
data/lib/multi_armed_bandit/mp_ts.rb ADDED
@@ -0,0 +1,46 @@
1
+ require 'simple-random'
2
+
3
+ module MultiArmedBandit
4
+
5
+ class MultiplePlayTS
6
+
7
+ attr_accessor :k, :l, :alpha, :beta, :arm_ids
8
+
9
+ # k: num of arms
10
+ # l: num of selected arms
11
+ def initialize(k, l, setseed=true)
12
+ @k = k
13
+ @l = l
14
+ @r = SimpleRandom.new
15
+ # By default the same random seed is used, so we change it
16
+ @r.set_seed if setseed==true
17
+ reset
18
+ end
19
+
20
+ def reset
21
+ @alpha = Array.new(@k, 1)
22
+ @beta = Array.new(@k, 1)
23
+ @arm_ids = Array.new(@k, '')
24
+ end
25
+
26
+ # Get selected arm ids
27
+ def get_selected_arms
28
+ selected_arms = @alpha.zip(@beta).zip(@arm_ids)
29
+ .map{|c,i| [i, @r.beta(c[0],c[1])]}
30
+ .sort_by{|v| -v[1]}
31
+ .map{|v| v[0]}[0..@l-1]
32
+ end
33
+
34
+ # selected_arms: List of selected drawn arms
35
+ def update_params_draw(selected_arms)
36
+ selected_arms.map{|i| @beta[i]+=1}
37
+ end
38
+
39
+ # idx: Index number of rewarded arm
40
+ def update_params_reward(idx)
41
+ @alpha[idx]+=1
42
+ @beta[idx]-=1
43
+ end
44
+
45
+ end
46
+ end
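The selection step of `MultiplePlayTS` can be sketched without the `simple-random` dependency. This is an illustration only, not the gem's implementation. It swaps in a different Beta sampler: for positive integers a and b, the a-th smallest of a + b - 1 independent Uniform(0, 1) draws follows a Beta(a, b) distribution. The helper names `beta_sample` and `select_arms` are assumptions, and arms are identified by array index rather than by `arm_ids`.

```ruby
# Hypothetical sketch (not the gem's code): Thompson sampling with multiple
# plays, using only the Ruby standard library.

# Draw from Beta(a, b) for positive integers a and b: the a-th smallest of
# (a + b - 1) independent Uniform(0, 1) draws is Beta(a, b) distributed.
def beta_sample(a, b, rng)
  Array.new(a + b - 1) { rng.rand }.sort[a - 1]
end

# Sample one value per arm and return the indices of the l largest samples.
def select_arms(alpha, beta, l, rng = Random.new)
  samples = alpha.zip(beta).map { |a, b| beta_sample(a, b, rng) }
  samples.each_index.sort_by { |i| -samples[i] }.first(l)
end

# Four arms with uniform Beta(1, 1) priors, two arms shown per round.
alpha = [1, 1, 1, 1]
beta  = [1, 1, 1, 1]
rng = Random.new(1)

picked = select_arms(alpha, beta, 2, rng)

# Mirror update_params_draw: every shown arm first counts as a failure.
picked.each { |i| beta[i] += 1 }

# Mirror update_params_reward: pretend the first shown arm was rewarded,
# which turns its provisional failure into a success.
rewarded = picked.first
alpha[rewarded] += 1
beta[rewarded] -= 1

puts "picked=#{picked.inspect} alpha=#{alpha.inspect} beta=#{beta.inspect}"
```

The order-statistic sampler costs O(n log n) in the total count for an arm, so it is only suitable for small illustrations.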
data/lib/multi_armed_bandit/softmax.rb ADDED
@@ -0,0 +1,80 @@
1
+
2
+
3
+ module MultiArmedBandit
4
+
5
+ class Softmax
6
+
7
+ attr_accessor :temperature, :counts, :values, :probs, :n_arms
8
+
9
+ # Initialize an object
10
+ def initialize(temperature, n_arms)
11
+ @n_arms = n_arms
12
+ @temperature = temperature
13
+ reset()
14
+ end
15
+
16
+ # Reset instance variables
17
+ def reset()
18
+ @counts = Array.new(@n_arms, 0)
19
+ @values = Array.new(@n_arms, 0.0)
20
+ @probs = Array.new(@n_arms, 0.0)
21
+ end
22
+
23
+ # Bulk update. new_counts is a list of the number of trials for each arm and
24
+ # new_rewards is a list of the rewards for each arm.
25
+ # Each number in new_counts and new_rewards should be a cumulative total.
26
+ def bulk_update(new_counts, new_rewards)
27
+
28
+ # update the numbers of each arm's trial
29
+ @counts = new_counts
30
+
31
+ # update expectations of each arm
32
+ new_values = []
33
+ @counts.zip( new_rewards ).each do |n, reward|
34
+ new_values << reward / n.to_f
35
+ end
36
+ @values = new_values
37
+
38
+ # calculate probabilities
39
+ z = @values.collect{|i| Math.exp(i/@temperature)}.reduce(:+)
40
+ @probs = @values.collect{|i| Math.exp(i/@temperature)/z}
41
+
42
+ return probs
43
+ end
44
+
45
+
46
+ def update(chosen_arm, reward)
47
+ @counts[chosen_arm] = @counts[chosen_arm] + 1
48
+ n = @counts[chosen_arm]
49
+
50
+ value = @values[chosen_arm]
51
+ new_value = ((n - 1) / n.to_f) * value + (1 / n.to_f) * reward
52
+ @values[chosen_arm] = new_value
53
+ return
54
+ end
55
+
56
+
57
+ def select_arm
58
+ z = @values.collect{|i| Math.exp(i/@temperature)}.reduce(:+)
59
+ @probs = @values.collect{|i| Math.exp(i/@temperature)/z}
60
+ return categorical_draw(@probs)
61
+ end
62
+
63
+ private
64
+ def categorical_draw(probs)
65
+ z = rand()
66
+ cum_prob = 0.0
67
+
68
+ probs.size().times do |i|
69
+ prob = probs[i]
70
+ cum_prob += prob
71
+ if cum_prob > z
72
+ return i
73
+ end
74
+ end
75
+
76
+ return probs.size() - 1
77
+ end
78
+
79
+ end
80
+ end
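The `update` method in `Softmax` (and in `EpsilonGreedy`) keeps a running mean of the rewards for each arm. The sketch below applies the same recurrence to a single arm and compares the result with the batch average; `running_mean` is a hypothetical helper, not a method of the gem.

```ruby
# Hypothetical helper (not part of the gem): apply the recurrence from
# Softmax#update to one arm,
#   new_value = ((n - 1) / n) * value + (1 / n) * reward
# where n is the number of rewards seen so far, including the current one.
def running_mean(rewards)
  value = 0.0
  rewards.each_with_index do |reward, i|
    n = i + 1
    value = ((n - 1) / n.to_f) * value + (1 / n.to_f) * reward
  end
  value
end

rewards = [1, 0, 0, 1, 1]
incremental = running_mean(rewards)
batch = rewards.reduce(:+) / rewards.size.to_f

puts "incremental=#{incremental.round(6)} batch=#{batch.round(6)}"
```

Up to floating-point rounding, the incremental value equals rewards / trials, which is the quantity `bulk_update` computes directly from cumulative totals.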
data/lib/multi_armed_bandit/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module MultiArmedBandit
2
+ VERSION = "0.2.0"
3
+ end
data/lib/multi_armed_bandit.rb ADDED
@@ -0,0 +1,9 @@
1
+ require "multi_armed_bandit/version"
2
+ require "multi_armed_bandit/softmax"
3
+ require "multi_armed_bandit/epsilon_greedy"
4
+ require "multi_armed_bandit/mp_ts"
5
+
6
+ module MultiArmedBandit
7
+ # Your code goes here...
8
+
9
+ end
data/multi_armed_bandit.gemspec ADDED
@@ -0,0 +1,34 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require 'multi_armed_bandit/version'
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "multi_armed_bandit"
8
+ spec.version = MultiArmedBandit::VERSION
9
+ spec.authors = ["kndt84"]
10
+ spec.email = ["takashi.kaneda@vasily.jp"]
11
+
12
+ spec.summary = %q{multi-armed bandit algorithms}
13
+ # spec.description = %q{TODO: Write a longer description or delete this line.}
14
+ spec.homepage = "https://github.com/vasilyjp/multi_armed_bandit"
15
+ spec.license = "MIT"
16
+
17
+ # Prevent pushing this gem to RubyGems.org by setting 'allowed_push_host', or
18
+ # delete this section to allow pushing this gem to any host.
19
+ if spec.respond_to?(:metadata)
20
+ spec.metadata['allowed_push_host'] = "TODO: Set to 'http://mygemserver.com'"
21
+ else
22
+ raise "RubyGems 2.0 or newer is required to protect against public gem pushes."
23
+ end
24
+
25
+ spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
26
+ spec.bindir = "exe"
27
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
28
+ spec.require_paths = ["lib"]
29
+
30
+ spec.add_development_dependency "bundler", "~> 1.10"
31
+ spec.add_development_dependency "rake", "~> 10.0"
32
+ spec.add_development_dependency "rspec"
33
+ spec.add_dependency "simple-random"
34
+ end
metadata ADDED
@@ -0,0 +1,115 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: multi_armed_bandit
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.2.0
5
+ platform: ruby
6
+ authors:
7
+ - kndt84
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2016-04-14 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: bundler
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ~>
18
+ - !ruby/object:Gem::Version
19
+ version: '1.10'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ~>
25
+ - !ruby/object:Gem::Version
26
+ version: '1.10'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rake
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ~>
32
+ - !ruby/object:Gem::Version
33
+ version: '10.0'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ~>
39
+ - !ruby/object:Gem::Version
40
+ version: '10.0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: rspec
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - '>='
46
+ - !ruby/object:Gem::Version
47
+ version: '0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - '>='
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
55
+ - !ruby/object:Gem::Dependency
56
+ name: simple-random
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - '>='
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ type: :runtime
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - '>='
67
+ - !ruby/object:Gem::Version
68
+ version: '0'
69
+ description:
70
+ email:
71
+ - takashi.kaneda@vasily.jp
72
+ executables: []
73
+ extensions: []
74
+ extra_rdoc_files: []
75
+ files:
76
+ - .gitignore
77
+ - .rspec
78
+ - .travis.yml
79
+ - Gemfile
80
+ - README.md
81
+ - Rakefile
82
+ - bin/console
83
+ - bin/setup
84
+ - lib/multi_armed_bandit.rb
85
+ - lib/multi_armed_bandit/epsilon_greedy.rb
86
+ - lib/multi_armed_bandit/mp_ts.rb
87
+ - lib/multi_armed_bandit/softmax.rb
88
+ - lib/multi_armed_bandit/version.rb
89
+ - multi_armed_bandit.gemspec
90
+ homepage: https://github.com/vasilyjp/multi_armed_bandit
91
+ licenses:
92
+ - MIT
93
+ metadata:
94
+ allowed_push_host: 'TODO: Set to ''http://mygemserver.com'''
95
+ post_install_message:
96
+ rdoc_options: []
97
+ require_paths:
98
+ - lib
99
+ required_ruby_version: !ruby/object:Gem::Requirement
100
+ requirements:
101
+ - - '>='
102
+ - !ruby/object:Gem::Version
103
+ version: '0'
104
+ required_rubygems_version: !ruby/object:Gem::Requirement
105
+ requirements:
106
+ - - '>='
107
+ - !ruby/object:Gem::Version
108
+ version: '0'
109
+ requirements: []
110
+ rubyforge_project:
111
+ rubygems_version: 2.0.14
112
+ signing_key:
113
+ specification_version: 4
114
+ summary: multi-armed bandit algorithms
115
+ test_files: []