RedditPostClassifierBot 0.1.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: 647bcb069e655189eccf082375f55290a6540f02
+   data.tar.gz: 5c635dcbe95002f31d47a76ecc3ef9b86feac06a
+ SHA512:
+   metadata.gz: 39d3cc771c573738bab103e72260588fe52682ca4830eb77e001c6f224b20b6efd4c4c507ca698a42ce0e8775c4ee34e169f20d86c46762a48db0df5493d8ab9
+   data.tar.gz: c2ef74b74934a8626db598fe966e25313143b445c4833c9f63712ba262fddb6ddefa552b555fa33180da770ffd0b520bb5bab3659a0033bdbafcb120aa62b2b9
data/.gitignore ADDED
@@ -0,0 +1,10 @@
+ /.bundle/
+ /.yardoc
+ /Gemfile.lock
+ /_yardoc/
+ /coverage/
+ /doc/
+ /pkg/
+ /spec/reports/
+ /tmp/
+ *.gem
data/.travis.yml ADDED
@@ -0,0 +1,3 @@
+ language: ruby
+ rvm:
+ - 2.2.0
data/Gemfile ADDED
@@ -0,0 +1,4 @@
+ source 'https://rubygems.org'
+
+ # Specify your gem's dependencies in RedditPostClassifierBot.gemspec
+ gemspec
data/README.md ADDED
@@ -0,0 +1,86 @@
+ # RedditPostClassifierBot
+
+ This gem wraps the Ruby [nbayes](https://github.com/oasic/nbayes) gem to run Naive Bayes text classification on Reddit posts. It fetches posts from the front, controversial, top, and new pages and labels each one with the page it was found on. Once trained, it can be used to predict whether a new post is front-page material, or at least looks like it. It is not, however, responsible for your karma whoring.
+
+ ## Installation
+
+ Add this line to your application's Gemfile:
+
+ ```ruby
+ gem 'RedditPostClassifierBot'
+ ```
+
+ And then execute:
+
+     $ bundle
+
+ Or install it yourself as:
+
+     $ gem install RedditPostClassifierBot
+
+ ## Usage
+
+ Currently this gem is less a bot than a library to help build one. You can play around with it in the terminal:
+
+ ```shell
+ irb -r RedditPostClassifierBot
+ ```
+
+ The gem comes with a small training data set, so you don't have to train it yourself unless you want it to get smarter.
+
+ ### Classify a single post
+
+ ```ruby
+ c = RedditPostClassifierBot.classifier
+ c.classify "subreddit", "title", "self text or url"
+ # => "classification"
+ ```
+
+ The classification is the page the classifier thinks the post will appear on, e.g. hot, controversial, top hour etc. See `RedditTrainer.trained_on` for a full list.
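
For intuition about what a call like `classify` does underneath, here is a minimal sketch of multinomial Naive Bayes with add-one smoothing in plain Ruby. It is not the gem's or nbayes's implementation: `TinyNaiveBayes`, the `tokenize` lambda, and the toy training titles are all hypothetical.

```ruby
# Minimal multinomial Naive Bayes with add-one (Laplace) smoothing.
# Illustrative only: TinyNaiveBayes and the toy data are hypothetical,
# not part of RedditPostClassifierBot or nbayes.
class TinyNaiveBayes
  def initialize
    @token_counts = Hash.new { |h, k| h[k] = Hash.new(0) } # class => token => count
    @doc_counts   = Hash.new(0)                            # class => number of documents
    @vocabulary   = {}                                     # every token seen in training
  end

  def train(tokens, klass)
    @doc_counts[klass] += 1
    tokens.each do |token|
      @token_counts[klass][token] += 1
      @vocabulary[token] = true
    end
  end

  # Returns the class with the highest log posterior for the given tokens.
  def classify(tokens)
    total_docs = @doc_counts.values.sum.to_f
    @doc_counts.keys.max_by do |klass|
      class_total = @token_counts[klass].values.sum
      log_prior = Math.log(@doc_counts[klass] / total_docs)
      log_likelihood = tokens.sum do |token|
        Math.log((@token_counts[klass][token] + 1.0) / (class_total + @vocabulary.size))
      end
      log_prior + log_likelihood
    end
  end
end

# Lowercase the text and split it into word-like tokens.
tokenize = ->(text) { text.downcase.scan(/[a-z0-9']+/) }

nb = TinyNaiveBayes.new
nb.train(tokenize.call("TIL cats purr to heal themselves"), :hot)
nb.train(tokenize.call("Cute cats compilation"), :hot)
nb.train(tokenize.call("Unpopular opinion about politics"), :controversial)
nb.train(tokenize.call("Politics megathread: election debate"), :controversial)

prediction = nb.classify(tokenize.call("cats heal themselves by purring"))
puts prediction
```

Working in log space avoids underflow when many small probabilities are multiplied, and the add-one term keeps a single unseen word from zeroing out a class.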
+
+ ### Classify a whole page or subreddit
+
+ ```ruby
+ classifications = RedditPostClassifierBot.classify_posts "/r/all"
+ # => { front: ["permalinks to front page classed posts", ...], ... }
+ ```
+ `classifications` is a hash whose keys are the predicted classes, e.g. `:hot`, and whose values are arrays of permalinks to the posts classified under that class.
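
Because the result is a plain hash, ordinary `Hash` methods are enough to summarize it. The sketch below uses a hand-written sample in the documented shape instead of calling Reddit; the permalinks are made up, and it assumes they are returned as strings.

```ruby
# Hypothetical sample in the shape described above:
# predicted class => array of permalinks (the permalinks are made up).
classifications = {
  front:         ["/r/example/comments/aaa111/first_post/",
                  "/r/example/comments/bbb222/second_post/"],
  controversial: ["/r/example/comments/ccc333/third_post/"],
  new:           []
}

# How many posts landed in each predicted class.
counts = classifications.transform_values(&:size)

# Invert the hash to look up the predicted class of a given permalink.
class_for = classifications.each_with_object({}) do |(klass, permalinks), index|
  permalinks.each { |permalink| index[permalink] = klass }
end

puts counts.inspect
puts class_for["/r/example/comments/ccc333/third_post/"]
```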
+
+ ### Training
+
+ ```ruby
+ c = RedditPostClassifierBot.train_classifier
+ ```
+
+ After training, the bot dumps its training data to the YAML file `./RPCB-nbayes.yml`. You can customize the file path by setting `ENV["NBAYES_FILE_PATH"]`.
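
An environment override with a default path is commonly read with `ENV.fetch`. The sketch below shows that pattern only; it is not the gem's code, and `nbayes_file_path` and `DEFAULT_NBAYES_FILE` are hypothetical names.

```ruby
# Sketch of an env-var override with a default path.
# nbayes_file_path and DEFAULT_NBAYES_FILE are hypothetical, not part of the gem.
DEFAULT_NBAYES_FILE = "./RPCB-nbayes.yml"

# Accepts any object that responds to #fetch (ENV or a plain Hash),
# which makes the lookup easy to exercise without touching the real environment.
def nbayes_file_path(env = ENV)
  env.fetch("NBAYES_FILE_PATH", DEFAULT_NBAYES_FILE)
end

puts nbayes_file_path
puts nbayes_file_path({ "NBAYES_FILE_PATH" => "/tmp/my-classifier.yml" })
```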
+
+ Further training customization can be done by instantiating `RedditTrainer` directly.
+
+ ```ruby
+ trainer = RedditPostClassifierBot::RedditTrainer.new trials, per_page, debug
+ ```
+
+ Arguments:
+ - `trials` is an integer specifying how many pages to paginate through. Default: 10
+ - `per_page` is how many posts to fetch from each page. Default: 200
+ - `debug` toggles printing (via `puts`) which page is currently being classified. Default: true
+
+ You can also customize which classes, i.e. pages and/or subreddits, the trainer uses for classification by modifying the `RedditPostClassifierBot::RedditTrainer::CLASSES` hash, whose keys are the classifications and whose values are the relative paths of the pages to fetch from.
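
To illustrate that key/value shape, the sketch below edits a stand-in hash rather than the real constant. The entries are illustrative guesses, not copied from the gem, so check `RedditTrainer::CLASSES` in the source for the actual keys and paths before modifying it.

```ruby
# Stand-in for RedditTrainer::CLASSES: classification => relative path.
# These entries are illustrative guesses, not copied from the gem.
classes = {
  hot:           "/hot",
  controversial: "/controversial",
  new:           "/new"
}

# Add a subreddit-specific class and drop one we don't care about.
classes[:ruby_top] = "/r/ruby/top"
classes.delete(:new)

classes.each do |classification, path|
  puts "#{classification} <- #{path}"
end
```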
+
+ ## Todo
+
+ - Figure out a way to fetch low-scoring posts so they can be classified as such.
+
+ ## Development
+
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `bin/console` for an interactive prompt that will allow you to experiment.
+
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release` to create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+
+ ## Contributing
+
+ 1. Fork it ( https://github.com/[my-github-username]/RedditPostClassifierBot/fork )
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
+ 4. Push to the branch (`git push origin my-new-feature`)
+ 5. Create a new Pull Request