RedditPostClassifierBot 0.1.1
- checksums.yaml +7 -0
- data/.gitignore +10 -0
- data/.travis.yml +3 -0
- data/Gemfile +4 -0
- data/README.md +86 -0
- data/RPCB-nbayes.yml +4568 -0
- data/Rakefile +1 -0
- data/RedditPostClassifierBot.gemspec +26 -0
- data/bin/console +14 -0
- data/bin/setup +7 -0
- data/lib/RedditPostClassifierBot/nbayes_classifier.rb +34 -0
- data/lib/RedditPostClassifierBot/reddit_trainer.rb +121 -0
- data/lib/RedditPostClassifierBot/version.rb +3 -0
- data/lib/RedditPostClassifierBot.rb +20 -0
- metadata +101 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
---
SHA1:
  metadata.gz: 647bcb069e655189eccf082375f55290a6540f02
  data.tar.gz: 5c635dcbe95002f31d47a76ecc3ef9b86feac06a
SHA512:
  metadata.gz: 39d3cc771c573738bab103e72260588fe52682ca4830eb77e001c6f224b20b6efd4c4c507ca698a42ce0e8775c4ee34e169f20d86c46762a48db0df5493d8ab9
  data.tar.gz: c2ef74b74934a8626db598fe966e25313143b445c4833c9f63712ba262fddb6ddefa552b555fa33180da770ffd0b520bb5bab3659a0033bdbafcb120aa62b2b9
data/.gitignore
ADDED
data/.travis.yml
ADDED
data/Gemfile
ADDED
data/README.md
ADDED
@@ -0,0 +1,86 @@
# RedditPostClassifierBot

This gem wraps the Ruby [nbayes](https://github.com/oasic/nbayes) gem to run Naive Bayes text classification on Reddit posts. It fetches posts from the front, controversial, top, and new pages and classifies them according to which page they were found on. Once trained, it can be used to try to predict whether a new post is front-page material, or at least looks like it. It is not, however, responsible for your karma whoring.

## Installation

Add this line to your application's Gemfile:

```ruby
gem 'RedditPostClassifierBot'
```

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install RedditPostClassifierBot

## Usage

Currently this gem is not so much a bot as a library to help build one. You can play around with it in the terminal:

```shell
irb -r RedditPostClassifierBot
```

The gem comes with a small training data set, so you don't have to train it yourself unless you want it to get smarter.

### Classify a single post

```ruby
c = RedditPostClassifierBot.classifier
c.classify "subreddit", "title", "self text or url"
# => "classification"
```

The classification will be one of the pages it thinks this post will be on, e.g. hot, controversial, top hour, etc. See `RedditTrainer.trained_on` for a full list.
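
For intuition about what a Naive Bayes classifier does with a post, here is a minimal, self-contained sketch using only the Ruby standard library. It is not the gem's implementation (the gem delegates to nbayes); `TinyNaiveBayes`, `tokenize`, and the training posts are all hypothetical names and data made up for illustration.

```ruby
# Illustrative only: a tiny multinomial Naive Bayes with add-one smoothing.
# This is NOT the gem's code; the gem wraps the nbayes gem instead.
class TinyNaiveBayes
  def initialize
    @token_counts = Hash.new { |h, k| h[k] = Hash.new(0) } # class => token => count
    @doc_counts = Hash.new(0)                                # class => number of training posts
    @vocabulary = {}                                         # set of all tokens seen in training
  end

  def train(tokens, klass)
    @doc_counts[klass] += 1
    tokens.each do |token|
      @token_counts[klass][token] += 1
      @vocabulary[token] = true
    end
  end

  # Returns the class with the highest log-probability for the given tokens.
  def classify(tokens)
    total_docs = @doc_counts.values.sum.to_f
    scores = @doc_counts.keys.map do |klass|
      class_total = @token_counts[klass].values.sum
      log_prior = Math.log(@doc_counts[klass] / total_docs)
      log_likelihood = tokens.sum do |token|
        Math.log((@token_counts[klass][token] + 1.0) / (class_total + @vocabulary.size))
      end
      [klass, log_prior + log_likelihood]
    end
    scores.max_by { |_, score| score }.first
  end
end

# Hypothetical tokenizer: lowercase words from subreddit, title, and body.
def tokenize(subreddit, title, body)
  "#{subreddit} #{title} #{body}".downcase.scan(/[a-z0-9']+/)
end

nb = TinyNaiveBayes.new
nb.train(tokenize("aww", "My cat learned to open doors", "imgur link"), :hot)
nb.train(tokenize("aww", "Cute cat sleeping in a box", "imgur link"), :hot)
nb.train(tokenize("politics", "Senator proposes unpopular tax bill", "news link"), :controversial)
nb.train(tokenize("politics", "Debate over tax reform heats up", "news link"), :controversial)

puts nb.classify(tokenize("aww", "Cat in a box", "imgur link"))
# prints "hot"
```

Each class here plays the role of a Reddit page, and the predicted class is simply the one whose training posts share the most vocabulary with the new post, weighted by how common each class was during training.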

### Classify a whole page or subreddit

```ruby
classifications = RedditPostClassifierBot.classify_posts "/r/all"
# => { front: ["permalinks to front page classed posts", ...], ... }
```

`classifications` will be a hash where the keys are the predicted classes, e.g. `:hot`, and the values are arrays of permalinks to the posts it classified under that class.
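
As a rough sketch of how a hash of that shape could be consumed, the snippet below counts posts per predicted class and builds a reverse lookup from permalink to class. The sample hash stands in for a real return value: its keys and permalinks are made up for illustration, so it runs without the gem or network access.

```ruby
# Hypothetical sample shaped like the return value described above;
# the class names and permalinks are invented for illustration.
classifications = {
  front: ["/r/pics/comments/abc123/example_a/", "/r/aww/comments/def456/example_b/"],
  controversial: ["/r/news/comments/ghi789/example_c/"],
  new: []
}

# Count how many posts landed in each predicted class.
counts = classifications.transform_values(&:size)

# Invert the hash to look up the predicted class of a given permalink.
class_by_permalink = classifications.flat_map do |klass, links|
  links.map { |link| [link, klass] }
end.to_h

counts.each { |klass, n| puts "#{klass}: #{n}" }
puts class_by_permalink["/r/news/comments/ghi789/example_c/"]
```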

### Training

```ruby
c = RedditPostClassifierBot.train_classifier
```

After training, the bot will dump its training data to a YAML file, `./RPCB-nbayes.yml`. You can customize the file path by setting `ENV["NBAYES_FILE_PATH"]`.
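
A minimal sketch of overriding the dump location, assuming the gem reads `ENV["NBAYES_FILE_PATH"]` at the time it trains or loads the classifier (so the variable must be set first). The `nbayes_file_path` helper and the file name are hypothetical; the helper only mirrors the documented fallback to `./RPCB-nbayes.yml` and is not taken from the gem's source.

```ruby
require "tmpdir"

# Assumption: the variable is read when the classifier is trained or loaded,
# so it has to be set before calling into the gem.
ENV["NBAYES_FILE_PATH"] = File.join(Dir.tmpdir, "my-reddit-model.yml")

# Hypothetical helper mirroring the documented behaviour: use the
# environment variable when present, otherwise the default path.
def nbayes_file_path
  ENV.fetch("NBAYES_FILE_PATH", "./RPCB-nbayes.yml")
end

puts nbayes_file_path

# With the gem loaded, training would then be expected to write to that path:
#   RedditPostClassifierBot.train_classifier
```

Setting the variable in the shell instead (for example `NBAYES_FILE_PATH=/tmp/model.yml irb -r RedditPostClassifierBot`) should have the same effect under the same assumption.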

Further training customization can be done by instantiating `RedditTrainer` directly.

```ruby
trainer = RedditPostClassifierBot::RedditTrainer.new trials, per_page, debug
```

Arguments:
- `trials` is an integer specifying how many pages to paginate through. Default: 10
- `per_page` is how many posts to fetch from each page. Default: 200
- `debug` toggles `puts`ing out which page it's currently classifying. Default: true

You can also customize which classes (i.e. pages and/or subreddits) the trainer uses for classification by modifying the `RedditPostClassifierBot::RedditTrainer::CLASSES` hash, where the keys are the classifications and the values are the relative paths of the pages to fetch from.
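
The kind of change described above might look like the following sketch. To keep it runnable without the gem, a local `classes` hash stands in for `RedditPostClassifierBot::RedditTrainer::CLASSES`; its keys and paths are assumptions, not the gem's actual defaults.

```ruby
# Stand-in for RedditPostClassifierBot::RedditTrainer::CLASSES so the sketch
# runs without the gem; these keys and paths are assumptions, not the gem's defaults.
classes = {
  front: "/",
  controversial: "/controversial",
  new: "/new"
}

# Add a subreddit as its own class and drop one you don't care about.
classes[:aww] = "/r/aww"
classes.delete(:new)

# Sanity-check that every value is a relative path before training.
invalid = classes.reject { |_, path| path.start_with?("/") }
raise ArgumentError, "non-relative paths: #{invalid.keys}" unless invalid.empty?

classes.each { |klass, path| puts "#{klass} -> #{path}" }
```

With the real gem, the same `[]=` and `delete` calls would be made on the `CLASSES` constant before constructing the trainer. This only works if the hash is not frozen, which is not confirmed here.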

## Todo

- Figure out a way to fetch low-scoring posts to classify as such.

## Development

After checking out the repo, run `bin/setup` to install dependencies. Then, run `bin/console` for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release` to create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).

## Contributing

1. Fork it ( https://github.com/[my-github-username]/RedditPostClassifierBot/fork )
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create a new Pull Request