twitter_ebooks 2.1.3 → 2.1.4
- data/README.md +101 -2
- data/bin/ebooks +24 -23
- data/lib/twitter_ebooks/model.rb +13 -5
- data/lib/twitter_ebooks/version.rb +1 -1
- data/skeleton/Gemfile +4 -0
- data/skeleton/Gemfile.lock +70 -0
- data/skeleton/corpus/.gitignore +0 -0
- data/skeleton/model/.gitignore +0 -0
- data/skeleton/run.rb +0 -0
- metadata +6 -2
data/README.md
CHANGED
@@ -1,9 +1,108 @@
-# twitter\_ebooks 2.1.
+# twitter\_ebooks 2.1.4
 
-
+Rewrite of my twitter\_ebooks code. While the original was solely a tweeting Markov generator, this framework helps you build any kind of interactive twitterbot which responds to mentions/DMs.
 
 ## Installation
 
 ```bash
 gem install twitter_ebooks
 ```
+
+## Setting up a bot
+
+Run `ebooks new <reponame>` to generate a new repository containing a sample bots.rb file, which looks like this:
+
+``` ruby
+# This is an example bot definition with event handlers commented out
+# You can define as many of these as you like; they will run simultaneously
+
+Ebooks::Bot.new("./test") do |bot|
+  # Consumer details come from registering an app at https://dev.twitter.com/
+  # OAuth details can be fetched with https://github.com/marcel/twurl
+  bot.consumer_key = "" # Your app consumer key
+  bot.consumer_secret = "" # Your app consumer secret
+  bot.oauth_token = "" # Token connecting the app to this account
+  bot.oauth_token_secret = "" # Secret connecting the app to this account
+
+  bot.on_message do |dm|
+    # Reply to a DM
+    # bot.reply(dm, "secret secrets")
+  end
+
+  bot.on_follow do |user|
+    # Follow a user back
+    # bot.follow(user[:screen_name])
+  end
+
+  bot.on_mention do |tweet, meta|
+    # Reply to a mention
+    # bot.reply(tweet, meta[:reply_prefix] + "oh hullo")
+  end
+
+  bot.on_timeline do |tweet, meta|
+    # Reply to a tweet in the bot's timeline
+    # bot.reply(tweet, meta[:reply_prefix] + "nice tweet")
+  end
+
+  bot.scheduler.every '24h' do
+    # Tweet something every 24 hours
+    # See https://github.com/jmettraux/rufus-scheduler
+    # bot.tweet("hi")
+  end
+end
+```
+
+Bots defined like this can be spawned by executing `run.rb` in the same directory, and will operate together in a single eventmachine loop. The easiest way to run bots in a semi-permanent fashion is with [Heroku](https://www.heroku.com); just make an app, push the bot repository to it, enable a worker process in the web interface, and it ought to chug along merrily forever.
+
+## Archiving accounts
+
+twitter\_ebooks comes with a syncing tool to download and then incrementally update a local json archive of a user's tweets.
+
+``` zsh
+➜ ebooks-ebooks git:(master) ebooks archive 0xabad1dea corpus/0xabad1dea.json
+Currently 20209 tweets for 0xabad1dea
+Received 67 new tweets
+```
+
+The first time you run this, it'll ask for auth details to connect with. Due to API limitations, the initial download may not capture the entire history of a user with a very high tweet count. However, so long as you run it frequently enough, you can maintain a complete copy indefinitely into the future.
+
+## Text models
+
+In order to use the included text modeling, you'll first need to preprocess your archive into a more efficient form:
+
+``` zsh
+➜ ebooks-ebooks git:(master) ebooks consume corpus/0xabad1dea.json
+Reading json corpus from corpus/0xabad1dea.json
+Removing commented lines and sorting mentions
+Segmenting text into sentences
+Tokenizing 7075 statements and 17947 mentions
+Ranking keywords
+Corpus consumed to model/0xabad1dea.model
+```
+
+Notably, this works with both json tweet archives and plaintext files (based on file extension), so you can make a model out of any kind of text.
+
+Once you have a model, the primary use is to produce statements and related responses to input, using a pseudo-Markov generator:
+
+``` ruby
+> model = Ebooks::Model.load("model/0xabad1dea.model")
+> model.make_statement(140)
+=> "My Terrible Netbook may be the kind of person who buys Starbucks, but this Rackspace vuln is pretty straight up a backdoor"
+> model.make_response("The NSA is coming!", 130)
+=> "Hey - someone who claims to be an NSA conspiracy"
+```
+
+The secondary function is the "interesting keywords" list. For example, I use this to determine whether a bot wants to fav/retweet/reply to something in its timeline:
+
+``` ruby
+top100 = model.keywords.top(100)
+tokens = Ebooks::NLP.tokenize(tweet[:text])
+
+if tokens.find { |t| top100.include?(t) }
+  bot.twitter.favorite(tweet[:id])
+end
+```
+
+## Other notes
+
+If you're using Heroku, which has no persistent filesystem, automating the archive/consume/update cycle can be tricky. My current solution is just a daily cron job which commits and pushes for me, which is admittedly pretty hacky.
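The keyword check at the end of the README is, at its core, a token/keyword intersection test. A self-contained sketch with stubbed data (the token and keyword lists below are made up for illustration; in real use they come from `Ebooks::NLP.tokenize` and `model.keywords.top(100)`):

```ruby
# Stubbed version of the fav-on-keyword check: does the tweet share any
# token with the model's "interesting keywords" list?
def interesting?(tokens, top_keywords)
  tokens.any? { |t| top_keywords.include?(t) }
end

top100 = ["nsa", "netbook", "backdoor"]

puts interesting?(["the", "nsa", "is", "coming"], top100) # true
puts interesting?(["nice", "weather", "today"], top100)   # false
```

A real bot would gate this further (e.g. rate limits, ignoring its own tweets), but the overlap test is the heart of it.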
data/bin/ebooks
CHANGED
@@ -38,10 +38,9 @@ module Ebooks
     shortname = filename.split('.')[0..-2].join('.')
     hash = Digest::MD5.hexdigest(File.read(path))
 
-    log "Consuming text corpus: #{filename}"
     outpath = File.join(APP_PATH, 'model', "#{shortname}.model")
     Model.consume(path).save(outpath)
-    log "Corpus consumed"
+    log "Corpus consumed to #{outpath}"
   end
 end
 
@@ -73,28 +72,30 @@ module Ebooks
     bot.tweet(statement)
   end
 
-  def self.jsonify(
-
-
-
-
-
-
-
-
-
-
-
-
-
+  def self.jsonify(paths)
+    paths.each do |path|
+      name = File.basename(path).split('.')[0]
+      new_path = name + ".json"
+
+      tweets = []
+      id = nil
+      File.read(path).split("\n").each do |l|
+        if l.start_with?('# ')
+          id = l.split('# ')[-1]
+        else
+          tweet = { text: l }
+          if id
+            tweet[:id] = id
+            id = nil
+          end
+          tweets << tweet
         end
-        tweets << tweet
       end
-      end
 
-
-
-
+      File.open(new_path, 'w') do |f|
+        log "Writing #{tweets.length} tweets to #{new_path}"
+        f.write(JSON.pretty_generate(tweets))
+      end
     end
   end
 
@@ -106,7 +107,7 @@ module Ebooks
   ebooks score <model_path> <input>
   ebooks archive <@user> <outpath>
   ebooks tweet <model_path> <@bot>
-  ebooks jsonify <old_corpus_path> [
+  ebooks jsonify <old_corpus_path> [...]
 """
 
 if args.length == 0
@@ -121,7 +122,7 @@ module Ebooks
     when "score" then score(args[1], args[2..-1].join(' '))
     when "archive" then archive(args[1], args[2])
     when "tweet" then tweet(args[1], args[2])
-    when "jsonify" then jsonify(args[1])
+    when "jsonify" then jsonify(args[1..-1])
     end
   end
 end
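The new `jsonify` subcommand converts an old plaintext corpus, where a line of the form `# <id>` precedes a tweet's text, into the json array format the rest of the tooling now expects. The parsing logic can be exercised standalone; here is a sketch using a hypothetical helper name and made-up input:

```ruby
require 'json'

# Mirror of the jsonify parsing: "# <id>" comment lines attach an id to
# the tweet text on the following line; any other line becomes a plain
# { text: ... } entry.
def jsonify_corpus(text)
  tweets = []
  id = nil
  text.split("\n").each do |l|
    if l.start_with?('# ')
      id = l.split('# ')[-1]
    else
      tweet = { text: l }
      if id
        tweet[:id] = id
        id = nil
      end
      tweets << tweet
    end
  end
  tweets
end

corpus = "# 12345\nhello world\nanother tweet with no id"
puts JSON.generate(jsonify_corpus(corpus))
# [{"text":"hello world","id":"12345"},{"text":"another tweet with no id"}]
```

The real subcommand additionally writes the result to `<name>.json` with `JSON.pretty_generate`, as the diff above shows.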
data/lib/twitter_ebooks/model.rb
CHANGED
@@ -17,14 +17,22 @@ module Ebooks
       Marshal.load(File.read(path))
     end
 
-    def consume(txtpath)
-
-      @hash = Digest::MD5.hexdigest(
+    def consume(path)
+      content = File.read(path)
+      @hash = Digest::MD5.hexdigest(content)
+
+      if path.split('.')[-1] == "json"
+        log "Reading json corpus from #{path}"
+        lines = JSON.parse(content, symbolize_names: true).map do |tweet|
+          tweet[:text]
+        end
+      else
+        log "Reading plaintext corpus from #{path}"
+        lines = content.split("\n")
+      end
 
-      text = File.read(txtpath)
       log "Removing commented lines and sorting mentions"
 
-      lines = text.split("\n")
       keeping = []
       mentions = []
       lines.each do |l|
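The `Model#consume` change above makes the corpus reader format-aware: a `.json` archive is parsed and its tweet texts extracted, while anything else is treated as newline-delimited plaintext. A minimal standalone sketch of that dispatch (the helper name is hypothetical; in the gem this logic lives inline in `consume`):

```ruby
require 'json'

# Extension-based corpus loading, as in the consume change: return the
# raw lines of text from either a json tweet archive or a plaintext file.
def corpus_lines(path, content)
  if path.split('.')[-1] == "json"
    # A json archive is an array of tweet objects; keep only the text.
    JSON.parse(content, symbolize_names: true).map { |tweet| tweet[:text] }
  else
    content.split("\n")
  end
end

archive = '[{"text":"first tweet"},{"text":"second tweet"}]'
puts corpus_lines("corpus/example.json", archive).inspect
# ["first tweet", "second tweet"]
puts corpus_lines("corpus/example.txt", "line one\nline two").inspect
# ["line one", "line two"]
```

Either way, downstream processing (comment stripping, mention sorting, tokenizing) sees a uniform array of lines.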
data/skeleton/Gemfile
ADDED

data/skeleton/Gemfile.lock
ADDED
@@ -0,0 +1,70 @@
+GEM
+  remote: http://rubygems.org/
+  specs:
+    addressable (2.3.5)
+    atomic (1.1.14)
+    awesome_print (1.2.0)
+    cookiejar (0.3.0)
+    daemons (1.1.9)
+    em-http-request (1.0.3)
+      addressable (>= 2.2.3)
+      cookiejar
+      em-socksify
+      eventmachine (>= 1.0.0.beta.4)
+      http_parser.rb (>= 0.5.3)
+    em-socksify (0.3.0)
+      eventmachine (>= 1.0.0.beta.4)
+    em-twitter (0.2.2)
+      eventmachine (~> 1.0)
+      http_parser.rb (~> 0.5)
+      simple_oauth (~> 0.1)
+    engtagger (0.1.2)
+    eventmachine (1.0.3)
+    faraday (0.8.8)
+      multipart-post (~> 1.2.0)
+    fast-stemmer (1.0.2)
+    gingerice (1.2.1)
+      addressable
+      awesome_print
+    highscore (1.1.0)
+      whatlanguage (>= 1.0.0)
+    htmlentities (4.3.1)
+    http_parser.rb (0.5.3)
+    minitest (5.0.8)
+    multi_json (1.8.2)
+    multipart-post (1.2.0)
+    rufus-scheduler (3.0.2)
+      tzinfo
+    simple_oauth (0.2.0)
+    thread_safe (0.1.3)
+      atomic
+    tweetstream (2.5.0)
+      daemons (~> 1.1)
+      em-http-request (~> 1.0.2)
+      em-twitter (~> 0.2)
+      twitter (~> 4.5)
+      yajl-ruby (~> 1.1)
+    twitter (4.8.1)
+      faraday (~> 0.8, < 0.10)
+      multi_json (~> 1.0)
+      simple_oauth (~> 0.2)
+    twitter_ebooks (2.1.2)
+      engtagger
+      fast-stemmer
+      gingerice
+      highscore
+      htmlentities
+      minitest
+      rufus-scheduler
+      tweetstream (= 2.5)
+      twitter (~> 4.5)
+    tzinfo (1.1.0)
+      thread_safe (~> 0.1)
+    whatlanguage (1.0.5)
+    yajl-ruby (1.1.0)
+
+PLATFORMS
+  ruby
+
+DEPENDENCIES
+  twitter_ebooks
data/skeleton/corpus/.gitignore
ADDED
File without changes

data/skeleton/model/.gitignore
ADDED
File without changes

data/skeleton/run.rb
CHANGED
File without changes
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: twitter_ebooks
 version: !ruby/object:Gem::Version
-  version: 2.1.3
+  version: 2.1.4
 prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-11-
+date: 2013-11-27 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: minitest
@@ -183,8 +183,12 @@ files:
 - lib/twitter_ebooks/version.rb
 - script/process_anc_data.rb
 - skeleton/.gitignore
+- skeleton/Gemfile
+- skeleton/Gemfile.lock
 - skeleton/Procfile
 - skeleton/bots.rb
+- skeleton/corpus/.gitignore
+- skeleton/model/.gitignore
 - skeleton/run.rb
 - test/corpus/0xabad1dea.tweets
 - test/keywords.rb