twitter_ebooks 2.1.3 → 2.1.4

data/README.md CHANGED
@@ -1,9 +1,108 @@
- # twitter\_ebooks 2.1.3
+ # twitter\_ebooks 2.1.4
 
- Complete rewrite of twitter\_ebooks. Allows context-sensitive responsive bots via the Twitter streaming API, along with higher-quality ngram modeling. Still needs a bit of cleaning and documenting.
+ Rewrite of my twitter\_ebooks code. While the original was solely a tweeting Markov generator, this framework helps you build any kind of interactive twitterbot which responds to mentions/DMs.
 
  ## Installation
 
  ```bash
  gem install twitter_ebooks
  ```
+
+ ## Setting up a bot
+
+ Run `ebooks new <reponame>` to generate a new repository containing a sample bots.rb file, which looks like this:
+
+ ``` ruby
+ # This is an example bot definition with event handlers commented out
+ # You can define as many of these as you like; they will run simultaneously
+
+ Ebooks::Bot.new("./test") do |bot|
+   # Consumer details come from registering an app at https://dev.twitter.com/
+   # OAuth details can be fetched with https://github.com/marcel/twurl
+   bot.consumer_key = "" # Your app consumer key
+   bot.consumer_secret = "" # Your app consumer secret
+   bot.oauth_token = "" # Token connecting the app to this account
+   bot.oauth_token_secret = "" # Secret connecting the app to this account
+
+   bot.on_message do |dm|
+     # Reply to a DM
+     # bot.reply(dm, "secret secrets")
+   end
+
+   bot.on_follow do |user|
+     # Follow a user back
+     # bot.follow(user[:screen_name])
+   end
+
+   bot.on_mention do |tweet, meta|
+     # Reply to a mention
+     # bot.reply(tweet, meta[:reply_prefix] + "oh hullo")
+   end
+
+   bot.on_timeline do |tweet, meta|
+     # Reply to a tweet in the bot's timeline
+     # bot.reply(tweet, meta[:reply_prefix] + "nice tweet")
+   end
+
+   bot.scheduler.every '24h' do
+     # Tweet something every 24 hours
+     # See https://github.com/jmettraux/rufus-scheduler
+     # bot.tweet("hi")
+   end
+ end
+ ```
+
+ Bots defined like this can be spawned by executing `run.rb` in the same directory, and will operate together in a single eventmachine loop. The easiest way to run bots in a semi-permanent fashion is with [Heroku](https://www.heroku.com); just make an app, push the bot repository to it, enable a worker process in the web interface and it ought to chug along merrily forever.
+
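The sample above hardcodes the four credential strings; if the bot repository is going to be public, one option is to read them from environment variables instead (on Heroku these can be set as config vars). A minimal sketch reusing the `Ebooks::Bot` attributes from the sample; the `EBOOKS_*` variable names are hypothetical:

``` ruby
# Sketch: pull credentials from the environment rather than committing them.
# The EBOOKS_* names are made up for this example; the bot attributes are
# the ones shown in the sample bots.rb above.
Ebooks::Bot.new("./test") do |bot|
  bot.consumer_key       = ENV["EBOOKS_CONSUMER_KEY"]
  bot.consumer_secret    = ENV["EBOOKS_CONSUMER_SECRET"]
  bot.oauth_token        = ENV["EBOOKS_OAUTH_TOKEN"]
  bot.oauth_token_secret = ENV["EBOOKS_OAUTH_TOKEN_SECRET"]

  bot.on_mention do |tweet, meta|
    # bot.reply(tweet, meta[:reply_prefix] + "oh hullo")
  end
end
```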
+ ## Archiving accounts
+
+ twitter\_ebooks comes with a syncing tool to download and then incrementally update a local json archive of a user's tweets.
+
+ ``` zsh
+ ➜ ebooks-ebooks git:(master) ebooks archive 0xabad1dea corpus/0xabad1dea.json
+ Currently 20209 tweets for 0xabad1dea
+ Received 67 new tweets
+ ```
+
+ The first time you run this, it'll ask for auth details to connect with. Due to API limitations, it may not be possible to get the entire history of a user with a very large number of tweets in the initial download. However, so long as you run it frequently enough, you can maintain a perfect copy indefinitely into the future.
+
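The archive itself is just a json array of tweet hashes, so it is easy to inspect from Ruby if you want to sanity-check a sync; a small sketch using the example path above:

``` ruby
require 'json'

# Load the archive written by `ebooks archive`; each element is a tweet
# hash whose "text" field is what the modeling step cares about.
tweets = JSON.parse(File.read("corpus/0xabad1dea.json"), symbolize_names: true)
puts "#{tweets.length} tweets archived"
puts tweets.first[:text]
```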
+ ## Text models
+
+ In order to use the included text modeling, you'll first need to preprocess your archive into a more efficient form:
+
+ ``` zsh
+ ➜ ebooks-ebooks git:(master) ebooks consume corpus/0xabad1dea.json
+ Reading json corpus from corpus/0xabad1dea.json
+ Removing commented lines and sorting mentions
+ Segmenting text into sentences
+ Tokenizing 7075 statements and 17947 mentions
+ Ranking keywords
+ Corpus consumed to model/0xabad1dea.model
+ ```
+
+ Notably, this works with both json tweet archives and plaintext files (based on file extension), so you can make a model out of any kind of text.
+
+ Once you have a model, the primary use is to produce statements and related responses to input, using a pseudo-Markov generator:
+
+ ``` ruby
+ > model = Ebooks::Model.load("model/0xabad1dea.model")
+ > model.make_statement(140)
+ => "My Terrible Netbook may be the kind of person who buys Starbucks, but this Rackspace vuln is pretty straight up a backdoor"
+ > model.make_response("The NSA is coming!", 130)
+ => "Hey - someone who claims to be an NSA conspiracy"
+ ```
+
+ The secondary function is the "interesting keywords" list. For example, I use this to determine whether a bot wants to fav/retweet/reply to something in its timeline:
+
+ ``` ruby
+ top100 = model.keywords.top(100)
+ tokens = Ebooks::NLP.tokenize(tweet[:text])
+
+ if tokens.find { |t| top100.include?(t) }
+   bot.twitter.favorite(tweet[:id])
+ end
+ ```
+
+ ## Other notes
+
+ If you're using Heroku, which has no persistent filesystem, automating the process of archiving, consuming and updating can be tricky. My current solution is just a daily cron job that commits and pushes for me, which is pretty hacky.
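One way to script that daily update is a small driver around the `ebooks archive` and `ebooks consume` commands shown above, plus ordinary git; a rough sketch only, where the account name, paths, and the `heroku` remote are assumptions:

``` ruby
#!/usr/bin/env ruby
# Rough sketch of a daily update job: re-sync the tweet archive, rebuild
# the model, then commit and push so the deployed bot sees the new model.
# Account name, paths, and the `heroku` remote are example assumptions.

def run(cmd)
  puts cmd
  system(cmd) or abort("failed: #{cmd}")
end

run "ebooks archive 0xabad1dea corpus/0xabad1dea.json"
run "ebooks consume corpus/0xabad1dea.json"

# -f in case the skeleton's corpus/model .gitignore files exclude generated files
system "git add -f corpus model"
# `git commit` exits non-zero when nothing changed, so don't abort on it
system "git commit -m 'daily corpus/model update'"
run "git push heroku master"
```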
data/bin/ebooks CHANGED
@@ -38,10 +38,9 @@ module Ebooks
        shortname = filename.split('.')[0..-2].join('.')
        hash = Digest::MD5.hexdigest(File.read(path))
 
-       log "Consuming text corpus: #{filename}"
        outpath = File.join(APP_PATH, 'model', "#{shortname}.model")
        Model.consume(path).save(outpath)
-       log "Corpus consumed"
+       log "Corpus consumed to #{outpath}"
      end
    end
 
@@ -73,28 +72,30 @@ module Ebooks
      bot.tweet(statement)
    end
 
-   def self.jsonify(old_path, new_path)
-     name = File.basename(old_path).split('.')[0]
-     new_path ||= name + ".json"
-
-     tweets = []
-     id = nil
-     File.read(old_path).split("\n").each do |l|
-       if l.start_with?('# ')
-         id = l.split('# ')[-1]
-       else
-         tweet = { text: l }
-         if id
-           tweet[:id] = id
-           id = nil
+   def self.jsonify(paths)
+     paths.each do |path|
+       name = File.basename(path).split('.')[0]
+       new_path = name + ".json"
+
+       tweets = []
+       id = nil
+       File.read(path).split("\n").each do |l|
+         if l.start_with?('# ')
+           id = l.split('# ')[-1]
+         else
+           tweet = { text: l }
+           if id
+             tweet[:id] = id
+             id = nil
+           end
+           tweets << tweet
          end
-         tweets << tweet
        end
-     end
 
-     File.open(new_path, 'w') do |f|
-       log "Writing #{tweets.length} tweets to #{new_path}"
-       f.write(JSON.pretty_generate(tweets))
+       File.open(new_path, 'w') do |f|
+         log "Writing #{tweets.length} tweets to #{new_path}"
+         f.write(JSON.pretty_generate(tweets))
+       end
      end
    end
 
@@ -106,7 +107,7 @@ module Ebooks
   ebooks score <model_path> <input>
   ebooks archive <@user> <outpath>
   ebooks tweet <model_path> <@bot>
- ebooks jsonify <old_corpus_path> [new_corpus_path]
+ ebooks jsonify <old_corpus_path> [...]
  """
 
      if args.length == 0
@@ -121,7 +122,7 @@ module Ebooks
      when "score" then score(args[1], args[2..-1].join(' '))
      when "archive" then archive(args[1], args[2])
      when "tweet" then tweet(args[1], args[2])
-     when "jsonify" then jsonify(args[1], args[2])
+     when "jsonify" then jsonify(args[1..-1])
      end
    end
  end
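For context, the plaintext corpus format that `jsonify` converts is one tweet per line, with optional `# <id>` comment lines carrying the id of the tweet that follows. A standalone sketch of the same transformation on a made-up corpus:

``` ruby
require 'json'

# Made-up example of the old plaintext corpus format: "# <id>" comment
# lines carry the id of the tweet on the following line.
corpus = "# 123456789\njust setting up my twttr\na tweet whose id was never recorded"

# Same transformation as jsonify above, for a single in-memory corpus.
tweets = []
id = nil
corpus.split("\n").each do |l|
  if l.start_with?('# ')
    id = l.split('# ')[-1]
  else
    tweet = { text: l }
    if id
      tweet[:id] = id
      id = nil
    end
    tweets << tweet
  end
end

puts JSON.pretty_generate(tweets)
# prints a JSON array like:
#   [{"text": "just setting up my twttr", "id": "123456789"},
#    {"text": "a tweet whose id was never recorded"}]
```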
data/lib/twitter_ebooks/model.rb CHANGED
@@ -17,14 +17,22 @@ module Ebooks
        Marshal.load(File.read(path))
      end
 
-     def consume(txtpath)
-       # Record hash of source file so we know to update later
-       @hash = Digest::MD5.hexdigest(File.read(txtpath))
+     def consume(path)
+       content = File.read(path)
+       @hash = Digest::MD5.hexdigest(content)
+
+       if path.split('.')[-1] == "json"
+         log "Reading json corpus from #{path}"
+         lines = JSON.parse(content, symbolize_names: true).map do |tweet|
+           tweet[:text]
+         end
+       else
+         log "Reading plaintext corpus from #{path}"
+         lines = content.split("\n")
+       end
 
-       text = File.read(txtpath)
        log "Removing commented lines and sorting mentions"
 
-       lines = text.split("\n")
        keeping = []
        mentions = []
        lines.each do |l|
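Since the new `consume` dispatches purely on file extension, the same call now covers both json archives and plain text. A short sketch using the `Model.consume`/`save`/`load` calls that appear elsewhere in the gem; the file names are just examples:

``` ruby
# Build a model from a synced json tweet archive...
Ebooks::Model.consume("corpus/0xabad1dea.json").save("model/0xabad1dea.model")

# ...or from an arbitrary plaintext file.
Ebooks::Model.consume("corpus/some_book.txt").save("model/some_book.model")

model = Ebooks::Model.load("model/0xabad1dea.model")
puts model.make_statement(140)
```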
data/lib/twitter_ebooks/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Ebooks
-   VERSION = "2.1.3"
+   VERSION = "2.1.4"
  end
data/skeleton/Gemfile ADDED
@@ -0,0 +1,4 @@
+ source 'http://rubygems.org'
+ ruby '1.9.3'
+
+ gem 'twitter_ebooks'
data/skeleton/Gemfile.lock ADDED
@@ -0,0 +1,70 @@
+ GEM
+   remote: http://rubygems.org/
+   specs:
+     addressable (2.3.5)
+     atomic (1.1.14)
+     awesome_print (1.2.0)
+     cookiejar (0.3.0)
+     daemons (1.1.9)
+     em-http-request (1.0.3)
+       addressable (>= 2.2.3)
+       cookiejar
+       em-socksify
+       eventmachine (>= 1.0.0.beta.4)
+       http_parser.rb (>= 0.5.3)
+     em-socksify (0.3.0)
+       eventmachine (>= 1.0.0.beta.4)
+     em-twitter (0.2.2)
+       eventmachine (~> 1.0)
+       http_parser.rb (~> 0.5)
+       simple_oauth (~> 0.1)
+     engtagger (0.1.2)
+     eventmachine (1.0.3)
+     faraday (0.8.8)
+       multipart-post (~> 1.2.0)
+     fast-stemmer (1.0.2)
+     gingerice (1.2.1)
+       addressable
+       awesome_print
+     highscore (1.1.0)
+       whatlanguage (>= 1.0.0)
+     htmlentities (4.3.1)
+     http_parser.rb (0.5.3)
+     minitest (5.0.8)
+     multi_json (1.8.2)
+     multipart-post (1.2.0)
+     rufus-scheduler (3.0.2)
+       tzinfo
+     simple_oauth (0.2.0)
+     thread_safe (0.1.3)
+       atomic
+     tweetstream (2.5.0)
+       daemons (~> 1.1)
+       em-http-request (~> 1.0.2)
+       em-twitter (~> 0.2)
+       twitter (~> 4.5)
+       yajl-ruby (~> 1.1)
+     twitter (4.8.1)
+       faraday (~> 0.8, < 0.10)
+       multi_json (~> 1.0)
+       simple_oauth (~> 0.2)
+     twitter_ebooks (2.1.2)
+       engtagger
+       fast-stemmer
+       gingerice
+       highscore
+       htmlentities
+       minitest
+       rufus-scheduler
+       tweetstream (= 2.5)
+       twitter (~> 4.5)
+     tzinfo (1.1.0)
+       thread_safe (~> 0.1)
+     whatlanguage (1.0.5)
+     yajl-ruby (1.1.0)
+
+ PLATFORMS
+   ruby
+
+ DEPENDENCIES
+   twitter_ebooks
data/skeleton/corpus/.gitignore ADDED
File without changes
data/skeleton/model/.gitignore ADDED
File without changes
data/skeleton/run.rb CHANGED
File without changes
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: twitter_ebooks
  version: !ruby/object:Gem::Version
-   version: 2.1.3
+   version: 2.1.4
  prerelease:
  platform: ruby
  authors:
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2013-11-24 00:00:00.000000000 Z
+ date: 2013-11-27 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: minitest
@@ -183,8 +183,12 @@ files:
  - lib/twitter_ebooks/version.rb
  - script/process_anc_data.rb
  - skeleton/.gitignore
+ - skeleton/Gemfile
+ - skeleton/Gemfile.lock
  - skeleton/Procfile
  - skeleton/bots.rb
+ - skeleton/corpus/.gitignore
+ - skeleton/model/.gitignore
  - skeleton/run.rb
  - test/corpus/0xabad1dea.tweets
  - test/keywords.rb