tweetabout 0.0.5.0 → 0.0.6.0
- data/README.md +46 -79
- data/Rakefile +10 -2
- data/lib/tweetabout.rb +4 -3
- data/lib/tweetabout/tweet_words.rb +47 -0
- data/lib/tweetabout/twit_api.rb +41 -0
- data/lib/tweetabout/version.rb +1 -1
- data/lib/tweetabout/words_hash.rb +27 -0
- data/test/test_tweetabout.rb +60 -0
- data/test/test_wordhash.rb +27 -0
- data/tweetabout.gemspec +2 -1
- metadata +29 -6
- data/lib/tweetabout/sorted_tweets.rb +0 -84
data/README.md
CHANGED
@@ -1,12 +1,10 @@
-##
+## Tweet About Gem
 
-
+Takes a twitter username and returns a list of words. Can also return a Hash with the count for each word. Options to not include retweets, to set the number of tweets to be processed, and to return all words, without removing the useless ones. See `Options Hash`.
 
-The words are are ordered list of the most frequently tweeted words
-* Retweets are included
+The words are are ordered list of the most frequently tweeted words.
 * The casing of words doesn't matter (the = The = THE). Output is downcased.
-* URLs are removed
-
+* URLs are removed.
 
 ## Installation
 
@@ -14,92 +12,61 @@ The words are are ordered list of the most frequently tweeted words based on the
 
 ### Gemfile
 
-gem
+gem 'tweetabout'
 
 ### Dependencies
 
-`httparty`
+`httparty`
 
 ## Useage
-
-example: Let's use `@jack`:
+example: For `@jack`:
 ```ruby
-#args: a single string
-
-#Invoke the TweetAbout Module:
-#call the tweetabout method
-
-@words = TweetAbout.tweetabout("jack")
-
-#returns an array of words sorted from most frequent to least
-#returns an empty array if the user doesn't have any tweets or:
-# 400 (most likely twitter api limit is exceeded) https://dev.twitter.com/docs/rate-limiting
-# 404 (most likely user doesn't exist, or twitter's api is down)
-
+#args: a single string, optional options hash
 ```
 
-##
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-##Speed
-All these measurments are for processing the maximum of 1,000 tweets. If the user has less than 1,000 tweets, obviously these processes will be faster.
-
-To see speeds yourself, checkout the speed branch and watch the server output.
-
-### get_tweets method
-This method does 5 GET requests to the Twitter api and stores them all in `@responses` These are the different time measurements:
-
-4985.406 ms
-3566.376
-4071.794
-7759.329
-3656.680
-4510.602
+##Returns
+```ruby
+#default
+{:status => :ok, :words => ["fisbee", "golf",...]}
+```
+```ruby
+# {:with_count => true}
+{:status => :ok, :words => {"fisbee" => 5,
+"golf" => 3, ...} }`
+```
+```ruby
+#errors
+{:status => :api_limit_exceeded }
+{:status => :invalid_username }
+{:status => :no_tweets }
+```
+##Options Hash:
+```ruby
+#TweetAbout::TweetWords.new("@username").sort_words(:options => {})
+tweets: n (default 200) #the number of tweets to process
+include_rts: t/f (default true) #include re-tweets by default
+with_count: t/f (default false) #if true, returns a Hash in the form of {word: n}, otherwise an Array of words is returned
+all_words: t/f (default false) #see `Junk Words`. If set to true, no words are left out
+```
 
-
-
-
+#Examples
+```ruby
+#returns an Array of words sorted by most frequent.
+TweetAbout::TweetWords.new("jack").sort
 
-
-
-78.989
-87.12
-137.528
-134.256
+#returns a Hash of words with each count sorted by most frequent.
+TweetAbout::TweetWords.new("jack").sort(with_count: true)
 
-
+#returns an Array of words sorted by most frequent for last 1,000 tweets
+TweetAbout::TweetWords.new("jack").sort(tweets: 1000)
+```
 
-
+##Junk Words
+These words are removed. If you don't want them removed, pass the option `all_words => true`
 
-
-
-
-4.016
-4.098
-5.509
-3.994
+articles: "the", "a", "an"
+pronouns: "he", "him", "her", "she", "i", "you", "they", "them", "it"
+other_junk: "for", "from", "not", "but"
 
 
 
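`sort_words` reads each option straight from the hash it is given (`options[:tweets]`, `options[:with_count]`). That means a flat call such as `sort_words(tweets: 1000, with_count: true)` appears to be what the code expects. The nested `sort_words(:options => {})` form shown under `##Options Hash:` would leave every option at its default. The `sort` calls under `#Examples` would also fail, because `TweetWords` in this release defines no `sort` method.

The sketch below applies the documented defaults to such a flat hash. `DEFAULT_OPTIONS` and `normalize_options` are hypothetical names, not part of the gem. Only the `:tweets` coercion (`to_i`, falling back to 200) mirrors what `sort_words` does.

```ruby
# Hypothetical helper: apply the defaults listed under "Options Hash" to a
# flat options hash, the shape TweetWords#sort_words actually reads.
DEFAULT_OPTIONS = {
  tweets: 200,       # number of tweets to process
  include_rts: true, # include retweets
  with_count: false, # return a Hash of counts instead of an Array
  all_words: false   # keep the words listed under "Junk Words"
}.freeze

def normalize_options(options = {})
  merged = DEFAULT_OPTIONS.merge(options)
  # Same coercion as sort_words: anything that is not a positive integer
  # after to_i falls back to the default of 200.
  count = merged[:tweets].to_i
  merged[:tweets] = count > 0 ? count : DEFAULT_OPTIONS[:tweets]
  merged
end

opts = normalize_options(tweets: "1000", with_count: true)
puts opts[:tweets]      # prints 1000
puts opts[:include_rts] # prints true
```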
data/Rakefile
CHANGED
data/lib/tweetabout.rb
CHANGED
data/lib/tweetabout/tweet_words.rb
ADDED
@@ -0,0 +1,47 @@
+module TweetAbout
+
+  class TweetWords
+    attr_accessor :status
+    attr_accessor :words
+
+    def initialize(username)
+      @username = username
+    end
+
+    def sort_words(options={})
+      to_i = options[:tweets].to_i
+      tweets_needed = to_i > 0? to_i : 200
+      options[:tweets] = tweets_needed
+      get_words
+      if self.status == :ok
+        words = self.words
+        sorted_words = words.sort
+        sorted_words = options[:with_count]? sorted_words : sorted_words.keys
+        self.words = sorted_words
+      end
+      self
+    end
+
+    private
+
+    def get_words(options={})
+      api_options = { :query => {:screen_name => @username, :include_rts => options[:include_rts]} }
+
+      twit_api = TwitApi.new(options[:tweets], api_options)
+
+      if tweets = twit_api.tweets
+        words = WordsHash.new(0)
+
+        tweets.each do |tweet|
+          tweet.split.each do |key|
+            words[key.gsub(/\W/, "").strip.downcase] += 1 unless key.start_with?('http') || key.empty?
+          end
+        end
+
+        words = words.sanitize unless options[:all_words]
+        self.words = words
+      end
+      self.status = twit_api.status
+    end
+  end
+end
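Two details in `TweetWords` are worth noting.

First, `sort_words` normalizes `options[:tweets]` but then calls `get_words` with no arguments, so inside `get_words` the `options` hash is always `{}`. As a result, `TwitApi` receives `nil` for the tweet count and falls back to 200, `:include_rts` is `nil` in the query, and `sanitize` runs even when `all_words` is set. Passing the hash through (`get_words(options)`) looks like the intended call.

Second, `key.empty?` is tested before non-word characters are stripped. A token such as `...` is therefore counted under the empty string `""`, which `WordsHash#sanitize` then has to delete.

A standalone sketch of the counting loop that checks emptiness after cleaning follows. `count_words` is a hypothetical name, not part of the gem.

```ruby
# Hypothetical standalone version of the counting loop in TweetWords#get_words.
# Unlike the gem, it checks for empty words after stripping punctuation,
# so tokens like "..." are skipped instead of being counted under "".
def count_words(tweets)
  counts = Hash.new(0)
  tweets.each do |tweet|
    tweet.split.each do |token|
      next if token.start_with?("http")    # drop URLs, as the gem does
      word = token.gsub(/\W/, "").downcase # strip non-word characters, ignore case
      next if word.empty?
      counts[word] += 1
    end
  end
  counts
end

counts = count_words(["The cat sat", "the CAT! http://t.co/x", "@jack ..."])
# counts: "the" => 2, "cat" => 2, "sat" => 1, "jack" => 1
```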
data/lib/tweetabout/twit_api.rb
ADDED
@@ -0,0 +1,41 @@
+module TweetAbout
+  class TwitApi
+    include HTTParty
+
+    attr_accessor :status
+    attr_accessor :tweets
+
+    BASE_URL = "http://api.twitter.com/1/statuses/user_timeline.json?"
+    TWEETS_NEEDED = 200
+
+    def initialize(tweets_needed, api_options)
+      self.tweets = []
+      tweets_needed ||= TWEETS_NEEDED
+      begin
+        query_amt = tweets_needed <= 200? tweets_needed : 200
+
+        api_options[:query].merge!({:count => query_amt})
+        response = HTTParty.get(BASE_URL, api_options)
+
+        case response.code
+        when 200
+          self.status = :ok
+        when 400
+          self.status = :no_tweets
+        when 404
+          self.status = :invalid_username
+        else
+          self.status = :error
+        end
+
+        tweets_needed -= response.count
+        api_options[:query].merge!({:max_id => response.last["id"] + 1 })
+        response.each do |response|
+          self.tweets << response["text"]
+        end
+
+      end while tweets_needed > 0
+    end
+  end
+
+end
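`TwitApi#initialize` requests each following page with `:max_id => response.last["id"] + 1`. Twitter's `max_id` is inclusive: it returns tweets with an ID less than or equal to the given value. Every page after the first therefore repeats the previous page's oldest tweet. The removed `sorted_tweets.rb` used `start_at_1-1` and so on instead.

The loop also never stops on an error or an empty page. On an empty timeline `response.last` is `nil`, so `response.last["id"]` raises instead of producing a status.

In addition, a 400 is reported as `:no_tweets`, although the README lists `:api_limit_exceeded`, and the old README attributed 400s to the rate limit.

The sketch below paginates the same way but stops on errors and empty pages. It makes two assumptions. The `fetch` callable is hypothetical and stands in for `HTTParty.get`, returning `[code, tweets]`. Mapping 400 to `:api_limit_exceeded` follows the README, not the gem's current code.

```ruby
# Hypothetical sketch of TwitApi-style pagination. `fetch` stands in for the
# HTTP call: it takes the query hash and returns [status_code, tweet_hashes].
# Mapping 400 to :api_limit_exceeded follows the README, not twit_api.rb.
STATUS_FOR_CODE = { 200 => :ok, 400 => :api_limit_exceeded, 404 => :invalid_username }.freeze

def fetch_tweets(tweets_needed, query, fetch, page_size: 200)
  texts = []
  query = query.dup # work on a copy instead of merge!-ing into the caller's hash
  while tweets_needed > 0
    query[:count] = [tweets_needed, page_size].min
    code, page = fetch.call(query)
    status = STATUS_FOR_CODE.fetch(code, :error)
    return [status, texts] unless status == :ok
    break if page.empty? # the timeline has no older tweets

    texts.concat(page.map { |tweet| tweet["text"] })
    tweets_needed -= page.size
    # max_id is inclusive, so step one below the oldest ID already collected.
    query[:max_id] = page.last["id"] - 1
  end
  [texts.empty? ? :no_tweets : :ok, texts]
end

# Usage with a fake timeline: IDs 10 (newest) down to 1, three per page.
timeline = 10.downto(1).map { |id| { "id" => id, "text" => "t#{id}" } }
fake_fetch = lambda do |q|
  max_id = q[:max_id] || Float::INFINITY
  [200, timeline.select { |t| t["id"] <= max_id }.first(q[:count])]
end

status, texts = fetch_tweets(7, { screen_name: "jack" }, fake_fetch, page_size: 3)
# status is :ok and texts runs from "t10" down to "t4" with no repeats
```

Injecting `fetch` keeps the paging logic testable without network access. The real class could pass a lambda wrapping `HTTParty.get(BASE_URL, :query => q)`.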
data/lib/tweetabout/version.rb
CHANGED
data/lib/tweetabout/words_hash.rb
ADDED
@@ -0,0 +1,27 @@
+module TweetAbout
+  class WordsHash < Hash
+    def sanitize
+      junk_words
+    end
+
+    def sort
+      WordsHash[self.sort_by { |key, frequency| frequency }.reverse]
+    end
+
+    private
+
+    def junk_words
+      articles = ["the", "a", "an"]
+      pronouns = ["we", "he", "him", "his", "her", "she", "i", "you", "they", "them", "it", "your",
+                  "our", "us", "my", "this", "their"]
+      other_junk = ["for", "from", "not", "but", "is", "in", "and", "so",
+                    "of", "if", "at", "rt", "", "all", "to", "that",
+                    "are", "can", "by", "on", "as", "or", "as"]
+
+      all_junk = articles << pronouns << other_junk
+      all_junk.flatten!
+
+      self.delete_if { |key, value| all_junk.include?(key) }
+    end
+  end
+end
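A few details in `WordsHash` are easy to miss:

- `articles << pronouns << other_junk` nests the other two arrays inside `articles` itself until `flatten!` runs.
- `"as"` is listed twice.
- The list is longer than the one shown under `##Junk Words` in the README.
- `sanitize` deletes keys in place.
- `WordsHash#sort` replaces the `sort` inherited from `Enumerable`, so it returns a hash rather than the array of pairs Ruby's `sort` normally gives.

A non-mutating sketch of both methods follows. It uses a `Set` for lookups and breaks ties alphabetically so the order is deterministic; the gem's `sort_by(...).reverse` leaves ties unordered. `JUNK_WORDS`, `sanitize` and `sort_by_frequency` are hypothetical names, not the gem's API.

```ruby
require "set"

# Hypothetical non-mutating versions of WordsHash#sanitize and #sort.
# Word list copied from words_hash.rb (duplicate "as" dropped). "" is kept
# because the gem can count tokens that end up empty after cleaning.
JUNK_WORDS = Set.new(["", *%w[
  the a an
  we he him his her she i you they them it your our us my this their
  for from not but is in and so of if at rt all to that are can by on as or
]]).freeze

def sanitize(counts)
  counts.reject { |word, _count| JUNK_WORDS.include?(word) }
end

def sort_by_frequency(counts)
  # Highest count first; equal counts fall back to alphabetical order.
  counts.sort_by { |word, count| [-count, word] }.to_h
end

counts = { "of" => 1, "compassion" => 2, "love" => 4, "humanity" => 5,
           "together" => 4, "free" => 4 }
ranked = sort_by_frequency(sanitize(counts))
# ranked.keys: humanity, free, love, together, compassion ("of" removed)
```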
data/test/test_tweetabout.rb
ADDED
@@ -0,0 +1,60 @@
+require 'test/unit'
+require 'minitest/spec'
+require 'mocha/setup'
+require 'tweetabout'
+
+tw = TweetAbout::TweetWords.new("jack")
+
+describe "the defaults" do
+  response = tw.sort_words(:tweets => 5)
+
+  it "should be an instance of TweetWords" do
+    assert_equal TweetAbout::TweetWords, tw.class
+  end
+
+  it "should be successful and should be an Array" do
+    assert_equal :ok, response.status
+    assert_equal Array, response.words.class
+  end
+end
+
+describe "the {:with_count => true} option" do
+  response = tw.sort_words(:with_count => true, :tweets => 5)
+  words = response.words
+
+  it "should be successful and should be a Hash" do
+    assert_equal :ok, response.status
+    assert_equal TweetAbout::WordsHash, words.class
+  end
+end
+
+describe ":with_count option" do
+  it "should handle with count true" do
+    response = tw.sort_words(:with_count => true, :tweets => 5)
+    assert_equal :ok, response.status
+  end
+
+  it "should handle with count false" do
+    response = tw.sort_words(:with_count => false, :tweets => 5)
+    assert_equal :ok, response.status
+  end
+end
+
+describe "a number string to tweets should work" do
+  it "should handle a number string for tweets" do
+    response = tw.sort_words(:tweets => "5")
+    assert_equal :ok, response.status
+  end
+end
+
+describe "include_rts options should work" do
+  it "should handle include rts true" do
+    response = tw.sort_words(:include_rts => true)
+    assert_equal :ok, response.status
+  end
+
+  it "should handle include rts true" do
+    response = tw.sort_words(:include_rts => false)
+    assert_equal :ok, response.status
+  end
+end
data/test/test_wordhash.rb
ADDED
@@ -0,0 +1,27 @@
+require 'test/unit'
+require 'minitest/spec'
+require 'mocha/setup'
+require 'tweetabout'
+
+words_hash = TweetAbout::WordsHash[{"of" => 1, "compassion" => 2, "love" => 4, "humanity" => 5,
+                                    "together" => 4, "free" => 4}]
+
+describe "#sanitize" do
+  it "should remove a useless word" do
+    words_hash = words_hash.sanitize
+    assert_equal true, !words_hash.has_key?("of")
+  end
+end
+
+describe "#sort" do
+  it "should sort the words" do
+    words_hash = words_hash.sort
+    values = words_hash.values
+    trues = []
+    values.each_index do |i|
+      val = values[i] >= values[i.next] if values[i.next]
+      trues << val if val
+    end
+    assert_equal false, trues.include?(false)
+  end
+end
|
data/tweetabout.gemspec
CHANGED
@@ -5,10 +5,11 @@ Gem::Specification.new do |gem|
   gem.authors = ["Dylan Jhaveri"]
   gem.email = ["dylanjhaveri@gmail.com"]
   gem.description = %q{Returns a list of frequently tweeted words}
-  gem.summary = %q{Takes a twitter username and
+  gem.summary = %q{Takes a twitter username and returns a list of words. Can also return a Hash with the count for each word. Has options to not include retweets and set the number of tweets to process}
   gem.homepage = ""
 
   gem.add_dependency "httparty"
+  gem.add_development_dependency('mocha', "~> 0.9.9")
 
   gem.files = `git ls-files`.split($\)
   gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: tweetabout
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.6.0
 prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2013-01-11 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: httparty
@@ -27,6 +27,22 @@ dependencies:
     - - ! '>='
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  name: mocha
+  requirement: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ~>
+      - !ruby/object:Gem::Version
+        version: 0.9.9
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ~>
+      - !ruby/object:Gem::Version
+        version: 0.9.9
 description: Returns a list of frequently tweeted words
 email:
 - dylanjhaveri@gmail.com
@@ -40,8 +56,12 @@ files:
 - README.md
 - Rakefile
 - lib/tweetabout.rb
-- lib/tweetabout/
+- lib/tweetabout/tweet_words.rb
+- lib/tweetabout/twit_api.rb
 - lib/tweetabout/version.rb
+- lib/tweetabout/words_hash.rb
+- test/test_tweetabout.rb
+- test/test_wordhash.rb
 - tweetabout.gemspec
 homepage: ''
 licenses: []
@@ -66,6 +86,9 @@ rubyforge_project:
 rubygems_version: 1.8.24
 signing_key:
 specification_version: 3
-summary: Takes a twitter username and
-
-
+summary: Takes a twitter username and returns a list of words. Can also return a
+  Hash with the count for each word. Has options to not include retweets and set
+  the number of tweets to process
+test_files:
+- test/test_tweetabout.rb
+- test/test_wordhash.rb
data/lib/tweetabout/sorted_tweets.rb
DELETED
@@ -1,84 +0,0 @@
-module TweetAbout
-
-  class SortedTweets
-    include HTTParty
-
-    def initialize(username)
-      @username = username
-    end
-
-    def sort_tweets
-      get_tweets
-      hash = {}
-
-      @responses.each do |tweet|
-        tweet.split(" ").each do |key|
-          key = key.gsub(/\W/, "").downcase
-          if hash.has_key?(key)
-            hash["#{key}"] += 1
-          else
-            hash.merge!({"#{key}" => 1}) unless bad_key(key)
-          end
-        end
-      end
-      sorted_array = hash.sort_by { |keyword, frequency| frequency }.reverse
-      sorted_array.map { |keyword| keyword[0] }
-    end
-
-    def get_tweets
-      options = { :query => {:screen_name => @username, :include_rts => true, :count => 200} }
-
-      base_url = "http://api.twitter.com/1/statuses/user_timeline.json?"
-
-      @responses = []
-      response1 = HTTParty.get("#{base_url}", options)
-      return if response1.code == 404 #invalid username/bad url
-      return if response1.count == 0 #no tweets!
-      return if response1.code == 400 #api limt exceeded
-      start_at_1 = response1.last["id"]
-      response1.each do |object|
-        @responses << object["text"]
-      end
-
-      response2 = HTTParty.get("#{base_url}&max_id=#{start_at_1-1}", options)
-      return if response2.count == 0
-      return if response2.code == 400
-      start_at_2 = response2.last["id"]
-      response2.each do |object|
-        @responses << object["text"]
-      end
-
-      response3 = HTTParty.get("#{base_url}&max_id=#{start_at_2-1}", options)
-      return if response3.count == 0
-      return if response3.code == 400
-      start_at_3 = response3.last["id"]
-      response3.each do |object|
-        @responses << object["text"]
-      end
-
-      response4 = HTTParty.get("#{base_url}&max_id=#{start_at_3-1}", options)
-      return if response4.count == 0
-      return if response4.code == 400
-      start_at_4 = response4.last["id"]
-      response4.each do |object|
-        @responses << object["text"]
-      end
-
-      response5 = HTTParty.get("#{base_url}&max_id=#{start_at_4-1}", options)
-      return if response5.count == 0
-      return if response5.code == 400
-      start_at_5 = response4.last["id"]
-      response5.each do |object|
-        @responses << object["text"]
-      end
-    end
-
-    def bad_key(key)
-      return true if key.empty?
-      return true if key.start_with?('http')
-    end
-
-    def to_a
-    end
-  end
-end