tweetabout 0.0.5.0 → 0.0.6.0

data/README.md CHANGED
@@ -1,12 +1,10 @@
- ## Tweets About Gem
+ ## Tweet About Gem

- This gem takes a twitter username and returns a list of words.
+ Takes a twitter username and returns a list of words. Can also return a Hash with the count for each word. Options to not include retweets, to set the number of tweets to be processed, and to return all words, without removing the useless ones. See `Options Hash`.

- The words are are ordered list of the most frequently tweeted words based on the users last 1,000 tweets.
- * Retweets are included
+ The words are an ordered list of the most frequently tweeted words.
  * The casing of words doesn't matter (the = The = THE). Output is downcased.
- * URLs are removed, because that makes sense.
-
+ * URLs are removed.

  ## Installation

@@ -14,92 +12,61 @@ The words are are ordered list of the most frequently tweeted words based on the

  ### Gemfile

- gem "tweetabout", "~> 0.0.4.1"
+ gem 'tweetabout'

  ### Dependencies

- `httparty` because httparty > NET::HTTP
+ `httparty`

  ## Usage
-
- example: Let's use `@jack`:
+ example: For `@jack`:
  ```ruby
- #args: a single string
-
- #Invoke the TweetAbout Module:
- #call the tweetabout method
-
- @words = TweetAbout.tweetabout("jack")
-
- #returns an array of words sorted from most frequent to least
- #returns an empty array if the user doesn't have any tweets or:
- # 400 (most likely twitter api limit is exceeded) https://dev.twitter.com/docs/rate-limiting
- # 404 (most likely user doesn't exist, or twitter's api is down)
-
+ #args: a single string, optional options hash
  ```

- ## TODO
-
- * Test coverage. Actually this a requirement before using this gem in production. And I know I am a bad Rubyist for not writing tests first but I promise to get better.
- * Make this Gem useful. Right now it returns a list of all words, it might be interesting to strip out articles and pronouns from the list, that might actually give you an interesting insight into a person's interests.
- * More accurate response messages. Currently empty arrays get returned for three reasons:
- 1. Twitter api is down (404)
- 2. User does not exist (404)
- 3. Client has exceeded API limit (400). This happens when > 150 requests are made within an hour.
-
- ##Twitter API
-
- The Twitter API imposes a restriction on requests for users' timelines. Each request can only receive a maximum of 200 tweets. To get 1,000 tweets, that means we have to make 5 round trips to the api server. Let's see how longs these requests take. This is the measured response time of the GET request for 5 different twitter usernames:
-
- ex: GET `http://api.twitter.com/1/statuses/user_timeline.json?screen_name=#{user}&include_rts=true&count=200`
- (time in ms)
-
-          trial1   trial2   trial3   trial4   trial5
- request1 806.521  740.214  1363.33  490.090  331.253
- request2 720.767  537.374  673.249  532.168  478.668
- request3 492.608  733.955  560.918  547.887  380.81
- request4 480.757  945.29   645.733  605.972  340.256
- request5 575.621  469.731  707.737  826.423  169.244
-
- Based on this small test we can see that response times from the api vary from a few hundred milliseconds up to a full second. Of course this is all influnced by time of day, network connection, and a variety of factors but it's good to know that if we have to make 5 trips in a row to the twitter api server we can't really count on it being very fast. In fact, 5 requests could easily take 2.5 to 3 seconds to complete. No doubt this is the slowest part of this gem.
-
- ##Speed
- All these measurments are for processing the maximum of 1,000 tweets. If the user has less than 1,000 tweets, obviously these processes will be faster.
-
- To see speeds yourself, checkout the speed branch and watch the server output.
-
- ### get_tweets method
- This method does 5 GET requests to the Twitter api and stores them all in `@responses` These are the different time measurements:
-
- 4985.406 ms
- 3566.376
- 4071.794
- 7759.329
- 3656.680
- 4510.602
+ ##Returns
+ ```ruby
+ #default
+ {:status => :ok, :words => ["frisbee", "golf",...]}
+ ```
+ ```ruby
+ # {:with_count => true}
+ {:status => :ok, :words => {"frisbee" => 5,
+ "golf" => 3, ...} }
+ ```
+ ```ruby
+ #errors
+ {:status => :api_limit_exceeded }
+ {:status => :invalid_username }
+ {:status => :no_tweets }
+ ```
+ ##Options Hash:
+ ```ruby
+ #TweetAbout::TweetWords.new("username").sort_words(options)
+ tweets: n (default 200) #the number of tweets to process
+ include_rts: t/f (default true) #include re-tweets by default
+ with_count: t/f (default false) #if true, returns a Hash in the form of {word: n}, otherwise an Array of words is returned
+ all_words: t/f (default false) #see `Junk Words`. If set to true, no words are left out
+ ```

- ### Processing responses (tweets)
- `@responses.each do |tweet|` block
- This method essentially takes the @responses variable, which is all the tweets, splits the words apart, removes punctuation and creates a hash of keys and values, keys are words, values are the number of times that word has shown up. (the `bad_key` method below is part of this block.
+ #Examples
+ ```ruby
+ #returns an Array of words sorted by most frequent.
+ TweetAbout::TweetWords.new("jack").sort_words

- 107.844 ms
- 117.764
- 78.989
- 87.12
- 137.528
- 134.256
+ #returns a Hash of words with each count sorted by most frequent.
+ TweetAbout::TweetWords.new("jack").sort_words(with_count: true)

- #### bad_key method:
+ #returns an Array of words sorted by most frequent for last 1,000 tweets
+ TweetAbout::TweetWords.new("jack").sort_words(tweets: 1000)
+ ```

- .006ms - .013ms each word
+ ##Junk Words
+ These words are removed. If you don't want them removed, pass the option `:all_words => true`

- #### Sorting
-
- 4.245 ms
- 4.016
- 4.098
- 5.509
- 3.994
+ articles: "the", "a", "an"
+ pronouns: "he", "him", "her", "she", "i", "you", "they", "them", "it"
+ other_junk: "for", "from", "not", "but"



data/Rakefile CHANGED
@@ -1,2 +1,10 @@
- #!/usr/bin/env rake
- require "bundler/gem_tasks"
+ # !/usr/bin/env rake
+ # require "bundler/gem_tasks"
+ require 'rake/testtask'
+
+ Rake::TestTask.new do |t|
+   t.libs << 'test'
+ end
+
+ desc "Run tests"
+ task :default => :test
@@ -2,6 +2,7 @@ require "tweetabout/version"
  require "httparty"

  module TweetAbout
-   require 'tweetabout/sorted_tweets'
-
- end
+   require_relative 'tweetabout/tweet_words'
+   require_relative 'tweetabout/twit_api'
+   require_relative 'tweetabout/words_hash'
+ end
@@ -0,0 +1,47 @@
+ module TweetAbout
+
+   class TweetWords
+     attr_accessor :status
+     attr_accessor :words
+
+     def initialize(username)
+       @username = username
+     end
+
+     def sort_words(options={})
+       to_i = options[:tweets].to_i
+       tweets_needed = to_i > 0? to_i : 200
+       options[:tweets] = tweets_needed
+       get_words
+       if self.status == :ok
+         words = self.words
+         sorted_words = words.sort
+         sorted_words = options[:with_count]? sorted_words : sorted_words.keys
+         self.words = sorted_words
+       end
+       self
+     end
+
+     private
+
+     def get_words(options={})
+       api_options = { :query => {:screen_name => @username, :include_rts => options[:include_rts]} }
+
+       twit_api = TwitApi.new(options[:tweets], api_options)
+
+       if tweets = twit_api.tweets
+         words = WordsHash.new(0)
+
+         tweets.each do |tweet|
+           tweet.split.each do |key|
+             words[key.gsub(/\W/, "").strip.downcase] += 1 unless key.start_with?('http') || key.empty?
+           end
+         end
+
+         words = words.sanitize unless options[:all_words]
+         self.words = words
+       end
+       self.status = twit_api.status
+     end
+   end
+ end
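
The counting and sorting flow in `sort_words` and `get_words` can be tried without the Twitter API. The sketch below is not the gem's code: `count_words` and `JUNK` are hypothetical names, the junk list is a small illustrative subset, and tweet texts are passed in directly. It also passes the options hash through explicitly; in the diff above `sort_words` calls `get_words` with no arguments, so `:tweets`, `:include_rts` and `:all_words` appear not to reach it.

```ruby
# Hypothetical helper, not part of the gem: counts the words in an Array of
# tweet texts and returns them ordered from most to least frequent.
JUNK = %w[the a an for from not but].freeze # illustrative subset only

def count_words(tweets, options = {})
  counts = Hash.new(0)
  tweets.each do |tweet|
    tweet.split.each do |token|
      next if token.start_with?('http')      # URLs are dropped
      word = token.gsub(/\W/, '').downcase   # strip punctuation, ignore case
      next if word.empty?
      counts[word] += 1
    end
  end
  counts.delete_if { |word, _| JUNK.include?(word) } unless options[:all_words]
  sorted = counts.sort_by { |word, n| [-n, word] } # ties broken alphabetically
  options[:with_count] ? sorted.to_h : sorted.map(&:first)
end

tweets = ["Golf and frisbee!", "The frisbee http://example.com", "frisbee golf"]
p count_words(tweets)                   # => ["frisbee", "golf", "and"]
p count_words(tweets, with_count: true) # => {"frisbee"=>3, "golf"=>2, "and"=>1}
```

Sorting on `[-n, word]` makes the order deterministic when two words have the same count, which the gem's `sort_by { ... }.reverse` does not guarantee.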
@@ -0,0 +1,41 @@
+ module TweetAbout
+   class TwitApi
+     include HTTParty
+
+     attr_accessor :status
+     attr_accessor :tweets
+
+     BASE_URL = "http://api.twitter.com/1/statuses/user_timeline.json?"
+     TWEETS_NEEDED = 200
+
+     def initialize(tweets_needed, api_options)
+       self.tweets = []
+       tweets_needed ||= TWEETS_NEEDED
+       begin
+         query_amt = tweets_needed <= 200? tweets_needed : 200
+
+         api_options[:query].merge!({:count => query_amt})
+         response = HTTParty.get(BASE_URL, api_options)
+
+         case response.code
+         when 200
+           self.status = :ok
+         when 400
+           self.status = :no_tweets
+         when 404
+           self.status = :invalid_username
+         else
+           self.status = :error
+         end
+
+         tweets_needed -= response.count
+         api_options[:query].merge!({:max_id => response.last["id"] + 1 })
+         response.each do |response|
+           self.tweets << response["text"]
+         end
+
+       end while tweets_needed > 0
+     end
+   end
+
+ end
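
`TwitApi` pages through the timeline in requests of at most 200 tweets. A minimal in-memory sketch of that loop follows. `collect_tweets`, `PAGE_LIMIT`, `TIMELINE` and `fake_api` are hypothetical names, and no HTTP request is made. It assumes `max_id` is inclusive, so the next page starts at the last id minus one (as the removed `sorted_tweets.rb` did), and that an empty page means there is nothing older to fetch.

```ruby
PAGE_LIMIT = 200 # per-request cap, matching the 200 used in TwitApi

# Hypothetical pager: `fetch_page` is any callable taking (count, max_id) and
# returning an Array of {"id" => Integer, "text" => String}, newest first.
def collect_tweets(needed, fetch_page)
  texts  = []
  max_id = nil
  while needed > 0
    page = fetch_page.call([needed, PAGE_LIMIT].min, max_id)
    break if page.empty?              # nothing older left, so stop looping
    texts.concat(page.map { |tweet| tweet["text"] })
    needed -= page.size
    max_id  = page.last["id"] - 1     # assumes max_id is inclusive
  end
  texts
end

# In-memory stand-in for the API: 450 fake tweets with ids 450 down to 1.
TIMELINE = (1..450).map { |i| { "id" => i, "text" => "tweet #{i}" } }.reverse
fake_api = lambda do |count, max_id|
  TIMELINE.select { |tweet| max_id.nil? || tweet["id"] <= max_id }.first(count)
end

p collect_tweets(300, fake_api).size  # => 300
p collect_tweets(1000, fake_api).size # => 450
```

The second call asks for more tweets than exist and still terminates, because the loop breaks on the first empty page.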
@@ -1,3 +1,3 @@
  module Tweetabout
-   VERSION = "0.0.5.0"
+   VERSION = "0.0.6.0"
  end
@@ -0,0 +1,27 @@
+ module TweetAbout
+   class WordsHash < Hash
+     def sanitize
+       junk_words
+     end
+
+     def sort
+       WordsHash[self.sort_by { |key, frequency| frequency }.reverse]
+     end
+
+     private
+
+     def junk_words
+       articles = ["the", "a", "an"]
+       pronouns = ["we", "he", "him", "his", "her", "she", "i", "you", "they", "them", "it", "your",
+                   "our", "us", "my", "this", "their"]
+       other_junk = ["for", "from", "not", "but", "is", "in", "and", "so",
+                     "of", "if", "at", "rt", "", "all", "to", "that",
+                     "are", "can", "by", "on", "as", "or", "as"]
+
+       all_junk = articles << pronouns << other_junk
+       all_junk.flatten!
+
+       self.delete_if { |key, value| all_junk.include?(key) }
+     end
+   end
+ end
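
As a rough illustration of what `sanitize` and `sort` do, here is a stand-alone sketch on a plain Hash. It is not the gem's implementation: `JUNK_WORDS`, `sanitize` and `sort_by_frequency` are hypothetical top-level names, the junk list is only a subset of the one above, and both helpers return new Hashes rather than modifying the receiver the way `delete_if` does.

```ruby
require 'set'

# Illustrative subset of the junk list, kept in a Set for quick lookups.
JUNK_WORDS = Set.new(%w[the a an he she it for from not but of rt]).freeze

# Returns a copy of the counts without junk words or empty keys.
def sanitize(counts)
  counts.reject { |word, _| word.empty? || JUNK_WORDS.include?(word) }
end

# Returns a new Hash ordered from the highest count to the lowest.
def sort_by_frequency(counts)
  counts.sort_by { |_, n| -n }.to_h
end

counts = { "of" => 1, "compassion" => 2, "love" => 4, "humanity" => 5 }
p sort_by_frequency(sanitize(counts))
# => {"humanity"=>5, "love"=>4, "compassion"=>2}
```

Keeping the list in one frozen constant avoids rebuilding and flattening the arrays on every call.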
@@ -0,0 +1,60 @@
+ require 'test/unit'
+ require 'minitest/spec'
+ require 'mocha/setup'
+ require 'tweetabout'
+
+ tw = TweetAbout::TweetWords.new("jack")
+
+ describe "the defaults" do
+   response = tw.sort_words(:tweets => 5)
+
+   it "should be an instance of TweetWords" do
+     assert_equal TweetAbout::TweetWords, tw.class
+   end
+
+   it "should be successful and should be an Array" do
+     assert_equal :ok, response.status
+     assert_equal Array, response.words.class
+   end
+ end
+
+ describe "the {:with_count => true} option" do
+   response = tw.sort_words(:with_count => true, :tweets => 5)
+   words = response.words
+
+   it "should be successful and should be a Hash" do
+     assert_equal :ok, response.status
+     assert_equal TweetAbout::WordsHash, words.class
+   end
+ end
+
+ describe ":with_count option" do
+   it "should handle with count true" do
+     response = tw.sort_words(:with_count => true, :tweets => 5)
+     assert_equal :ok, response.status
+   end
+
+   it "should handle with count false" do
+     response = tw.sort_words(:with_count => false, :tweets => 5)
+     assert_equal :ok, response.status
+   end
+ end
+
+ describe "a number string to tweets should work" do
+   it "should handle a number string for tweets" do
+     response = tw.sort_words(:tweets => "5")
+     assert_equal :ok, response.status
+   end
+ end
+
+ describe "include_rts options should work" do
+   it "should handle include rts true" do
+     response = tw.sort_words(:include_rts => true)
+     assert_equal :ok, response.status
+   end
+
+   it "should handle include rts true" do
+     response = tw.sort_words(:include_rts => false)
+     assert_equal :ok, response.status
+   end
+ end
@@ -0,0 +1,27 @@
+ require 'test/unit'
+ require 'minitest/spec'
+ require 'mocha/setup'
+ require 'tweetabout'
+
+ words_hash = TweetAbout::WordsHash[{"of" => 1, "compassion" => 2, "love" => 4, "humanity" => 5,
+                                     "together" => 4, "free" => 4}]
+
+ describe "#sanitize" do
+   it "should remove a useless word" do
+     words_hash = words_hash.sanitize
+     assert_equal true, !words_hash.has_key?("of")
+   end
+ end
+
+ describe "#sort" do
+   it "should sort the words" do
+     words_hash = words_hash.sort
+     values = words_hash.values
+     trues = []
+     values.each_index do |i|
+       val = values[i] >= values[i.next] if values[i.next]
+       trues << val if val
+     end
+     assert_equal false, trues.include?(false)
+   end
+ end
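
The `#sort` test above only appends truthy comparison results to `trues`, so `trues.include?(false)` can never be true and the assertion would pass even for unsorted values. A stricter check could compare each adjacent pair directly. This is a sketch only, and `descending?` is a hypothetical helper, not something the gem or its tests define.

```ruby
# Hypothetical helper: true when every value is >= the one that follows it.
def descending?(values)
  values.each_cons(2).all? { |current, following| current >= following }
end

p descending?([5, 4, 4, 2]) # => true
p descending?([5, 2, 4])    # => false
p descending?([])           # => true
```

In the test this could be used as `assert descending?(words_hash.values)`.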
@@ -5,10 +5,11 @@ Gem::Specification.new do |gem|
    gem.authors = ["Dylan Jhaveri"]
    gem.email = ["dylanjhaveri@gmail.com"]
    gem.description = %q{Returns a list of frequently tweeted words}
-   gem.summary = %q{Takes a twitter username and outputs the most frequently tweeted words in their last 1,000 tweets. Includes Re-tweets and excludes links.}
+   gem.summary = %q{Takes a twitter username and returns a list of words. Can also return a Hash with the count for each word. Has options to not include retweets and set the number of tweets to process}
    gem.homepage = ""

    gem.add_dependency "httparty"
+   gem.add_development_dependency('mocha', "~> 0.9.9")

    gem.files = `git ls-files`.split($\)
    gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: tweetabout
  version: !ruby/object:Gem::Version
-   version: 0.0.5.0
+   version: 0.0.6.0
  prerelease:
  platform: ruby
  authors:
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2012-08-22 00:00:00.000000000 Z
+ date: 2013-01-11 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: httparty
@@ -27,6 +27,22 @@ dependencies:
      - - ! '>='
        - !ruby/object:Gem::Version
          version: '0'
+ - !ruby/object:Gem::Dependency
+   name: mocha
+   requirement: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: 0.9.9
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: 0.9.9
  description: Returns a list of frequently tweeted words
  email:
  - dylanjhaveri@gmail.com
@@ -40,8 +56,12 @@ files:
  - README.md
  - Rakefile
  - lib/tweetabout.rb
- - lib/tweetabout/sorted_tweets.rb
+ - lib/tweetabout/tweet_words.rb
+ - lib/tweetabout/twit_api.rb
  - lib/tweetabout/version.rb
+ - lib/tweetabout/words_hash.rb
+ - test/test_tweetabout.rb
+ - test/test_wordhash.rb
  - tweetabout.gemspec
  homepage: ''
  licenses: []
@@ -66,6 +86,9 @@ rubyforge_project:
  rubygems_version: 1.8.24
  signing_key:
  specification_version: 3
- summary: Takes a twitter username and outputs the most frequently tweeted words in
-   their last 1,000 tweets. Includes Re-tweets and excludes links.
- test_files: []
+ summary: Takes a twitter username and returns a list of words. Can also return a
+   Hash with the count for each word. Has options to not include retweets and set
+   the number of tweets to process
+ test_files:
+ - test/test_tweetabout.rb
+ - test/test_wordhash.rb
@@ -1,84 +0,0 @@
- module TweetAbout
-
-   class SortedTweets
-     include HTTParty
-
-     def initialize(username)
-       @username = username
-     end
-
-     def sort_tweets
-       get_tweets
-       hash = {}
-
-       @responses.each do |tweet|
-         tweet.split(" ").each do |key|
-           key = key.gsub(/\W/, "").downcase
-           if hash.has_key?(key)
-             hash["#{key}"] += 1
-           else
-             hash.merge!({"#{key}" => 1}) unless bad_key(key)
-           end
-         end
-       end
-       sorted_array = hash.sort_by { |keyword, frequency| frequency }.reverse
-       sorted_array.map { |keyword| keyword[0] }
-     end
-
-     def get_tweets
-       options = { :query => {:screen_name => @username, :include_rts => true, :count => 200} }
-
-       base_url = "http://api.twitter.com/1/statuses/user_timeline.json?"
-
-       @responses = []
-       response1 = HTTParty.get("#{base_url}", options)
-       return if response1.code == 404 #invalid username/bad url
-       return if response1.count == 0 #no tweets!
-       return if response1.code == 400 #api limt exceeded
-       start_at_1 = response1.last["id"]
-       response1.each do |object|
-         @responses << object["text"]
-       end
-
-       response2 = HTTParty.get("#{base_url}&max_id=#{start_at_1-1}", options)
-       return if response2.count == 0
-       return if response2.code == 400
-       start_at_2 = response2.last["id"]
-       response2.each do |object|
-         @responses << object["text"]
-       end
-
-       response3 = HTTParty.get("#{base_url}&max_id=#{start_at_2-1}", options)
-       return if response3.count == 0
-       return if response3.code == 400
-       start_at_3 = response3.last["id"]
-       response3.each do |object|
-         @responses << object["text"]
-       end
-
-       response4 = HTTParty.get("#{base_url}&max_id=#{start_at_3-1}", options)
-       return if response4.count == 0
-       return if response4.code == 400
-       start_at_4 = response4.last["id"]
-       response4.each do |object|
-         @responses << object["text"]
-       end
-
-       response5 = HTTParty.get("#{base_url}&max_id=#{start_at_4-1}", options)
-       return if response5.count == 0
-       return if response5.code == 400
-       start_at_5 = response4.last["id"]
-       response5.each do |object|
-         @responses << object["text"]
-       end
-     end
-
-     def bad_key(key)
-       return true if key.empty?
-       return true if key.start_with?('http')
-     end
-
-     def to_a
-     end
-   end
- end