wuclan 0.2.0 → 0.2.1
- data/README.textile +81 -6
- data/examples/twitter/parse/parse_twitter_requests.rb +24 -18
- data/examples/twitter/parse/parse_twitter_search_requests.rb +12 -16
- data/examples/twitter/parse/parse_twitter_stream_requests.rb +40 -0
- data/examples/twitter/scrape_twitter_search/README.textile +95 -0
- data/examples/twitter/scrape_twitter_search/edamame_global_config-template.yaml +17 -0
- data/examples/twitter/scrape_twitter_search/scrape_twitter_search.rb +1 -1
- data/examples/twitter/scrape_twitter_search/seed.tsv +19 -0
- data/examples/twitter/scrape_twitter_search/twitter_search_daemons.god +38 -18
- data/lib/wuclan/twitter.rb +2 -0
- data/lib/wuclan/twitter/model.rb +1 -0
- data/lib/wuclan/twitter/model/relationship.rb +34 -26
- data/lib/wuclan/twitter/model/tweet.rb +7 -0
- data/lib/wuclan/twitter/model/twitter_user.rb +5 -0
- data/lib/wuclan/twitter/parse.rb +3 -0
- data/lib/wuclan/twitter/parse/twitter_search_parse.rb +1 -0
- data/lib/wuclan/twitter/scrape.rb +2 -0
- data/lib/wuclan/twitter/scrape/old_skool_request_classes.rb +8 -5
- data/lib/wuclan/twitter/scrape/twitter_ff_ids_request.rb +4 -2
- data/lib/wuclan/twitter/scrape/twitter_json_response.rb +39 -6
- data/lib/wuclan/twitter/scrape/twitter_request_stream.rb +1 -0
- data/lib/wuclan/twitter/scrape/twitter_stream_request.rb +44 -0
- data/wuclan.gemspec +10 -2
- metadata +9 -2

data/README.textile CHANGED
@@ -1,9 +1,29 @@
 
-
+Wuclan uses "Wukong":http://mrflip.github.com/wukong (Hadoop massive-data processing made easy) and "Monkeyshines":http://mrflip.github.com/monkeyshines (massive-scale directed scraper) to grok the deep structure of social networks. It is designed to scrape in a way that respectful of the terms and technical limits of each site while being agressive and efficient with your resources. We use it in practice to collect and analyze social graphs as large as 50 million-nodes, 1 billion-edges, 500 GB raw data  -- all of it actual data extracted in compliance with the site's terms of service.
 
-
+Currently wuclan handles:
 
-
+* Twitter -- API
+* Twitter -- Search
+* Twitter -- Hosebird
+* Last.fm
+* Opensocial
+
+<notextile><div class="toggle"></notextile>
+
+h2. Why?
+
+APIs are nice and all, but they prevent any insight into a) global properties, or b) deep structure.  You can't find global word frequency and dispersion, or average clustering coefficient, or calculate pagerank, or determine weighted-shortest-paths connections between two people through an API call.  But with a 10 machine hadoop cluster and a good-sized collection of data, you can (and wuclan has scripts to help answer many of those questions).
+
+Wuclan is strictly meant for such massive-scale investigations. Unless you're planning to do your final analysis on either hadoop or an enterprise-grade database system it's probably not worth the hassle.
+
+<notextile></div><div class="toggle"></notextile>
+
+h2. Wuclan: Scraping
+
+is almost ready for public use. Check back shortly.
+
+h3. lib/wuclan/*/models
 
 Defines the Wukong objects we'll most often use
 
@@ -12,9 +32,7 @@ Defines the Wukong objects we'll most often use
 * TwitterUser
 * TwitterUserProfiles
 
-
-
-h3. lib/wuclan/request
+h3. lib/wuclan/*/request
 
 
 * Request -- the basic request metadata
@@ -25,4 +43,61 @@ h3. lib/wuclan/request
   ensures that the request is left alone while recordizing.
 
 
+<notextile></div><div class="toggle"></notextile>
+
+h2. Wuclan: Analysis
+
+actually most of this still lives in the imw_twitter_friends repo.
+
+<notextile></div><div class="toggle"></notextile>
+
+h2. Install
+
+** "Main Install and Setup Documentation":http://mrflip.github.com/edamame/INSTALL.html **
+
+h3. Get the code
+
+We're still actively developing edamame.  The newest version is available via "Git":http://git-scm.com on "github:":http://github.com/mrflip/edamame
+
+pre. $ git clone git://github.com/mrflip/edamame
+
+A gem is available from "gemcutter:":http://gemcutter.org/gems/edamame
+
+pre. $ sudo gem install edamame --source=http://gemcutter.org
+
+(don't use the gems.github.com version -- it's way out of date.)
+
+You can instead download this project in either "zip":http://github.com/mrflip/edamame/zipball/master or "tar":http://github.com/mrflip/edamame/tarball/master formats.
+
+h3. Get the Dependencies
+
+To finish setting up, see the "detailed setup instructions":http://mrflip.github.com/edamame/INSTALL.html and then read the "usage notes":http://mrflip.github.com/edamame/usage.html
+
+* "beanstalkd 1.3,":http://xph.us/dist/beanstalkd/ "libevent 1.4,":http://monkey.org/~provos/libevent/ and "beanstalk-client":http://github.com/dustin/beanstalk-client
+* "Tokyo Tyrant,":http://tokyocabinet.sourceforge.net/tyrantdoc/ "Tokyo Tyrant Ruby libs,":http://tokyocabinet.sourceforge.net/tyrantrubydoc/ "Tokyo Cabinet,":http://tokyocabinet.sourceforge.net and "Tokyo Cabinet Ruby libs":http://tokyocabinet.sourceforge.net/tyrantdoc/
+* Gems: "wukong":http://mrflip.github.com/wukong and "monkeyshines":http://mrflip.github.com/monkeyshines
+
+See the "Detailed install instructions":http://mrflip.github.com/edamame/INSTALL.html (it also has hints about installing Tokyo*, Beanstalkd and friends.
+
+<notextile></div><div class="toggle"></notextile>
+
 h3. lib/wuclan/
+
+
+---------------------------------------------------------------------------
+
+<notextile><div class="toggle"></notextile>
+
+h2. More info
+
+There are many useful examples in the examples/ directory.
+
+h3. Credits
+
+wuclan was written by "Philip (flip) Kromer":http://mrflip.com (flip@infochimps.org / "@mrflip":http://twitter.com/mrflip) for the "infochimps project":http://infochimps.org
+
+h3. Help!
+
+Send wuclan questions to the "Infinite Monkeywrench mailing list":http://groups.google.com/group/infochimps-code
+
+<notextile></div></notextile>

data/examples/twitter/parse/parse_twitter_requests.rb CHANGED
@@ -5,23 +5,24 @@ require 'wukong'
 require 'monkeyshines'
 
 require 'wuclan/twitter'
-# if you're anyone but original author this next require is useless but harmless.
-require 'wuclan/twitter/scrape/old_skool_request_classes'
 # un-namespace request classes.
 include Wuclan::Twitter::Scrape
 include Wuclan::Twitter::Model
+# if you're anyone but original author this next require is useless but harmless.
+require 'wuclan/twitter/scrape/old_skool_request_classes'
 
 #
+# Incoming objects are Wuclan::Twitter::Scrape requests.
 #
-#
-#
-#
+# Their #parse method disgorges a stream of Wuclan::Twitter::Model objects, as
+# few or as many as found.  For example, a twitter_user_request will assumedly
+# have a twitter_user record if it is healthy, but may not have a tweet (if the
+# user hasn't ever tweeted) and might not have profile or style info (if the
+# user is protected).
 #
 class TwitterRequestParser < Wukong::Streamer::StructStreamer
-
   def process request, *args, &block
     request.parse(*args) do |obj|
-      next if obj.is_a? BadRecord
       yield obj.to_flat(false)
     end
   end
@@ -31,29 +32,34 @@ end
 # We want to record each individual state of the resource, with the last-seen of
 # its timestamps (if there are many). So if we saw
 #
-#     rsrc  id   screen_name   followers_count  friends_count  (...
-#     user  23   skidoo        47               61
-#     user  23   skidoo        48               62
-#     user  23   skidoo        48               62
-#     user  23   skidoo        52               62
-#     user  23   skidoo        52
+#     rsrc  id   screen_name   followers_count  friends_count  (...) scraped_at
+#     user  23   skidoo        47               61                   20090608
+#     user  23   skidoo        48               62                   20090802
+#     user  23   skidoo        48               62                   20090901
+#     user  23   skidoo        52               62                   20090920
+#     user  23   skidoo        52               62                   20090922
+#     user  23   skidoo        52               63                   20090923
+#
+# we would only keep
 #
+#     user  23   skidoo        47               61                   20090608
+#     user  23   skidoo        48               62                   20090802
+#     user  23   skidoo        52               62                   20090920
+#     user  23   skidoo        52               63                   20090922
 #
 class TwitterRequestUniqer < Wukong::Streamer::UniqByLastReducer
   include Wukong::Streamer::StructRecordizer
-
   attr_accessor :uniquer_count
 
   #
-  #
-  #
+  # FIXME -- move this into the models themselves.
   #
   # for immutable objects we can just work off their ID.
   #
   # for mutable objects we want to record each unique state: all the fields
   # apart from the scraped_at timestamp.
   #
-  def get_key obj
+  def get_key obj, *_
     case obj
     when Tweet
       obj.id
@@ -71,7 +77,7 @@ class TwitterRequestUniqer < Wukong::Streamer::UniqByLastReducer
     super *args
   end
 
-  def accumulate obj
+  def accumulate obj, *_
     self.uniquer_count      += 1
     self.final_value = [self.uniquer_count, obj.to_flat].flatten
   end
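
The `TwitterRequestUniqer` comments describe keeping one record per distinct state of a resource, where the state is every field except `scraped_at`. Below is a minimal plain-Ruby sketch of that rule on the skidoo rows from the comment. It is not Wukong's `UniqByLastReducer`; the `Row` struct and the `state_key` / `uniq_by_last` helpers are hypothetical, and the sketch assumes rows arrive in `scraped_at` order.

```ruby
# Hypothetical sketch of "one record per distinct state, last-seen wins".
# Not Wukong's implementation; Row, state_key and uniq_by_last are made up.
Row = Struct.new(:rsrc, :id, :screen_name, :followers_count, :friends_count, :scraped_at)

# A resource's state is every field apart from the scraped_at timestamp.
def state_key(row)
  row.to_a[0...-1]
end

# Assumes rows arrive in scraped_at order, so a later row with the same state
# overwrites the earlier one. A Ruby Hash keeps each key at its first-insertion
# position, so the output stays ordered by when each state first appeared.
def uniq_by_last(rows)
  last_seen = {}
  rows.each { |row| last_seen[state_key(row)] = row }
  last_seen.values
end

rows = [
  Row.new('user', 23, 'skidoo', 47, 61, '20090608'),
  Row.new('user', 23, 'skidoo', 48, 62, '20090802'),
  Row.new('user', 23, 'skidoo', 48, 62, '20090901'),
  Row.new('user', 23, 'skidoo', 52, 62, '20090920'),
  Row.new('user', 23, 'skidoo', 52, 62, '20090922'),
  Row.new('user', 23, 'skidoo', 52, 63, '20090923'),
]

uniq_by_last(rows).each { |row| puts row.to_a.join("\t") }
```

Following the comment's "last-seen" wording, this keeps 20090901 for the 48/62 state, whereas the comment's example list shows that state's earliest timestamp, 20090802. A streaming reducer over Hadoop-sorted input would only need to remember the current key; the Hash here holds every state in memory, which is fine for a toy input.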
         
     | 
| 
         @@ -1,28 +1,24 @@ 
     | 
|
| 
       1 
1 
     | 
    
         
             
            #!/usr/bin/env ruby
         
     | 
| 
       2 
     | 
    
         
            -
            #$: << ENV['WUKONG_PATH']
         
     | 
| 
       3 
2 
     | 
    
         
             
            require 'rubygems'
         
     | 
| 
       4 
3 
     | 
    
         
             
            require 'wukong'
         
     | 
| 
       5 
     | 
    
         
            -
            require ' 
     | 
| 
       6 
     | 
    
         
            -
             
     | 
| 
       7 
     | 
    
         
            -
            require 'wuclan/twitter'
         
     | 
| 
       8 
     | 
    
         
            -
            require 'wuclan/twitter/scrape/twitter_search_request'
         
     | 
| 
       9 
     | 
    
         
            -
            require 'wuclan/twitter/parse/twitter_search_parse'
         
     | 
| 
      
 4 
     | 
    
         
            +
            require 'wuclan/twitter';
         
     | 
| 
      
 5 
     | 
    
         
            +
            require 'wuclan/twitter/parse';
         
     | 
| 
       10 
6 
     | 
    
         
             
            include Wuclan::Twitter::Scrape
         
     | 
| 
       11 
7 
     | 
    
         | 
| 
       12 
     | 
    
         
            -
            #
         
     | 
| 
       13 
     | 
    
         
            -
            #
         
     | 
| 
       14 
     | 
    
         
            -
            # Instantiate each incoming request.
         
     | 
| 
       15 
     | 
    
         
            -
            # Stream out the contained classes it generates.
         
     | 
| 
       16 
     | 
    
         
            -
            #
         
     | 
| 
       17 
     | 
    
         
            -
            #
         
     | 
| 
       18 
8 
     | 
    
         
             
            class TwitterRequestParser < Wukong::Streamer::StructStreamer
         
     | 
| 
      
 9 
     | 
    
         
            +
              #
         
     | 
| 
      
 10 
     | 
    
         
            +
              # Object: parse thyself.
         
     | 
| 
      
 11 
     | 
    
         
            +
              #
         
     | 
| 
       19 
12 
     | 
    
         
             
              def process request, *args, &block
         
     | 
| 
       20 
13 
     | 
    
         
             
                request.parse(*args) do |obj|
         
     | 
| 
       21 
     | 
    
         
            -
                  next if obj.is_a? 
     | 
| 
       22 
     | 
    
         
            -
                  yield obj 
     | 
| 
      
 14 
     | 
    
         
            +
                  next if obj.blank? || obj.is_a?(BadRecord)
         
     | 
| 
      
 15 
     | 
    
         
            +
                  yield obj
         
     | 
| 
       23 
16 
     | 
    
         
             
                end
         
     | 
| 
       24 
17 
     | 
    
         
             
              end
         
     | 
| 
       25 
18 
     | 
    
         
             
            end
         
     | 
| 
       26 
19 
     | 
    
         | 
| 
       27 
     | 
    
         
            -
            #  
     | 
| 
       28 
     | 
    
         
            -
            Wukong::Script.new( 
     | 
| 
      
 20 
     | 
    
         
            +
            # Go, script, go!
         
     | 
| 
      
 21 
     | 
    
         
            +
            Wukong::Script.new(
         
     | 
| 
      
 22 
     | 
    
         
            +
              TwitterRequestParser,
         
     | 
| 
      
 23 
     | 
    
         
            +
              nil
         
     | 
| 
      
 24 
     | 
    
         
            +
              ).run
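
The revised `TwitterRequestParser#process` relies on each request parsing itself ("Object: parse thyself") and yielding zero or more model objects, while the streamer drops blank objects and `BadRecord`s. A minimal stand-alone sketch of that control flow follows. `FakeRequest`, the `Tweet`/`BadRecord` structs and `process_request` are hypothetical stand-ins for the wuclan classes. Because `blank?` is not core Ruby (it normally comes from a core extension such as ActiveSupport's), the sketch defines its own `blank_record?` check.

```ruby
# Hypothetical stand-ins; the real classes live in wuclan / monkeyshines.
BadRecord = Struct.new(:reason)
Tweet     = Struct.new(:id, :text)

class FakeRequest
  def initialize(objects)
    @objects = objects
  end

  # "Object: parse thyself": yield as many model objects as were found.
  def parse(*_args)
    @objects.each { |obj| yield obj }
  end
end

# Local stand-in for an ActiveSupport-style blank?: nil, or an empty container/string.
def blank_record?(obj)
  obj.nil? || (obj.respond_to?(:empty?) && obj.empty?)
end

# Same shape as the new TwitterRequestParser#process: skip blank objects and
# BadRecords, and pass everything else on to the caller's block.
def process_request(request, *args)
  request.parse(*args) do |obj|
    next if blank_record?(obj) || obj.is_a?(BadRecord)
    yield obj
  end
end

request = FakeRequest.new([Tweet.new(1, 'hi'), nil, '', BadRecord.new('unparseable'), Tweet.new(2, 'yo')])
process_request(request) { |obj| puts obj.to_a.join("\t") }
```

Putting the filter in the streamer rather than in each request class keeps `parse` free to emit whatever it finds, including error markers, while the output stream only ever carries usable records.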
         
     | 
| 
         @@ -0,0 +1,40 @@ 
     | 
|
| 
      
 1 
     | 
    
         
            +
            #!/usr/bin/env ruby
         
     | 
| 
      
 2 
     | 
    
         
            +
            require 'rubygems'
         
     | 
| 
      
 3 
     | 
    
         
            +
            require 'wukong'
         
     | 
| 
      
 4 
     | 
    
         
            +
            require 'monkeyshines';
         
     | 
| 
      
 5 
     | 
    
         
            +
            require 'wuclan/twitter';
         
     | 
| 
      
 6 
     | 
    
         
            +
            require 'wuclan/twitter/scrape/twitter_search_request';
         
     | 
| 
      
 7 
     | 
    
         
            +
            require 'wuclan/twitter/parse/twitter_search_parse';
         
     | 
| 
      
 8 
     | 
    
         
            +
            include Wuclan::Twitter::Scrape
         
     | 
| 
      
 9 
     | 
    
         
            +
             
     | 
| 
      
 10 
     | 
    
         
            +
             
     | 
| 
      
 11 
     | 
    
         
            +
             
     | 
| 
      
 12 
     | 
    
         
            +
            #
         
     | 
| 
      
 13 
     | 
    
         
            +
            # Twitter stream requests
         
     | 
| 
      
 14 
     | 
    
         
            +
            #
         
     | 
| 
      
 15 
     | 
    
         
            +
            #   http://apiwiki.twitter.com/Streaming-API-Documentation
         
     | 
| 
      
 16 
     | 
    
         
            +
            #
         
     | 
| 
      
 17 
     | 
    
         
            +
            # Fills a file with JSON status records, one line per status.
         
     | 
| 
      
 18 
     | 
    
         
            +
            #
         
     | 
| 
      
 19 
     | 
    
         
            +
            #   {"text":"Hey #bigdata #hadoop geeks: who's missing? @mrflip/bigdata / http://bit.ly/datatweeps","favorited":false,"geo":null,"in_reply_to_screen_name":null,"source":"web","created_at":"Thu Oct 29 09:29:32 +0000 2009","user":{"verified":false,"notifications":null,"profile_text_color":"000000","time_zone":"Central Time (US & Canada)","following":null,"profile_link_color":"0000ff","profile_image_url":"http://a3.twimg.com/profile_images/377919497/FlipCircle-2009-900-trans_normal.png","profile_background_image_url":"http://a3.twimg.com/profile_background_images/2348065/2005Mar-AustinTypeTour-075_-_Rappers_Delight_Raindrop.jpg","description":"Increasing access to free open data, building tools to Organize, Explore and Comprehend massive data sources - http://infochimps.org","location":"iPhone: 30.316122,-97.733817","profile_sidebar_fill_color":"ffffff","screen_name":"mrflip","profile_background_tile":false,"profile_sidebar_border_color":"f0edd8","statuses_count":1307,"followers_count":678,"protected":false,"url":"http://infochimps.org","created_at":"Mon Mar 19 21:08:24 +0000 2007","friends_count":514,"name":"Philip Flip Kromer","geo_enabled":false,"profile_background_color":"BCC0C8","id":1554031,"utc_offset":-21600,"favourites_count":61},"id":5254924802,"in_reply_to_user_id":null,"in_reply_to_status_id":null,"truncated":false}
         
     | 
| 
      
 20 
     | 
    
         
            +
            #
         
     | 
| 
      
 21 
     | 
    
         
            +
            # Try it with
         
     | 
| 
      
 22 
     | 
    
         
            +
            #   twuserpass='name:pass'
         
     | 
| 
      
 23 
     | 
    
         
            +
            #   curl -s -u $twpass http://stream.twitter.com/1/statuses/sample.json > /tmp/sample.json
         
     | 
| 
      
 24 
     | 
    
         
            +
            #   cat /tmp/sample.json | parse_twitter_stream_requests.rb --map
         
     | 
| 
      
 25 
     | 
    
         
            +
            #
         
     | 
| 
      
 26 
     | 
    
         
            +
            class TwitterRequestParser < Wukong::Streamer::RecordStreamer
         
     | 
| 
      
 27 
     | 
    
         
            +
              def recordize *args
         
     | 
| 
      
 28 
     | 
    
         
            +
                foo = args.first
         
     | 
| 
      
 29 
     | 
    
         
            +
                [ TwitterStreamRequest.new(super(*args).first) ]
         
     | 
| 
      
 30 
     | 
    
         
            +
              end
         
     | 
| 
      
 31 
     | 
    
         
            +
              def process request, *args, &block
         
     | 
| 
      
 32 
     | 
    
         
            +
                request.parse(*args) do |obj|
         
     | 
| 
      
 33 
     | 
    
         
            +
                  next if obj.is_a? BadRecord
         
     | 
| 
      
 34 
     | 
    
         
            +
                  yield obj.to_flat(false) # if obj.is_a?(DeleteTweet)
         
     | 
| 
      
 35 
     | 
    
         
            +
                end
         
     | 
| 
      
 36 
     | 
    
         
            +
              end
         
     | 
| 
      
 37 
     | 
    
         
            +
            end
         
     | 
| 
      
 38 
     | 
    
         
            +
             
     | 
| 
      
 39 
     | 
    
         
            +
            # This makes the script go.
         
     | 
| 
      
 40 
     | 
    
         
            +
            Wukong::Script.new(TwitterRequestParser, nil).run
         
     | 
| 
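In `--map` mode the script wraps each line of the sample stream in a `TwitterStreamRequest`, calls its `parse`, and drops anything that comes back as a `BadRecord`. The sketch below gives a rough, self-contained picture of that sorting step in plain Ruby. It is not wuclan's implementation. `classify_stream_line` is a hypothetical name, and the sketch assumes the stream mixes full status objects with deletion notices shaped like `{"delete":{"status":{"id":5,"user_id":7}}}`.

```ruby
require 'json'

# Hypothetical helper (not part of wuclan): sort one line of a
# statuses/sample dump into a tweet, a deletion notice, or a bad record.
def classify_stream_line(line)
  record = JSON.parse(line)
  return [:bad_record] unless record.is_a?(Hash)

  if record.key?('delete')
    # Deletion notices carry only the status id and its author's id.
    status = record['delete']['status']
    [:delete_tweet, status['id'], status['user_id']]
  elsif record.key?('text')
    [:tweet, record['id'], record.dig('user', 'id'), record['text']]
  else
    # Anything else (e.g. rate-limit notices) is reported by its top-level keys.
    [:unknown, record.keys.sort]
  end
rescue JSON::ParserError
  # Truncated or garbled lines are flagged so the caller can skip them,
  # much as the streamer above skips BadRecords.
  [:bad_record]
end
```

To try it on the dump from the `curl` command, map it over `File.foreach('/tmp/sample.json')`.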
@@ -0,0 +1,95 @@
+This is actually less rickety than it seems, but you'll have to hand-edit a few paths and config files. Feel free to suggest a more polite organization of it all.
+
+h2. Initial setup
+
+* Install prerequisites using rubygems:
+
+<pre>
+  sudo gem install htmlentities extlib god
+</pre>
+
+* Check out each of the "monkeyshines":http://github.com/mrflip/monkeyshines, "wukong":http://github.com/mrflip/wukong, "wuclan":http://github.com/mrflip/wuclan, and "edamame":http://github.com/mrflip/edamame repos using git, preferably as neighbors in the same directory.
+
+* Follow the instructions at http://mrflip.github.com/edamame/INSTALL.html to install beanstalkd and Tokyo Tyrant.
+
+h2. Find the scraper
+
+Although you can install wuclan as a gem, I recommend installing it from the git source:
+
+<pre>
+  git clone git://github.com/mrflip/wuclan.git
+</pre>
+
+You'll run the scraper from
+
+<pre>
+  wuclan/examples/twitter/scrape_twitter_search
+</pre>
+
+h2. Make the scrape destination
+
+You will need to set up a landing place for the files, probably by editing the work/ symlink (sorry, this is kludgy and should be fixed).
+
+The naming scheme I use is good for running scrapers against a lot of targets. From the wuclan/examples/twitter/scrape_twitter_search directory:
+
+<pre>
+  mkdir -p ../../../../data/ripd/com.tw/com.twitter.search
+</pre>
+
+(This creates a data/ tree that is a sibling of the wuclan directory.)
+
+Wherever you put the scrape destination,
+* DO NOT add it to your code's git repo
+* exclude it from Spotlight indexing and so forth
+
+h2. Add your search terms to seed.tsv
+
+To add a search job, edit seed.tsv: add each search phrase and its priority, separated by a tab (lower priority == more important). **Don't** URL-encode your query terms: spaces will be replaced by plus signs (+) and other non-alphanumerics will be URL-encoded.
+
+h2. Start the queue daemons
+
+Copy @edamame_global_config-template.yaml@ to your scrape destination and name it @edamame_global_config.yaml@. Also, edit the @./twitter_search_daemons.god@ file to indicate the scrape destination.
+
+Use god to start the daemons: @sudo god -c ./twitter_search_daemons.god@ (add the -D flag to debug).
+
+h2. Load the search terms
+
+Load this data with
+
+<pre>
+  ./load_twitter_search_jobs.rb --handle=com.twitter.search --source-filename=./seed.tsv
+</pre>
+
+You can check that it was loaded with
+
+<pre>
+  /path/to/edamame/bin/edamame-sync  --handle=com.twitter.search --store=:11241 --queue=:11240
+</pre>
+
+(This unloads all jobs from the transient queue and stuffs them back in from the database.)
+
+Empty all search queues with
+
+<pre>
+  /path/to/edamame/bin/edamame-nuke  --handle=com.twitter.search --store=:11241 --queue=:11240
+</pre>
+
+h2. Run the scraper
+
+<pre>
+  nohup ./scrape_twitter_search.rb --handle=com.twitter.search >> work/log/twitter_search-console.log 2>&1 &
+</pre>
+
+This will run forever. Check its progress with
+
+<pre>
+  tail -f work/log/twitter_search-console.log
+</pre>
+
+If you want to watch the output files,
+
+<pre>
+  datename=`date "+%Y%m%d"` ; tail -f work/$datename/* | cut -c 1-2000
+</pre>
+
+Be careful dumping the output files to screen: each line can be tens of thousands of characters long and will lock your terminal right up.
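The seed-file rules in this README (plain, un-encoded phrases; a lower priority number means more important) can be shown with a short Ruby sketch. The helper names are hypothetical, and this is not the loader's actual code. The sketch uses `URI.encode_www_form_component` from the standard library. That method turns spaces into `+` and %-escapes everything except letters, digits, `*`, `-`, `.` and `_`. This is close to the rule described above, but it may not match the loader exactly.

```ruby
require 'uri'

# Illustrative only: encode a raw search phrase roughly the way the README
# describes (spaces -> '+', other punctuation -> %XX).
def encode_query(phrase)
  URI.encode_www_form_component(phrase.strip)
end

# Lower priority number == more important, so the most important jobs sort first.
def dispatch_order(jobs)
  jobs.sort_by { |_phrase, priority| priority }.map(&:first)
end

p encode_query('red sox')                              # => "red+sox"
p encode_query('#bigdata')                             # => "%23bigdata"
p dispatch_order([['yankees', 1000], ['hadoop', 50]])  # => ["hadoop", "yankees"]
```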
@@ -0,0 +1,17 @@
+--- # -*- YAML -*-
+#
+# Save this file in your god dir, *then* change the settings below.
+# Make sure your version control system is set to ignore the file.
+#
+
+:email:
+  :domain:              your.domain.com
+  :username:            robot@your.domain.com
+  :password:            YOURPASSWORD
+  :to:                  people_who_can_fix_errors@your.domain.com
+  :to_name:             People who can fix the scraper
+
+# these apply to all processes
+:god_process:
+  :flapping_notify:     default
+
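Before starting god, you may want to check that your copy of this config no longer holds the template's placeholder values. The sketch below does that. It assumes Ruby's bundled Psych 3.1 or later, which provides the `permitted_classes:` keyword, and it relies on Psych loading keys written as `:email:` as symbols. `unedited_settings` is a hypothetical helper, not part of edamame.

```ruby
require 'yaml'

# Values copied from edamame_global_config-template.yaml.
PLACEHOLDERS = %w[your.domain.com YOURPASSWORD].freeze

# Hypothetical check: return the :email settings that still contain a placeholder.
def unedited_settings(yaml_text)
  # Keys like ":email:" load as Ruby symbols, so Symbol must be permitted.
  config = YAML.safe_load(yaml_text, permitted_classes: [Symbol]) || {}
  email  = config.fetch(:email, {})
  email.select { |_key, value| PLACEHOLDERS.any? { |ph| value.to_s.include?(ph) } }.keys
end
```

For example, call `unedited_settings(File.read('/data/ripd/com.tw/com.twitter.search/edamame_global_config.yaml'))` and expect an empty array once every placeholder has been replaced.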
@@ -34,7 +34,7 @@ loop do
         :dest    => { :type  => :chunked_flat_file_store, :rootdir => WORK_DIR, :filemode => 'a' },
         # :dest    => { :type  => :flat_file_store, :filename => WORK_DIR+"/test_output.tsv" },
         # :fetcher => { :type => TwitterSearchFakeFetcher },
-        :sleep_time  => 1 ,
+        :sleep_time  => 1.25 ,
       })
     Log.info "Starting a run!"
     scraper.run
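The only change in this hunk raises `:sleep_time` from 1 to 1.25 seconds. The sketch below works out what that does to one scraper process's request rate. It assumes the scraper issues one request per sleep interval and ignores how long each fetch takes, so the figures are upper bounds. `max_requests_per_hour` is only an illustrative helper.

```ruby
# Upper bound on requests per hour for one scraper process, assuming one
# request per sleep interval and zero fetch latency.
def max_requests_per_hour(sleep_time)
  (3600 / sleep_time.to_f).floor
end

puts max_requests_per_hour(1)     # 3600
puts max_requests_per_hour(1.25)  # 2880
```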
@@ -0,0 +1,19 @@
+# See the readme for instructions.
+
+# To add a search job, put the search phrase and its priority, separated by a
+# tab (Lower priority == more important).
+
+red sox            	  1000
+yankees            	  1000
+
+# You can recycle the output of dump_twitter_search_jobs
+hadoop             	    50	 0.103063874053513	4985477488	    3852	9700.94447529118952	20091019021751
+infochimp          	    50	 0.106963481416675	4827288395	      62	9347.39116701332932	20091019022759
+infochimps         	    50	 0.102891460905350	4922555326	     575	9717.31832415175086	20091019025808
+semantic           	    50	 0.103387063739869	4985327841	    4400	9670.39361100528913	20091019021435
+semanticweb        	    50	 0.102956390169747	4986072386	    3156	9711.01359700656940	20091019025943
+
+# These will quickly generate a buttload of data
+# RT               	110000	 6.977299880525690	4986408447	 9628514	 115.87012880821899	20091019032753
+# http             	110000	28.312757201646100	4986411833	70327665	   6.90163844186046	20091019032825
+# twitter          	110000	 1.319672131147540	4986376554	 7432567	 733.71915915527904	20091019032511
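Below is a minimal sketch of reading this file. It assumes the layout shown here: `#` comments and blank lines are skipped, and only the first two tab-separated columns (phrase, priority) matter. Rows recycled from `dump_twitter_search_jobs` with extra columns therefore still parse. `read_seed_jobs` is a hypothetical name. The real loader is `load_twitter_search_jobs.rb`, and this sketch does not reproduce its internals.

```ruby
# Hypothetical reader for seed.tsv-style text: returns [phrase, priority] pairs.
def read_seed_jobs(text)
  text.each_line.map do |line|
    # Skip blank lines and comments (including commented-out job rows).
    next if line.strip.empty? || line.lstrip.start_with?('#')

    # Only the phrase and priority columns are used; extra columns are ignored.
    phrase, priority = line.chomp.split("\t").map(&:strip)
    next if phrase.nil? || priority.nil?

    [phrase, Integer(priority)]
  end.compact
end
```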
@@ -1,25 +1,45 @@
-
-require ' 
-
+require 'yaml'
+require 'extlib'
+require 'wukong/extensions/hash'
+require "edamame/monitoring"
 
 #
-#
+# You can load this file with
+#   sudo god -c ./twitter_search_daemons.god
+# To debug, run
+#   sudo god -c ./twitter_search_daemons.god -D
 #
-
+
+#
+# Change this to point to your scrape destination.
+#
+WORK_DIR = '/data/ripd/com.tw/com.twitter.search'
+
+#
+# Also, make a copy of edamame_global_config-template.yaml in that directory,
+# but rename it edamame_global_config.yaml and edit it to suit.
 #
-
+GodProcess::GLOBAL_SITE_OPTIONS_FILES << WORK_DIR+'/edamame_global_config.yaml'
+
+# Files will be timestamped by when god is started.
+DATESTAMP = Time.now.utc.strftime("%Y%m%d")
+
+# Uncomment for a bunch of diagnostics:
+# p GodProcess.global_site_options,
+#   TyrantGod.site_options, TyrantGod.default_options.deep_merge(TyrantGod.site_options),
+#   GodProcess.site_options
+
 #
-#
+# Define email notifiers and attach one by default
 #
-
-
-  [BeanstalkdGod, { :port => 11240, :max_mem_usage => 100.megabytes,  }],
-  [TyrantGod,     { :port => 11241, :db_dirname => WORK_DIR, :db_name => "twitter_search-queue.tct" }],
-  #
-  # [TyrantGod,     { :port => 11249, :db_dirname => WORK_DIR, :db_name => "twitter_search-flat.tct" }],
-]
+God.setup_email GodProcess.global_site_options[:email]
+GodProcess::DEFAULT_OPTIONS[:flapping_notify] = 'default'
 
-
-
-
-
+#
+# Twitter Search
+#
+handle     = 'comtwittersearch'
+base_port  = 11220
+db_dirname = WORK_DIR+'/distdb/'+DATESTAMP
+BeanstalkdGod.create :port => base_port + 0, :max_mem_usage => 100.megabytes
+TyrantGod.create     :port => base_port + 1, :db_name => handle+'-queue.tct', :db_dirname => db_dirname
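The new god file derives both daemons' ports and the Tyrant database path from `WORK_DIR`, `handle` and `base_port`. The sketch below recomputes that layout using only the values visible in this hunk; `daemon_layout` is a hypothetical helper. This is useful when you point `edamame-sync` or `edamame-nuke` at the right `--queue` and `--store` ports. With `base_port = 11220`, beanstalkd gets 11220 and Tokyo Tyrant gets 11221, whereas the README's example commands use 11240 and 11241.

```ruby
# Hypothetical helper mirroring the arithmetic in twitter_search_daemons.god:
# beanstalkd on base_port, Tokyo Tyrant on base_port + 1, and the Tyrant
# database under WORK_DIR/distdb/<date god was started, in UTC>.
def daemon_layout(work_dir, handle, base_port, started_at = Time.now.utc)
  datestamp  = started_at.strftime('%Y%m%d')
  db_dirname = work_dir + '/distdb/' + datestamp
  {
    queue_port: base_port + 0,
    store_port: base_port + 1,
    db_path:    File.join(db_dirname, handle + '-queue.tct')
  }
end

p daemon_layout('/data/ripd/com.tw/com.twitter.search', 'comtwittersearch', 11220)
```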
         
data/lib/wuclan/twitter.rb
CHANGED

data/lib/wuclan/twitter/model.rb
CHANGED
@@ -9,6 +9,7 @@ module Wuclan
       autoload :TwitterUserSearchId, 'wuclan/twitter/model/twitter_user'
       autoload :TwitterUserId,       'wuclan/twitter/model/twitter_user'
       autoload :Tweet,               'wuclan/twitter/model/tweet'
+      autoload :DeleteTweet,         'wuclan/twitter/model/tweet'
       autoload :SearchTweet,         'wuclan/twitter/model/tweet'
       autoload :AFollowsB,           'wuclan/twitter/model/relationship'
       autoload :AFavoritesB,         'wuclan/twitter/model/relationship'
@@ -7,6 +7,14 @@ module Wuclan::Twitter::Model
       end
     end
 
+    def status_id
+      tweet_id
+    end
+
+    def in_reply_to_status_id
+      in_reply_to_status_id
+    end
+
     def self.included base
       base.class_eval{ extend ClassMethods }
     end
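This hunk adds `status_id`-style readers to a shared relationship module, so relationship records also answer to Twitter's own field names. As shipped, `in_reply_to_status_id` calls itself. Because the module's method is found before any struct accessor of that name, calling it on an including record would recurse until Ruby raises `SystemStackError`. The sketch below is a stripped-down version of the same delegation pattern. It assumes the intent was to return the `in_reply_to_tweet_id` field that the reply structs now carry, and it uses a plain `Struct` in place of wuclan's `TypedStruct` to stay self-contained. `ReplyEdge` and `StatusIdAliases` are hypothetical names.

```ruby
# Stand-in for a TypedStruct-based reply relationship (assumption: a plain Struct).
ReplyEdge = Struct.new(:user_a_id, :user_b_id, :tweet_id, :in_reply_to_tweet_id)

# Delegating readers in the spirit of the status_id methods added above.
module StatusIdAliases
  def status_id
    tweet_id
  end

  # Delegates to the stored field rather than to itself.
  def in_reply_to_status_id
    in_reply_to_tweet_id
  end
end

class ReplyEdge
  include StatusIdAliases
end

edge = ReplyEdge.new(1, 2, 30, 20)
puts edge.status_id              # 30
puts edge.in_reply_to_status_id  # 20
```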
@@ -28,43 +36,43 @@ module Wuclan::Twitter::Model
   class AFavoritesB        < TypedStruct.new(
       [:user_a_id,              Integer],
       [:user_b_id,              Integer],
-      [: 
+      [:tweet_id,              Integer]
       )
     include ModelCommon
     include RelationshipBase
-    # Key on user_a-user_b- 
+    # Key on user_a-user_b-tweet_id (really just user_a-tweet_id is enough)
     def num_key_fields()  3 end
-    def numeric_id_fields()     [:user_a_id, :user_b_id, : 
+    def numeric_id_fields()     [:user_a_id, :user_b_id, :tweet_id] ; end
   end
 
   # Direct (threaded) replies: occur at the start of a tweet.
   class ARepliesB           < TypedStruct.new(
       [:user_a_id,              Integer],
       [:user_b_id,              Integer],
-      [: 
-      [: 
+      [:tweet_id,              Integer],
+      [:in_reply_to_tweet_id,  Integer]
       )
     include ModelCommon
     include RelationshipBase
-    # Key on user_a-user_b- 
+    # Key on user_a-user_b-tweet_id
     def num_key_fields()  3  end
-    def numeric_id_fields()     [:user_a_id, :user_b_id, : 
+    def numeric_id_fields()     [:user_a_id, :user_b_id, :tweet_id, :in_reply_to_tweet_id] ; end
   end
 
   # Direct (threaded) replies: occur at the start of a tweet.
   class ARepliesBName       < TypedStruct.new(
-      [:user_a_name, 
-      [:user_b_name, 
-      [: 
-      [: 
-      [:user_a_sid, 
-      [:user_b_sid, 
+      [:user_a_name,           String],
+      [:user_b_name,           String],
+      [:tweet_id,              Integer],
+      [:in_reply_to_tweet_id,  Integer],
+      [:user_a_sid,            Integer],
+      [:user_b_sid,            Integer]
       )
     include ModelCommon
     include RelationshipBase
-    # Key on user_a-user_b- 
+    # Key on user_a-user_b-tweet_id
     def num_key_fields()  3  end
-    def numeric_id_fields()     [:user_a_id, :user_b_id, : 
+    def numeric_id_fields()     [:user_a_id, :user_b_id, :tweet_id, :in_reply_to_tweet_id] ; end
   end
 
   # Atsign mentions anywhere in the tweet
@@ -72,13 +80,13 @@ module Wuclan::Twitter::Model
   class AAtsignsB           < TypedStruct.new(
       [:user_a_id,              Integer],
       [:user_b_name,            String],
-      [:
+      [:tweet_id,              Integer]
       )
     include ModelCommon
     include RelationshipBase
-    # Key on user_a-user_b-
+    # Key on user_a-user_b-tweet_id
     def num_key_fields()  3 end
-    def numeric_id_fields()     [:user_a_id, :
+    def numeric_id_fields()     [:user_a_id, :tweet_id] ; end
   end
 
   # Atsign mentions anywhere in the tweet
@@ -86,13 +94,13 @@ module Wuclan::Twitter::Model
   class AAtsignsBId         < TypedStruct.new(
       [:user_a_id,              Integer],
       [:user_b_id,              Integer],
-      [:
+      [:tweet_id,              Integer]
       )
     include ModelCommon
     include RelationshipBase
-    # Key on user_a-user_b-
+    # Key on user_a-user_b-tweet_id
     def num_key_fields()  3 end
-    def numeric_id_fields()     [:user_a_id, :user_b_id, :
+    def numeric_id_fields()     [:user_a_id, :user_b_id, :tweet_id] ; end
   end
 
 
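The relationship hunks above add `tweet_id` to the reply and mention models and make it the third key field (`num_key_fields()  3`, commented "Key on user_a-user_b-tweet_id"), so two mentions of the same user in different tweets get distinct keys. The sketch below illustrates that with a plain `Struct` instead of wuclan's `TypedStruct`. The `padded` and `key` helpers, the `-` separator (read off the comment) and the 10-digit padding (borrowed from the `"%010d"` used by the ids parser in this diff) are assumptions for illustration, not wuclan's API.

```ruby
# Plain-Struct stand-in for the AAtsignsBId model after this change
# (user_a mentions user_b in tweet tweet_id). Not wuclan's TypedStruct.
AAtsignsBIdSketch = Struct.new(:user_a_id, :user_b_id, :tweet_id) do
  def num_key_fields()    3 end
  def numeric_id_fields() [:user_a_id, :user_b_id, :tweet_id] end

  # Hypothetical helper: zero-pad numeric ids to 10 digits so that
  # lexical order matches numeric order for non-negative ids.
  def padded
    members.map do |field|
      value = self[field]
      numeric_id_fields.include?(field) ? format("%010d", value.to_i) : value.to_s
    end
  end

  # Hypothetical helper: the first num_key_fields padded values, joined by '-'.
  def key
    padded.first(num_key_fields).join('-')
  end
end

first_mention  = AAtsignsBIdSketch.new(12, 345, 6789)
second_mention = AAtsignsBIdSketch.new(12, 345, 6790)
puts first_mention.key   # → 0000000012-0000000345-0000006789
puts second_mention.key  # → 0000000012-0000000345-0000006790
```

Because the tweet id is part of the key, records for the same pair of users are still adjacent when sorted, but each mention stays a separate record.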
@@ -112,7 +120,7 @@ module Wuclan::Twitter::Model
   # non-retweet-whore-requests have user_b_name set and unset respectively.)
   #
   # +user_a_id:+   the user who sent the re-tweet
-  # +
+  # +tweet_id:+   the id of the tweet *containing* the re-tweet (for the ID of the original tweet you're on your own.)
   # +user_b_name:+ the user citied as originating: RT @user_b_name
   # +please_flag:+ a 1 if the text contains 'please' or 'plz' as a stand-alone word
   # +text:+        the *full* text of the tweet
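The doc comment above defines the `ARetweetsB` fields: the retweeting user, the user cited as `RT @user_b_name`, the id of the tweet containing the retweet, and a `please_flag` for a stand-alone "please" or "plz". This diff does not show how those fields are pulled out of the tweet text, so what follows is only a hypothetical sketch; `RETWEET_RE`, `PLEASE_RE` and `retweet_fields` are illustrative names, not wuclan's.

```ruby
# Hypothetical patterns; wuclan's own extraction regexes are not part of this diff.
RETWEET_RE = /\bRT\s*@(\w+)/i           # "RT @user_b_name"
PLEASE_RE  = /\b(?:please|plz)\b/i      # 'please' or 'plz' as a stand-alone word

# Returns the retweet-related fields for +text+, or nil if it cites no one.
def retweet_fields(text)
  match = RETWEET_RE.match(text) or return nil
  {
    user_b_name: match[1],
    please_flag: (text =~ PLEASE_RE) ? 1 : 0,
    text:        text                   # the *full* text, as the doc comment asks
  }
end

p retweet_fields("RT @bob: plz read this")
p retweet_fields("RT @bob I am pleased")
```

The word boundaries are what make "pleased" leave the flag at 0 while "plz" sets it to 1.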
@@ -120,7 +128,7 @@ module Wuclan::Twitter::Model
   class ARetweetsB <  TypedStruct.new(
       [:user_a_id,              Integer],
       [:user_b_name,            String],
-      [:
+      [:tweet_id,              Integer],
       [:please_flag,            Integer],
       [:text,                   String]
       )
@@ -133,7 +141,7 @@ module Wuclan::Twitter::Model
     end
     # Key on retweeting_user-user-tweet_id
     def num_key_fields()  3  end
-    def numeric_id_fields()     [:user_a_id, :
+    def numeric_id_fields()     [:user_a_id, :tweet_id] ; end
     #
     # If there's no user we'll assume this
     # is a retweet and not an rtwhore.
@@ -146,7 +154,7 @@ module Wuclan::Twitter::Model
   class ARetweetsBId <  TypedStruct.new(
       [:user_a_id,              Integer],
       [:user_b_id,              Integer],
-      [:
+      [:tweet_id,              Integer],
       [:please_flag,            Integer],
       [:text,                   String]
       )
@@ -160,7 +168,7 @@ module Wuclan::Twitter::Model
 
     # Key on retweeting_user-user-tweet_id
     def num_key_fields()  3  end
-    def numeric_id_fields()     [:user_a_id, :user_b_id, :
+    def numeric_id_fields()     [:user_a_id, :user_b_id, :tweet_id] ; end
 
     #
     # If there's no user we'll assume this
@@ -31,6 +31,13 @@ module Wuclan::Twitter::Model
     def numeric_id_fields()     [:id, :twitter_user_id, :in_reply_to_status_id, :in_reply_to_user_id] ; end
   end
 
+  class DeleteTweet < TypedStruct.new(
+      [:id,                      Integer     ],
+      [:created_at,              Bignum      ],
+      [:twitter_user_id,         Integer     ]
+      )
+    include ModelCommon
+  end
 
   #
   # SearchTweet
@@ -30,6 +30,8 @@ module Wuclan::Twitter::Model
 
   end
 
+
+
   #
   # Fundamental information on a user.
   #
@@ -57,6 +59,9 @@ module Wuclan::Twitter::Model
     def tweets_per_day()       tweets_count.to_i    / days_since_created  end
   end
 
+
+
+
   #
   # Outside of a users/show page, when a user is mentioned
   # only this subset of fields appear.
@@ -14,8 +14,10 @@ module Wuclan
       autoload :TwitterFriendsIdsRequest,     'wuclan/twitter/scrape/twitter_ff_ids_request'
       autoload :TwitterUserTimelineRequest,   'wuclan/twitter/scrape/twitter_timeline_request'
       autoload :TwitterPublicTimelineRequest, 'wuclan/twitter/scrape/twitter_timeline_request'
+      autoload :TwitterStreamRequest,         'wuclan/twitter/scrape/twitter_stream_request'
       autoload :JsonUserWithTweet,            'wuclan/twitter/scrape/twitter_json_response'
       autoload :JsonTweetWithUser,            'wuclan/twitter/scrape/twitter_json_response'
+      autoload :JsonDeleteTweet,              'wuclan/twitter/scrape/twitter_json_response'
 
     end
   end
@@ -13,8 +13,7 @@ module Wuclan::Twitter::Scrape
 
     def parse *args, &block
       handle_special_cases!(*args, &block) or return
-      
-      yield self
+      super *args
     end
 
     def handle_special_cases! *args, &block
@@ -26,10 +25,14 @@ module Wuclan::Twitter::Scrape
     end
   end
 
-  class
-  class
-  class
+  class User         < TwitterUserRequest         ; include OldSkoolRequest ; end
+  class Followers    < TwitterFollowersRequest    ; include OldSkoolRequest ; end
+  class Friends      < TwitterFriendsRequest      ; include OldSkoolRequest ; end
+  class FollowersIds < TwitterFollowersIdsRequest ; include OldSkoolRequest ; end
+  class FriendsIds   < TwitterFriendsIdsRequest   ; include OldSkoolRequest ; end
+  class Favorites    < TwitterFavoritesRequest    ; include OldSkoolRequest ; end
   class UserTimeline < TwitterUserTimelineRequest ; include OldSkoolRequest ; end
+
   class Bogus < BadRecord ;
     def parse suffix=nil, *args
       errors = suffix.split('-')
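`OldSkoolRequest#parse` now delegates with `super *args` instead of yielding `self`, and classes such as `User`, `Followers` and `FriendsIds` mix it into a request-class subclass. Because an included module sits between a class and its superclass in Ruby's method lookup, `super` inside the mixin reaches whatever `parse` the parent request class provides. A minimal, self-contained sketch of that pattern follows; every class and module name here is a hypothetical stand-in, and the special-case rule is invented for the demo.

```ruby
# Hypothetical stand-in for OldSkoolRequest: screen special cases, then
# delegate to the next #parse in the ancestor chain.
module OldSkoolSketch
  def parse(*args, &block)
    handle_special_cases!(*args, &block) or return
    super(*args)   # resolves to UserRequestSketch#parse; the block is passed along implicitly
  end

  # Invented rule for the demo: a nil first argument is a special case.
  def handle_special_cases!(*args)
    !args.first.nil?
  end
end

# Hypothetical stand-in for a request class such as TwitterUserRequest.
class UserRequestSketch
  def parse(*args)
    [:parsed, *args]
  end
end

class UserSketch < UserRequestSketch ; include OldSkoolSketch ; end

p UserSketch.ancestors.take(3)   # UserSketch, OldSkoolSketch, UserRequestSketch
p UserSketch.new.parse(42)
p UserSketch.new.parse(nil)
```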
@@ -27,10 +27,11 @@ module Wuclan
         # unpacks the raw API response, yielding all the relationships.
         #
         def parse *args, &block
+          return unless healthy?
           parsed_contents.each do |user_b_id|
             user_b_id = "%010d"%user_b_id.to_i
             # B is a follower: B follows user.
-            yield AFollowsB.new(user_b_id,
+            yield AFollowsB.new(user_b_id, twitter_user_id)
           end
         end
       end
@@ -62,10 +63,11 @@ module Wuclan
         # unpacks the raw API response, yielding all the relationships.
         #
         def parse *args, &block
+          return unless healthy?
           parsed_contents.each do |user_b_id|
             user_b_id = "%010d"%user_b_id.to_i
             # B is a friend: user follows B
-            yield AFollowsB.new(
+            yield AFollowsB.new(twitter_user_id, user_b_id)
           end
         end
       end
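Both ids parsers now return early unless the response is healthy, and the completed `AFollowsB.new` calls fix the edge direction: for a followers list each returned id B follows the scraped user, while for a friends list the scraped user follows B. The sketch below restates that logic with hypothetical names (`AFollowsBSketch`, `follow_edges`). Treating "healthy" as "the decoded body is an Array" is an assumption, since `healthy?` for these requests is not shown in this diff.

```ruby
# Plain-Struct stand-in for wuclan's AFollowsB relationship: user_a follows user_b.
AFollowsBSketch = Struct.new(:user_a_id, :user_b_id)

# Hypothetical helper mirroring the two ids parsers. +kind+ is :followers or
# :friends; +ids+ is the decoded JSON body listing numeric ids for +twitter_user_id+.
def follow_edges(kind, twitter_user_id, ids)
  return [] unless ids.is_a?(Array)          # stand-in for `return unless healthy?`
  ids.map do |id|
    user_b_id = "%010d" % id.to_i            # same zero-padding as the diff
    case kind
    when :followers then AFollowsBSketch.new(user_b_id, twitter_user_id)  # B follows user
    when :friends   then AFollowsBSketch.new(twitter_user_id, user_b_id)  # user follows B
    else raise ArgumentError, "unknown kind: #{kind.inspect}"
    end
  end
end

p follow_edges(:followers, "0000000001", [42])
p follow_edges(:friends,   "0000000001", [42])
```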
@@ -20,6 +20,7 @@ module Wuclan::Twitter::Scrape
     # generate all the contained TwitterXXX objects
     #
     def each
+      return unless healthy?
       if is_partial?
         yield user
       else
@@ -38,10 +39,10 @@ module Wuclan::Twitter::Scrape
     # This method tries to guess, based on the fields in the raw_user, which it has.
     #
     def is_partial?
+      p(raw) if !raw_user
       not raw_user.include?('friends_count')
     end
 
-
     def tweet
       Tweet.from_hash raw_tweet if raw_tweet
     end
@@ -66,7 +67,7 @@ module Wuclan::Twitter::Scrape
     #
     def fix_raw_user!
       return unless raw_user
-      raw_user['scraped_at'] = self.moreinfo['scraped_at']
+      raw_user['scraped_at'] = ModelCommon.flatten_date(self.moreinfo['scraped_at'])
       raw_user['created_at'] = ModelCommon.flatten_date(raw_user['created_at'])
       raw_user['id']         = ModelCommon.zeropad_id(  raw_user['id'])
       raw_user['protected']  = ModelCommon.unbooleanize(raw_user['protected'])
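`fix_raw_user!` now passes `scraped_at` through `ModelCommon.flatten_date`, the same normalization already applied to `created_at`, so both timestamps end up in one format. The implementation of `flatten_date` is not part of this diff. The stand-in below assumes it renders a parseable timestamp as a UTC `YYYYmmddHHMMSS` string, which is a guess made only for illustration; the sample timestamps are likewise illustrative.

```ruby
require 'time'

# Hypothetical stand-in for ModelCommon.flatten_date (assumed output: UTC YYYYmmddHHMMSS).
def flatten_date_sketch(value)
  return nil if value.nil? || value.to_s.strip.empty?
  Time.parse(value.to_s).utc.strftime("%Y%m%d%H%M%S")
end

raw_user = { 'created_at' => 'Wed Aug 27 13:08:45 +0000 2008' }   # Twitter-style timestamp
moreinfo = { 'scraped_at' => '2009-08-01T12:34:56Z' }              # illustrative scrape time

# As of this release, both fields go through the same normalization.
raw_user['scraped_at'] = flatten_date_sketch(moreinfo['scraped_at'])
raw_user['created_at'] = flatten_date_sketch(raw_user['created_at'])
p raw_user
```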
@@ -88,7 +89,7 @@ module Wuclan::Twitter::Scrape
       raw_tweet['created_at']             = ModelCommon.flatten_date(raw_tweet['created_at'])
       raw_tweet['favorited']              = ModelCommon.unbooleanize(raw_tweet['favorited'])
       raw_tweet['truncated']              = ModelCommon.unbooleanize(raw_tweet['truncated'])
-      raw_tweet['twitter_user_id']        = ModelCommon.zeropad_id(
+      raw_tweet['twitter_user_id']        = ModelCommon.zeropad_id(   raw_user['id'] )
       raw_tweet['in_reply_to_user_id']    = ModelCommon.zeropad_id(  raw_tweet['in_reply_to_user_id'])   unless raw_tweet['in_reply_to_user_id'].blank?   || (raw_tweet['in_reply_to_user_id'].to_i   == 0)
       raw_tweet['in_reply_to_status_id']  = ModelCommon.zeropad_id(  raw_tweet['in_reply_to_status_id']) unless raw_tweet['in_reply_to_status_id'].blank? || (raw_tweet['in_reply_to_status_id'].to_i == 0)
       Wukong.encode_components raw_tweet, 'text', 'in_reply_to_screen_name'
@@ -96,9 +97,7 @@ module Wuclan::Twitter::Scrape
   end
 end
 
-
 class JsonUserWithTweet < JsonUserTweetPair
-
   def raw_tweet
     return @raw_tweet if @raw_tweet
     @raw_tweet = raw['status']
@@ -112,7 +111,6 @@ end
 
 
 class JsonTweetWithUser < JsonUserTweetPair
-
   def raw_tweet
     @raw_tweet ||= raw
   end
@@ -122,3 +120,38 @@ class JsonTweetWithUser < JsonUserTweetPair
     @raw_user
   end
 end
+
+
+
+class JsonDeleteTweet
+  attr_accessor :raw, :moreinfo, :scraped_at
+  def initialize raw, moreinfo={}
+    self.raw        = raw
+    self.moreinfo   = moreinfo
+    self.scraped_at = nil # TODO -- extract this from neighbors
+  end
+
+  # Extracted JSON should be an array
+  def healthy?()
+    raw && raw.is_a?(Hash)
+  end
+
+  def delete_tweet
+    Wuclan::Twitter::Model::DeleteTweet.new(
+      raw['delete']['status']['id'],
+      self.scraped_at,
+      raw['delete']['status']['user_id']
+      ) rescue nil
+  end
+
+  def each *args, &block
+    return unless healthy?
+    yield delete_tweet
+  end
+
+  # true if this model looks like it will parse the given JSON
+  def self.parses? hsh
+    # KLUDGE
+    hsh =~ /"delete":\{/
+  end
+end
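The new `JsonDeleteTweet` handles deletion notices from the streaming API, which have the shape `{"delete":{"status":{"id":...,"user_id":...}}}` that its `raw['delete']['status']` lookups expect. Below is a self-contained sketch of the same flow with hypothetical names: a cheap string test in the spirit of the `parses?` kludge, then a parse that yields a `DeleteTweet`-like record or `nil`, mirroring the `rescue nil`. `created_at` stays `nil` because `scraped_at` is still a TODO in the diff. Note that although the comment above `healthy?` says the JSON "should be an array", the check actually requires a Hash, and the sketch follows the code.

```ruby
require 'json'

# Plain-Struct stand-in for Wuclan::Twitter::Model::DeleteTweet.
DeleteTweetSketch = Struct.new(:id, :created_at, :twitter_user_id)

# Cheap pre-check on the raw line, in the spirit of the parses? kludge.
def delete_notice?(line)
  !!(line =~ /"delete":\{/)
end

# Builds a DeleteTweetSketch from one stream line, or returns nil when the
# line is not JSON or lacks delete.status (mirroring the `rescue nil`).
def delete_tweet_from(line)
  raw = JSON.parse(line)
  return nil unless raw.is_a?(Hash)          # same test as healthy?
  status = raw.dig('delete', 'status')
  return nil unless status.is_a?(Hash)
  DeleteTweetSketch.new(status['id'], nil, status['user_id'])  # created_at unknown, as in the diff
rescue JSON::ParserError, TypeError
  nil
end

line = '{"delete":{"status":{"id":1234,"user_id":3}}}'
p delete_tweet_from(line) if delete_notice?(line)
```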
data/lib/wuclan/twitter/scrape/twitter_request_stream.rb
CHANGED
@@ -21,6 +21,7 @@ class TwitterRequestStream < Monkeyshines::RequestStream::SimpleRequestStream
   # can be a screen_name, but we need the numeric ID for followers_request's, etc.
   def each_request twitter_user_id, *args
     user_req = TwitterUserRequest.new(twitter_user_id)
+    # this performs the request in-place: req holds the fulfilled response
     yield(user_req)
     return unless user_req.healthy?
     twitter_user_id = user_req.parsed_contents['id'].to_i if (user_req.parsed_contents['id'].to_i > 0)
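The comment added in this hunk spells out a contract:

- `each_request` yields the `TwitterUserRequest` to its caller.
- The caller performs the fetch inside the block.
- Only after the block returns does `each_request` read `parsed_contents` to replace a screen name with the numeric ID.

Below is a minimal sketch of that yield-then-inspect pattern. A hypothetical `StubUserRequest` struct and a canned response stand in for the real Monkeyshines request and HTTP fetch:

```ruby
# Hypothetical stand-in for TwitterUserRequest: the caller fills in
# parsed_contents while the request is yielded.
StubUserRequest = Struct.new(:twitter_user_id, :parsed_contents) do
  # Healthy once a JSON object has been stored as the response.
  def healthy?
    parsed_contents.is_a?(Hash)
  end
end

# Same shape as each_request: yield the request, let the caller fulfill it
# in place, then resolve the numeric ID from the stored response.
def resolve_user_id(twitter_user_id)
  user_req = StubUserRequest.new(twitter_user_id, nil)
  yield(user_req)                       # the caller performs the request here
  return unless user_req.healthy?       # nil if nothing usable came back
  numeric_id = user_req.parsed_contents['id'].to_i
  numeric_id > 0 ? numeric_id : twitter_user_id
end

resolved = resolve_user_id('some_screen_name') do |req|
  # Canned response; a real scraper would fetch and parse the API reply here.
  req.parsed_contents = { 'id' => 12345, 'screen_name' => req.twitter_user_id }
end
puts resolved  # prints 12345
```

The sketch just returns the resolved ID. In `each_request`, the context comment says that numeric ID is needed for the follower requests that come after.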
data/lib/wuclan/twitter/scrape/twitter_stream_request.rb
@@ -0,0 +1,44 @@
+module Wuclan
+  module Twitter
+    module Scrape
+
+      class TwitterStreamRequest < Struct.new(:contents)
+        # Contents are JSON
+        include Monkeyshines::RawJsonContents
+
+        # self.hard_request_limit = 1
+        # def make_url() "http://stream.twitter.com/1/statuses/sample.json"  end
+
+        # Extracted JSON should be an array
+        def healthy?()
+          parsed_contents && parsed_contents.is_a?(Hash)
+        end
+
+        def parsed_as_delete_tweet *args, &block
+          p parsed_contents
+          json_obj = JsonDeleteTweet.new(parsed_contents)
+          json_obj.each(&block)
+        end
+
+        # Extract user and tweet
+        def parsed_as_tweet *args, &block
+          json_obj = JsonTweetWithUser.new(
+            parsed_contents, 'scraped_at' => parsed_contents['created_at'])
+          json_obj.each(&block)
+        end
+
+        #
+        # unpacks the raw API response, yielding all the interesting objects
+        # and relationships within.
+        #
+        def parse *args, &block
+          return unless healthy?
+          return parsed_as_delete_tweet(*args, &block) if JsonDeleteTweet.parses?(contents)
+          # else
+          parsed_as_tweet(*args, &block)
+        end
+      end
+
+    end
+  end
+end
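`TwitterStreamRequest#parse` routes each stream record one of two ways:

- Records that `JsonDeleteTweet.parses?` recognizes go to `parsed_as_delete_tweet`.
- Everything else is unpacked as a tweet with an embedded user, with `scraped_at` taken from the tweet's `created_at`.

Note that `healthy?` requires a Hash, i.e. one JSON object per record, although the comment above it says the JSON should be an array.

The sketch below reproduces only the dispatch, using plain `JSON.parse`. The `StreamRecord` struct and the `[:delete, ...]`, `[:tweet, ...]` and `[:user, ...]` pairs it yields are hypothetical stand-ins for `Monkeyshines::RawJsonContents` and wuclan's model classes. The `p parsed_contents` debug call is left out.

```ruby
require 'json'

# Hypothetical stand-in for TwitterStreamRequest: holds one raw JSON record
# from the stream and dispatches on its shape.
StreamRecord = Struct.new(:contents) do
  def parsed_contents
    @parsed_contents ||= begin
      JSON.parse(contents)
    rescue JSON::ParserError
      nil
    end
  end

  # A record is usable only if it decodes to a JSON object (a Hash).
  def healthy?
    parsed_contents.is_a?(Hash)
  end

  # Yields [:delete, status_hash] for delete notices; otherwise yields
  # [:tweet, tweet_hash], then [:user, user_hash] when a user is embedded.
  def parse
    return unless healthy?
    if parsed_contents['delete'].is_a?(Hash)
      yield [:delete, parsed_contents['delete']['status']]
    else
      tweet = parsed_contents.reject { |key, _| key == 'user' }
      yield [:tweet, tweet]
      yield [:user, parsed_contents['user']] if parsed_contents['user']
    end
  end
end

records = [
  '{"delete":{"status":{"id":1234,"user_id":3}}}',
  '{"id":5678,"text":"hello","user":{"id":3,"screen_name":"someone"}}',
  'garbage'
]

records.each do |raw|
  StreamRecord.new(raw).parse { |kind, obj| puts "#{kind}: #{obj.inspect}" }
end
```

The malformed third record yields nothing, mirroring the `return unless healthy?` guard in the original.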
    
data/wuclan.gemspec
CHANGED
@@ -5,11 +5,11 @@
 
 Gem::Specification.new do |s|
   s.name = %q{wuclan}
-  s.version = "0.2.
+  s.version = "0.2.1"
 
   s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
   s.authors = ["Philip (flip) Kromer"]
-  s.date = %q{2009-
+  s.date = %q{2009-11-02}
   s.description = %q{Massive-scale social network analysis. Nothing to f with.}
   s.email = %q{flip@infochimps.org}
   s.extra_rdoc_files = [
@@ -35,6 +35,7 @@ Gem::Specification.new do |s|
      "examples/twitter/old/scrape_twitter_trending.rb",
      "examples/twitter/parse/parse_twitter_requests.rb",
      "examples/twitter/parse/parse_twitter_search_requests.rb",
+     "examples/twitter/parse/parse_twitter_stream_requests.rb",
      "examples/twitter/scrape_twitter_api/scrape_twitter_api.rb",
      "examples/twitter/scrape_twitter_api/seed.tsv",
      "examples/twitter/scrape_twitter_api/start_cache_twitter.sh",
@@ -49,9 +50,13 @@ Gem::Specification.new do |s|
      "examples/twitter/scrape_twitter_hosebird/scrape_twitter_hosebird.rb",
      "examples/twitter/scrape_twitter_hosebird/test_spewer.rb",
      "examples/twitter/scrape_twitter_hosebird/twitter_hosebird_god.yaml",
+     "examples/twitter/scrape_twitter_search/README.textile",
+     "examples/twitter/scrape_twitter_search/README.textile",
      "examples/twitter/scrape_twitter_search/dump_twitter_search_jobs.rb",
+     "examples/twitter/scrape_twitter_search/edamame_global_config-template.yaml",
      "examples/twitter/scrape_twitter_search/load_twitter_search_jobs.rb",
      "examples/twitter/scrape_twitter_search/scrape_twitter_search.rb",
+     "examples/twitter/scrape_twitter_search/seed.tsv",
      "examples/twitter/scrape_twitter_search/twitter_search_daemons.god",
      "lib/old/twitter_api.rb",
      "lib/wuclan.rb",
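This hunk adds `examples/twitter/scrape_twitter_search/README.textile` to `s.files` twice. The corresponding `metadata` hunk further down lists it only once. Below is a minimal sketch of a pre-build check that reports repeated entries. `duplicate_entries` is a hypothetical helper, and the sample array is copied from the lines this hunk adds:

```ruby
# Hypothetical helper: return every entry that appears more than once,
# in order of first appearance.
def duplicate_entries(files)
  counts = Hash.new(0)
  files.each { |path| counts[path] += 1 }
  counts.select { |_, n| n > 1 }.keys
end

added_files = [
  "examples/twitter/scrape_twitter_search/README.textile",
  "examples/twitter/scrape_twitter_search/README.textile",
  "examples/twitter/scrape_twitter_search/edamame_global_config-template.yaml",
  "examples/twitter/scrape_twitter_search/seed.tsv"
]

p duplicate_entries(added_files)
# prints ["examples/twitter/scrape_twitter_search/README.textile"]
```

Calling `.uniq` on the array assigned to `s.files` would drop the repeat.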
@@ -102,6 +107,7 @@ Gem::Specification.new do |s|
      "lib/wuclan/twitter/model/tweet/tweet_token.rb",
      "lib/wuclan/twitter/model/twitter_user.rb",
      "lib/wuclan/twitter/model/twitter_user/style/color_to_hsv.rb",
+     "lib/wuclan/twitter/parse.rb",
      "lib/wuclan/twitter/parse/ff_ids_parser.rb",
      "lib/wuclan/twitter/parse/friends_followers_parser.rb",
      "lib/wuclan/twitter/parse/generic_json_parser.rb",
@@ -123,6 +129,7 @@ Gem::Specification.new do |s|
      "lib/wuclan/twitter/scrape/twitter_search_job.rb",
      "lib/wuclan/twitter/scrape/twitter_search_request.rb",
      "lib/wuclan/twitter/scrape/twitter_search_request_stream.rb",
+     "lib/wuclan/twitter/scrape/twitter_stream_request.rb",
      "lib/wuclan/twitter/scrape/twitter_timeline_request.rb",
      "lib/wuclan/twitter/scrape/twitter_user_request.rb",
      "spec/spec_helper.rb",
@@ -151,6 +158,7 @@ Gem::Specification.new do |s|
      "examples/twitter/old/scrape_twitter_trending.rb",
      "examples/twitter/parse/parse_twitter_requests.rb",
      "examples/twitter/parse/parse_twitter_search_requests.rb",
+     "examples/twitter/parse/parse_twitter_stream_requests.rb",
      "examples/twitter/scrape_twitter_api/scrape_twitter_api.rb",
      "examples/twitter/scrape_twitter_api/support/make_request_stats.rb",
      "examples/twitter/scrape_twitter_api/support/make_requests_by_id_and_date_1.rb",
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification 
 name: wuclan
 version: !ruby/object:Gem::Version 
-  version: 0.2.
+  version: 0.2.1
 platform: ruby
 authors: 
 - Philip (flip) Kromer
@@ -9,7 +9,7 @@ autorequire: 
 bindir: bin
 cert_chain: []
 
-date: 2009-
+date: 2009-11-02 00:00:00 -06:00
 default_executable: 
 dependencies: 
 - !ruby/object:Gem::Dependency 
@@ -70,6 +70,7 @@ files: 
 - examples/twitter/old/scrape_twitter_trending.rb
 - examples/twitter/parse/parse_twitter_requests.rb
 - examples/twitter/parse/parse_twitter_search_requests.rb
+- examples/twitter/parse/parse_twitter_stream_requests.rb
 - examples/twitter/scrape_twitter_api/scrape_twitter_api.rb
 - examples/twitter/scrape_twitter_api/seed.tsv
 - examples/twitter/scrape_twitter_api/start_cache_twitter.sh
@@ -84,9 +85,12 @@ files: 
 - examples/twitter/scrape_twitter_hosebird/scrape_twitter_hosebird.rb
 - examples/twitter/scrape_twitter_hosebird/test_spewer.rb
 - examples/twitter/scrape_twitter_hosebird/twitter_hosebird_god.yaml
+- examples/twitter/scrape_twitter_search/README.textile
 - examples/twitter/scrape_twitter_search/dump_twitter_search_jobs.rb
+- examples/twitter/scrape_twitter_search/edamame_global_config-template.yaml
 - examples/twitter/scrape_twitter_search/load_twitter_search_jobs.rb
 - examples/twitter/scrape_twitter_search/scrape_twitter_search.rb
+- examples/twitter/scrape_twitter_search/seed.tsv
 - examples/twitter/scrape_twitter_search/twitter_search_daemons.god
 - lib/old/twitter_api.rb
 - lib/wuclan.rb
@@ -136,6 +140,7 @@ files: 
 - lib/wuclan/twitter/model/tweet/tweet_token.rb
 - lib/wuclan/twitter/model/twitter_user.rb
 - lib/wuclan/twitter/model/twitter_user/style/color_to_hsv.rb
+- lib/wuclan/twitter/parse.rb
 - lib/wuclan/twitter/parse/ff_ids_parser.rb
 - lib/wuclan/twitter/parse/friends_followers_parser.rb
 - lib/wuclan/twitter/parse/generic_json_parser.rb
@@ -157,6 +162,7 @@ files: 
 - lib/wuclan/twitter/scrape/twitter_search_job.rb
 - lib/wuclan/twitter/scrape/twitter_search_request.rb
 - lib/wuclan/twitter/scrape/twitter_search_request_stream.rb
+- lib/wuclan/twitter/scrape/twitter_stream_request.rb
 - lib/wuclan/twitter/scrape/twitter_timeline_request.rb
 - lib/wuclan/twitter/scrape/twitter_user_request.rb
 - spec/spec_helper.rb
@@ -207,6 +213,7 @@ test_files: 
 - examples/twitter/old/scrape_twitter_trending.rb
 - examples/twitter/parse/parse_twitter_requests.rb
 - examples/twitter/parse/parse_twitter_search_requests.rb
+- examples/twitter/parse/parse_twitter_stream_requests.rb
 - examples/twitter/scrape_twitter_api/scrape_twitter_api.rb
 - examples/twitter/scrape_twitter_api/support/make_request_stats.rb
 - examples/twitter/scrape_twitter_api/support/make_requests_by_id_and_date_1.rb