wuclan 0.2.0 → 0.2.1

@@ -1,9 +1,29 @@
1
1
 
2
- h2. Help!
2
+ Wuclan uses "Wukong":http://mrflip.github.com/wukong (Hadoop massive-data processing made easy) and "Monkeyshines":http://mrflip.github.com/monkeyshines (massive-scale directed scraper) to grok the deep structure of social networks. It is designed to scrape in a way that is respectful of the terms and technical limits of each site while being aggressive and efficient with your resources. We use it in practice to collect and analyze social graphs as large as 50 million nodes, 1 billion edges and 500 GB of raw data -- all of it actual data extracted in compliance with the site's terms of service.
3
3
 
4
- Send Wuclan questions to the "Infinite Monkeywrench mailing list":http://groups.google.com/group/infochimps-code
4
+ Currently wuclan handles:
5
5
 
6
- h3. lib/wuclan/models
6
+ * Twitter -- API
7
+ * Twitter -- Search
8
+ * Twitter -- Hosebird
9
+ * Last.fm
10
+ * Opensocial
11
+
12
+ <notextile><div class="toggle"></notextile>
13
+
14
+ h2. Why?
15
+
16
+ APIs are nice and all, but they prevent any insight into a) global properties, or b) deep structure. You can't compute global word frequency and dispersion, average clustering coefficient, PageRank, or weighted shortest-path connections between two people through an API call. But with a 10-machine Hadoop cluster and a good-sized collection of data, you can (and wuclan has scripts to help answer many of those questions).
17
+
18
+ Wuclan is strictly meant for such massive-scale investigations. Unless you're planning to do your final analysis on either Hadoop or an enterprise-grade database system, it's probably not worth the hassle.
19
+
20
+ <notextile></div><div class="toggle"></notextile>
21
+
22
+ h2. Wuclan: Scraping
23
+
24
+ The scraping toolkit is almost ready for public use. Check back shortly.
25
+
26
+ h3. lib/wuclan/*/models
7
27
 
8
28
  Defines the Wukong objects we'll most often use
9
29
 
@@ -12,9 +32,7 @@ Defines the Wukong objects we'll most often use
12
32
  * TwitterUser
13
33
  * TwitterUserProfiles
14
34
 
15
-
16
-
17
- h3. lib/wuclan/request
35
+ h3. lib/wuclan/*/request
18
36
 
19
37
 
20
38
  * Request -- the basic request metadata
@@ -25,4 +43,61 @@ h3. lib/wuclan/request
25
43
  ensures that the request is left alone while recordizing.
26
44
 
27
45
 
46
+ <notextile></div><div class="toggle"></notextile>
47
+
48
+ h2. Wuclan: Analysis
49
+
50
+ Actually, most of this still lives in the imw_twitter_friends repo.
51
+
52
+ <notextile></div><div class="toggle"></notextile>
53
+
54
+ h2. Install
55
+
56
+ ** "Main Install and Setup Documentation":http://mrflip.github.com/edamame/INSTALL.html **
57
+
58
+ h3. Get the code
59
+
60
+ We're still actively developing wuclan. The newest version is available via "Git":http://git-scm.com on "github:":http://github.com/mrflip/wuclan
61
+
62
+ pre. $ git clone git://github.com/mrflip/wuclan
63
+
64
+ A gem is available from "gemcutter:":http://gemcutter.org/gems/wuclan
65
+
66
+ pre. $ sudo gem install wuclan --source=http://gemcutter.org
67
+
68
+ (don't use the gems.github.com version -- it's way out of date.)
69
+
70
+ You can instead download this project in either "zip":http://github.com/mrflip/wuclan/zipball/master or "tar":http://github.com/mrflip/wuclan/tarball/master formats.
71
+
72
+ h3. Get the Dependencies
73
+
74
+ To finish setting up, see the "detailed setup instructions":http://mrflip.github.com/edamame/INSTALL.html and then read the "usage notes":http://mrflip.github.com/edamame/usage.html
75
+
76
+ * "beanstalkd 1.3,":http://xph.us/dist/beanstalkd/ "libevent 1.4,":http://monkey.org/~provos/libevent/ and "beanstalk-client":http://github.com/dustin/beanstalk-client
77
+ * "Tokyo Tyrant,":http://tokyocabinet.sourceforge.net/tyrantdoc/ "Tokyo Tyrant Ruby libs,":http://tokyocabinet.sourceforge.net/tyrantrubydoc/ "Tokyo Cabinet,":http://tokyocabinet.sourceforge.net and "Tokyo Cabinet Ruby libs":http://tokyocabinet.sourceforge.net/tyrantdoc/
78
+ * Gems: "wukong":http://mrflip.github.com/wukong and "monkeyshines":http://mrflip.github.com/monkeyshines
79
+
80
+ See the "Detailed install instructions":http://mrflip.github.com/edamame/INSTALL.html (it also has hints about installing Tokyo*, Beanstalkd and friends).
81
+
82
+ <notextile></div><div class="toggle"></notextile>
83
+
28
84
  h3. lib/wuclan/
85
+
86
+
87
+ ---------------------------------------------------------------------------
88
+
89
+ <notextile><div class="toggle"></notextile>
90
+
91
+ h2. More info
92
+
93
+ There are many useful examples in the examples/ directory.
94
+
95
+ h3. Credits
96
+
97
+ wuclan was written by "Philip (flip) Kromer":http://mrflip.com (flip@infochimps.org / "@mrflip":http://twitter.com/mrflip) for the "infochimps project":http://infochimps.org
98
+
99
+ h3. Help!
100
+
101
+ Send wuclan questions to the "Infinite Monkeywrench mailing list":http://groups.google.com/group/infochimps-code
102
+
103
+ <notextile></div></notextile>
@@ -5,23 +5,24 @@ require 'wukong'
5
5
  require 'monkeyshines'
6
6
 
7
7
  require 'wuclan/twitter'
8
- # if you're anyone but original author this next require is useless but harmless.
9
- require 'wuclan/twitter/scrape/old_skool_request_classes'
10
8
  # un-namespace request classes.
11
9
  include Wuclan::Twitter::Scrape
12
10
  include Wuclan::Twitter::Model
11
+ # if you're anyone but original author this next require is useless but harmless.
12
+ require 'wuclan/twitter/scrape/old_skool_request_classes'
13
13
 
14
14
  #
15
+ # Incoming objects are Wuclan::Twitter::Scrape requests.
15
16
  #
16
- # Instantiate each incoming request.
17
- # Stream out the contained classes it generates.
18
- #
17
+ # Their #parse method disgorges a stream of Wuclan::Twitter::Model objects, as
18
+ # few or as many as found. For example, a twitter_user_request will typically
19
+ # have a twitter_user record if it is healthy, but may not have a tweet (if the
20
+ # user hasn't ever tweeted) and might not have profile or style info (if the
21
+ # user is protected).
19
22
  #
20
23
  class TwitterRequestParser < Wukong::Streamer::StructStreamer
21
-
22
24
  def process request, *args, &block
23
25
  request.parse(*args) do |obj|
24
- next if obj.is_a? BadRecord
25
26
  yield obj.to_flat(false)
26
27
  end
27
28
  end
@@ -31,29 +32,34 @@ end
31
32
  # We want to record each individual state of the resource, with the last-seen of
32
33
  # its timestamps (if there are many). So if we saw
33
34
  #
34
- # rsrc id screen_name followers_count friends_count (... more)
35
- # user 23 skidoo 47 61
36
- # user 23 skidoo 48 62
37
- # user 23 skidoo 48 62
38
- # user 23 skidoo 52 62
39
- # user 23 skidoo 52 63
35
+ # rsrc id screen_name followers_count friends_count (...) scraped_at
36
+ # user 23 skidoo 47 61 20090608
37
+ # user 23 skidoo 48 62 20090802
38
+ # user 23 skidoo 48 62 20090901
39
+ # user 23 skidoo 52 62 20090920
40
+ # user 23 skidoo 52 62 20090922
41
+ # user 23 skidoo 52 63 20090923
42
+ #
43
+ # we would only keep
40
44
  #
45
+ # user 23 skidoo 47 61 20090608
46
+ # user 23 skidoo 48 62 20090802
47
+ # user 23 skidoo 52 62 20090920
48
+ # user 23 skidoo 52 63 20090923
41
49
  #
42
50
  class TwitterRequestUniqer < Wukong::Streamer::UniqByLastReducer
43
51
  include Wukong::Streamer::StructRecordizer
44
-
45
52
  attr_accessor :uniquer_count
46
53
 
47
54
  #
48
- #
49
- #
55
+ # FIXME -- move this into the models themselves.
50
56
  #
51
57
  # for immutable objects we can just work off their ID.
52
58
  #
53
59
  # for mutable objects we want to record each unique state: all the fields
54
60
  # apart from the scraped_at timestamp.
55
61
  #
56
- def get_key obj
62
+ def get_key obj, *_
57
63
  case obj
58
64
  when Tweet
59
65
  obj.id
@@ -71,7 +77,7 @@ class TwitterRequestUniqer < Wukong::Streamer::UniqByLastReducer
71
77
  super *args
72
78
  end
73
79
 
74
- def accumulate obj
80
+ def accumulate obj, *_
75
81
  self.uniquer_count += 1
76
82
  self.final_value = [self.uniquer_count, obj.to_flat].flatten
77
83
  end
@@ -1,28 +1,24 @@
1
1
  #!/usr/bin/env ruby
2
- #$: << ENV['WUKONG_PATH']
3
2
  require 'rubygems'
4
3
  require 'wukong'
5
- require 'monkeyshines'
6
-
7
- require 'wuclan/twitter'
8
- require 'wuclan/twitter/scrape/twitter_search_request'
9
- require 'wuclan/twitter/parse/twitter_search_parse'
4
+ require 'wuclan/twitter';
5
+ require 'wuclan/twitter/parse';
10
6
  include Wuclan::Twitter::Scrape
11
7
 
12
- #
13
- #
14
- # Instantiate each incoming request.
15
- # Stream out the contained classes it generates.
16
- #
17
- #
18
8
  class TwitterRequestParser < Wukong::Streamer::StructStreamer
9
+ #
10
+ # Object: parse thyself.
11
+ #
19
12
  def process request, *args, &block
20
13
  request.parse(*args) do |obj|
21
- next if obj.is_a? BadRecord
22
- yield obj.to_flat(false)
14
+ next if obj.blank? || obj.is_a?(BadRecord)
15
+ yield obj
23
16
  end
24
17
  end
25
18
  end
26
19
 
27
- # This makes the script go.
28
- Wukong::Script.new(TwitterRequestParser, nil).run
20
+ # Go, script, go!
21
+ Wukong::Script.new(
22
+ TwitterRequestParser,
23
+ nil
24
+ ).run
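+
+ # Try it locally along the lines of the stream-parser example further below
+ # (filenames here are placeholders):
+ #   cat twitter_search_requests.tsv | ./parse_twitter_search_requests.rb --map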
@@ -0,0 +1,40 @@
1
+ #!/usr/bin/env ruby
2
+ require 'rubygems'
3
+ require 'wukong'
4
+ require 'monkeyshines';
5
+ require 'wuclan/twitter';
6
+ require 'wuclan/twitter/scrape/twitter_search_request';
7
+ require 'wuclan/twitter/parse/twitter_search_parse';
8
+ include Wuclan::Twitter::Scrape
9
+
10
+
11
+
12
+ #
13
+ # Twitter stream requests
14
+ #
15
+ # http://apiwiki.twitter.com/Streaming-API-Documentation
16
+ #
17
+ # Fills a file with JSON status records, one line per status.
18
+ #
19
+ # {"text":"Hey #bigdata #hadoop geeks: who's missing? @mrflip/bigdata / http://bit.ly/datatweeps","favorited":false,"geo":null,"in_reply_to_screen_name":null,"source":"web","created_at":"Thu Oct 29 09:29:32 +0000 2009","user":{"verified":false,"notifications":null,"profile_text_color":"000000","time_zone":"Central Time (US & Canada)","following":null,"profile_link_color":"0000ff","profile_image_url":"http://a3.twimg.com/profile_images/377919497/FlipCircle-2009-900-trans_normal.png","profile_background_image_url":"http://a3.twimg.com/profile_background_images/2348065/2005Mar-AustinTypeTour-075_-_Rappers_Delight_Raindrop.jpg","description":"Increasing access to free open data, building tools to Organize, Explore and Comprehend massive data sources - http://infochimps.org","location":"iPhone: 30.316122,-97.733817","profile_sidebar_fill_color":"ffffff","screen_name":"mrflip","profile_background_tile":false,"profile_sidebar_border_color":"f0edd8","statuses_count":1307,"followers_count":678,"protected":false,"url":"http://infochimps.org","created_at":"Mon Mar 19 21:08:24 +0000 2007","friends_count":514,"name":"Philip Flip Kromer","geo_enabled":false,"profile_background_color":"BCC0C8","id":1554031,"utc_offset":-21600,"favourites_count":61},"id":5254924802,"in_reply_to_user_id":null,"in_reply_to_status_id":null,"truncated":false}
20
+ #
21
+ # Try it with
22
+ # twuserpass='name:pass'
23
+ # curl -s -u $twuserpass http://stream.twitter.com/1/statuses/sample.json > /tmp/sample.json
24
+ # cat /tmp/sample.json | parse_twitter_stream_requests.rb --map
25
+ #
26
+ class TwitterRequestParser < Wukong::Streamer::RecordStreamer
27
+ def recordize *args
28
+ foo = args.first
29
+ [ TwitterStreamRequest.new(super(*args).first) ]
30
+ end
31
+ def process request, *args, &block
32
+ request.parse(*args) do |obj|
33
+ next if obj.is_a? BadRecord
34
+ yield obj.to_flat(false) # if obj.is_a?(DeleteTweet)
35
+ end
36
+ end
37
+ end
38
+
39
+ # This makes the script go.
40
+ Wukong::Script.new(TwitterRequestParser, nil).run
@@ -0,0 +1,95 @@
1
+ This is actually less rickety than it seems, but you'll have to hand edit a few paths and config files. Feel free to suggest a more polite organization of it all.
2
+
3
+ h2. Initial setup
4
+
5
+ * Install prerequisites using rubygems:
6
+
7
+ <pre>
8
+ sudo gem install htmlentities extlib god
9
+ </pre>
10
+
11
+ * check out each of the "monkeyshines":http://github.com/mrflip/monkeyshines, "wukong":http://github.com/mrflip/wukong, "wuclan":http://github.com/mrflip/wuclan, and "edamame":http://github.com/mrflip/edamame repos using git, preferably as neighbors in the same directory.
12
+
13
+ * follow instructions from http://mrflip.github.com/edamame/INSTALL.html for beanstalkd and tokyo tyrant
14
+
15
+ h2. Find the scraper
16
+
17
+ Although you can install wuclan as a gem, I actually recommend installing it from git source:
18
+
19
+ <pre>
20
+ git clone git://github.com/mrflip/wuclan.git
21
+ </pre>
22
+
23
+ You'll run the scraper from
24
+
25
+ <pre>
26
+ wuclan/examples/twitter/scrape_twitter_search
27
+ </pre>
28
+
29
+ h2. Make the scrape destination
30
+
31
+ You will need to set up a landing place for the files, probably by editing the work/ symlink (sorry, this is kludgy and should be fixed).
32
+
33
+ The naming scheme I use is good for running scrapers against a lot of targets. From the wuclan/examples/twitter/scrape_twitter_search directory:
34
+
35
+ <pre>
36
+ mkdir ../../../../data/ripd/com.tw/com.twitter.search
37
+ </pre>
38
+
39
+ (this constructs a tree that is a sibling of the wuclan dir).
40
+
41
+ Wherever you put the scrape destination,
42
+ * DO NOT add it to your code's git repo
43
+ * exclude it from spotlight indexing and so forth
44
+
45
+ h2. Add your search terms to seed.tsv
46
+
47
+ To add a search job, edit seed.tsv: add each search phrase and its priority, separated by a tab (lower priority == more important). **Don't** url-encode your query terms: spaces will be replaced by plus signs (+) and other non-alphanumerics will be url-encoded.
48
+
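+ For reference, the seed.tsv bundled in this directory includes entries like these (the whitespace between phrase and priority is a single tab):
+
+ <pre>
+ red sox	1000
+ yankees	1000
+ </pre>
+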
49
+ h2. Start the queue daemons
50
+
51
+ Copy @edamame_global_config-template.yaml@ to your scrape destination and name it @edamame_global_config.yaml@. Also, edit the @./twitter_search_daemons.god@ file to indicate the scrape destination.
52
+
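+ For example, with the scrape destination used elsewhere in these examples (adjust the path to your own layout):
+
+ <pre>
+ cp edamame_global_config-template.yaml /data/ripd/com.tw/com.twitter.search/edamame_global_config.yaml
+ </pre>
+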
53
+ Use god to start the daemons: @sudo god -c ./twitter_search_daemons.god@ (add the -D flag to debug)
54
+
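+ Once they're up, @sudo god status@ should list the beanstalkd and tyrant processes (this assumes god's standard @status@ command; see the god docs if your version differs).
+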
55
+ h2. Load the search terms
56
+
57
+ Load this data with
58
+
59
+ <pre>
60
+ ./load_twitter_search_jobs.rb --handle=com.twitter.search --source-filename=./seed.tsv
61
+ </pre>
62
+
63
+ You can check that it was loaded with
64
+
65
+ <pre>
66
+ /path/to/edamame/bin/edamame-sync --handle=com.twitter.search --store=:11241 --queue=:11240
67
+ </pre>
68
+
69
+ (This unloads all jobs from the transient queue and stuffs them back in from the database).
70
+
71
+ Empty all search queues with
72
+
73
+ <pre>
74
+ /path/to/edamame/bin/edamame-nuke --handle=com.twitter.search --store=:11241 --queue=:11240
75
+ </pre>
76
+
77
+ h2. Run the scraper
78
+
79
+ <pre>
80
+ nohup ./scrape_twitter_search.rb --handle=com.twitter.search >> work/log/twitter_search-console.log 2>&1 &
81
+ </pre>
82
+
83
+ This will run forever. Check its progress with
84
+
85
+ <pre>
86
+ tail -f work/log/twitter_search-console.log
87
+ </pre>
88
+
89
+ If you want to watch the output files,
90
+
91
+ <pre>
92
+ datename=`date "+%Y%m%d"` ; tail -f work/$datename/* | cut -c 1-2000
93
+ </pre>
94
+
95
+ Be careful dumping the output files to screen -- each line can be tens of thousands of characters long and will lock your terminal right up.
@@ -0,0 +1,17 @@
1
+ --- # -*- YAML -*-
2
+ #
3
+ # Save this file in your god dir, *then* change the settings below.
4
+ # Make sure your version control system is set to ignore the file.
5
+ #
6
+
7
+ :email:
8
+ :domain: your.domain.com
9
+ :username: robot@your.domain.com
10
+ :password: YOURPASSWORD
11
+ :to: people_who_can_fix_errors@your.domain.com
12
+ :to_name: People who can fix the scraper
13
+
14
+ # these apply to all processes
15
+ :god_process:
16
+ :flapping_notify: default
17
+
@@ -34,7 +34,7 @@ loop do
34
34
  :dest => { :type => :chunked_flat_file_store, :rootdir => WORK_DIR, :filemode => 'a' },
35
35
  # :dest => { :type => :flat_file_store, :filename => WORK_DIR+"/test_output.tsv" },
36
36
  # :fetcher => { :type => TwitterSearchFakeFetcher },
37
- :sleep_time => 1 ,
37
+ :sleep_time => 1.25 ,
38
38
  })
39
39
  Log.info "Starting a run!"
40
40
  scraper.run
@@ -0,0 +1,19 @@
1
+ # See the readme for instructions.
2
+
3
+ # To add a search job, put the search phrase and its priority, separated by a
4
+ # tab (Lower priority == more important).
5
+
6
+ red sox 1000
7
+ yankees 1000
8
+
9
+ # You can recycle the output of dump_twitter_search_jobs
10
+ hadoop 50 0.103063874053513 4985477488 3852 9700.94447529118952 20091019021751
11
+ infochimp 50 0.106963481416675 4827288395 62 9347.39116701332932 20091019022759
12
+ infochimps 50 0.102891460905350 4922555326 575 9717.31832415175086 20091019025808
13
+ semantic 50 0.103387063739869 4985327841 4400 9670.39361100528913 20091019021435
14
+ semanticweb 50 0.102956390169747 4986072386 3156 9711.01359700656940 20091019025943
15
+
16
+ # These will quickly generate a buttload of data
17
+ # RT 110000 6.977299880525690 4986408447 9628514 115.87012880821899 20091019032753
18
+ # http 110000 28.312757201646100 4986411833 70327665 6.90163844186046 20091019032825
19
+ # twitter 110000 1.319672131147540 4986376554 7432567 733.71915915527904 20091019032511
@@ -1,25 +1,45 @@
1
- $: << File.dirname(__FILE__)+'/../../../../edamame/lib'
2
- require 'edamame/monitoring'
3
- WORK_DIR = File.dirname(__FILE__)+'/work'
1
+ require 'yaml'
2
+ require 'extlib'
3
+ require 'wukong/extensions/hash'
4
+ require "edamame/monitoring"
4
5
 
5
6
  #
6
- # For debugging:
7
+ # You can load this file with
8
+ # sudo god -c ./twitter_search_daemons.god
9
+ # To debug, run
10
+ # sudo god -c ./twitter_search_daemons.god -D
7
11
  #
8
- # sudo god -c this_file.god -D
12
+
13
+ #
14
+ # Change this to point to your scrape destination.
15
+ #
16
+ WORK_DIR = '/data/ripd/com.tw/com.twitter.search'
17
+
18
+ #
19
+ # Also, make a copy of edamame_global_config-template.yaml in that directory,
20
+ # but rename it edamame_global_config.yaml and edit it to suit.
9
21
  #
10
- # (for production, use the etc/initc.d script in this directory)
22
+ GodProcess::GLOBAL_SITE_OPTIONS_FILES << WORK_DIR+'/edamame_global_config.yaml'
23
+
24
+ # Files will be timestamped by when god is started.
25
+ DATESTAMP = Time.now.utc.strftime("%Y%m%d")
26
+
27
+ # Uncomment for a bunch of diagnostics:
28
+ # p GodProcess.global_site_options,
29
+ # TyrantGod.site_options, TyrantGod.default_options.deep_merge(TyrantGod.site_options),
30
+ # GodProcess.site_options
31
+
11
32
  #
12
- # TODO: define an EdamameDirector that lets us name these collections.
33
+ # Define email notifiers and attach one by default
13
34
  #
14
- THE_FAITHFUL = [
15
- # twitter_search
16
- [BeanstalkdGod, { :port => 11240, :max_mem_usage => 100.megabytes, }],
17
- [TyrantGod, { :port => 11241, :db_dirname => WORK_DIR, :db_name => "twitter_search-queue.tct" }],
18
- #
19
- # [TyrantGod, { :port => 11249, :db_dirname => WORK_DIR, :db_name => "twitter_search-flat.tct" }],
20
- ]
35
+ God.setup_email GodProcess.global_site_options[:email]
36
+ GodProcess::DEFAULT_OPTIONS[:flapping_notify] = 'default'
21
37
 
22
- THE_FAITHFUL.each do |klass, config|
23
- proc = klass.create(config.merge :flapping_notify => 'default')
24
- proc.mkdirs!
25
- end
38
+ #
39
+ # Twitter Search
40
+ #
41
+ handle = 'comtwittersearch'
42
+ base_port = 11220
43
+ db_dirname = WORK_DIR+'/distdb/'+DATESTAMP
44
+ BeanstalkdGod.create :port => base_port + 0, :max_mem_usage => 100.megabytes
45
+ TyrantGod.create :port => base_port + 1, :db_name => handle+'-queue.tct', :db_dirname => db_dirname
@@ -1,3 +1,5 @@
1
+ $KCODE='u' unless "1.9".respond_to?(:encoding)
2
+
1
3
  module Wuclan
2
4
  module Twitter
3
5
  autoload :Scrape, 'wuclan/twitter/scrape'
@@ -9,6 +9,7 @@ module Wuclan
9
9
  autoload :TwitterUserSearchId, 'wuclan/twitter/model/twitter_user'
10
10
  autoload :TwitterUserId, 'wuclan/twitter/model/twitter_user'
11
11
  autoload :Tweet, 'wuclan/twitter/model/tweet'
12
+ autoload :DeleteTweet, 'wuclan/twitter/model/tweet'
12
13
  autoload :SearchTweet, 'wuclan/twitter/model/tweet'
13
14
  autoload :AFollowsB, 'wuclan/twitter/model/relationship'
14
15
  autoload :AFavoritesB, 'wuclan/twitter/model/relationship'
@@ -7,6 +7,14 @@ module Wuclan::Twitter::Model
7
7
  end
8
8
  end
9
9
 
10
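+ # Back-compat aliases (presumably for the status_id -> tweet_id field rename
+ # made elsewhere in this change):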
+ def status_id
11
+ tweet_id
12
+ end
13
+
14
+ def in_reply_to_status_id
15
+ in_reply_to_tweet_id
16
+ end
17
+
10
18
  def self.included base
11
19
  base.class_eval{ extend ClassMethods }
12
20
  end
@@ -28,43 +36,43 @@ module Wuclan::Twitter::Model
28
36
  class AFavoritesB < TypedStruct.new(
29
37
  [:user_a_id, Integer],
30
38
  [:user_b_id, Integer],
31
- [:status_id, Integer]
39
+ [:tweet_id, Integer]
32
40
  )
33
41
  include ModelCommon
34
42
  include RelationshipBase
35
- # Key on user_a-user_b-status_id (really just user_a-status_id is enough)
43
+ # Key on user_a-user_b-tweet_id (really just user_a-tweet_id is enough)
36
44
  def num_key_fields() 3 end
37
- def numeric_id_fields() [:user_a_id, :user_b_id, :status_id] ; end
45
+ def numeric_id_fields() [:user_a_id, :user_b_id, :tweet_id] ; end
38
46
  end
39
47
 
40
48
  # Direct (threaded) replies: occur at the start of a tweet.
41
49
  class ARepliesB < TypedStruct.new(
42
50
  [:user_a_id, Integer],
43
51
  [:user_b_id, Integer],
44
- [:status_id, Integer],
45
- [:in_reply_to_status_id, Integer]
52
+ [:tweet_id, Integer],
53
+ [:in_reply_to_tweet_id, Integer]
46
54
  )
47
55
  include ModelCommon
48
56
  include RelationshipBase
49
- # Key on user_a-user_b-status_id
57
+ # Key on user_a-user_b-tweet_id
50
58
  def num_key_fields() 3 end
51
- def numeric_id_fields() [:user_a_id, :user_b_id, :status_id, :in_reply_to_status_id] ; end
59
+ def numeric_id_fields() [:user_a_id, :user_b_id, :tweet_id, :in_reply_to_tweet_id] ; end
52
60
  end
53
61
 
54
62
  # Direct (threaded) replies: occur at the start of a tweet.
55
63
  class ARepliesBName < TypedStruct.new(
56
- [:user_a_name, Integer],
57
- [:user_b_name, Integer],
58
- [:status_id, Integer],
59
- [:in_reply_to_status_id, Integer],
60
- [:user_a_sid, Integer],
61
- [:user_b_sid, Integer]
64
+ [:user_a_name, String],
65
+ [:user_b_name, String],
66
+ [:tweet_id, Integer],
67
+ [:in_reply_to_tweet_id, Integer],
68
+ [:user_a_sid, Integer],
69
+ [:user_b_sid, Integer]
62
70
  )
63
71
  include ModelCommon
64
72
  include RelationshipBase
65
- # Key on user_a-user_b-status_id
73
+ # Key on user_a-user_b-tweet_id
66
74
  def num_key_fields() 3 end
67
- def numeric_id_fields() [:user_a_id, :user_b_id, :status_id, :in_reply_to_status_id] ; end
75
+ def numeric_id_fields() [:user_a_id, :user_b_id, :tweet_id, :in_reply_to_tweet_id] ; end
68
76
  end
69
77
 
70
78
  # Atsign mentions anywhere in the tweet
@@ -72,13 +80,13 @@ module Wuclan::Twitter::Model
72
80
  class AAtsignsB < TypedStruct.new(
73
81
  [:user_a_id, Integer],
74
82
  [:user_b_name, String],
75
- [:status_id, Integer]
83
+ [:tweet_id, Integer]
76
84
  )
77
85
  include ModelCommon
78
86
  include RelationshipBase
79
- # Key on user_a-user_b-status_id
87
+ # Key on user_a-user_b-tweet_id
80
88
  def num_key_fields() 3 end
81
- def numeric_id_fields() [:user_a_id, :status_id] ; end
89
+ def numeric_id_fields() [:user_a_id, :tweet_id] ; end
82
90
  end
83
91
 
84
92
  # Atsign mentions anywhere in the tweet
@@ -86,13 +94,13 @@ module Wuclan::Twitter::Model
86
94
  class AAtsignsBId < TypedStruct.new(
87
95
  [:user_a_id, Integer],
88
96
  [:user_b_id, Integer],
89
- [:status_id, Integer]
97
+ [:tweet_id, Integer]
90
98
  )
91
99
  include ModelCommon
92
100
  include RelationshipBase
93
- # Key on user_a-user_b-status_id
101
+ # Key on user_a-user_b-tweet_id
94
102
  def num_key_fields() 3 end
95
- def numeric_id_fields() [:user_a_id, :user_b_id, :status_id] ; end
103
+ def numeric_id_fields() [:user_a_id, :user_b_id, :tweet_id] ; end
96
104
  end
97
105
 
98
106
 
@@ -112,7 +120,7 @@ module Wuclan::Twitter::Model
112
120
  # non-retweet-whore-requests have user_b_name set and unset respectively.)
113
121
  #
114
122
  # +user_a_id:+ the user who sent the re-tweet
115
- # +status_id:+ the id of the tweet *containing* the re-tweet (for the ID of the original tweet you're on your own.)
123
+ # +tweet_id:+ the id of the tweet *containing* the re-tweet (for the ID of the original tweet you're on your own.)
116
124
  # +user_b_name:+ the user citied as originating: RT @user_b_name
117
125
  # +please_flag:+ a 1 if the text contains 'please' or 'plz' as a stand-alone word
118
126
  # +text:+ the *full* text of the tweet
@@ -120,7 +128,7 @@ module Wuclan::Twitter::Model
120
128
  class ARetweetsB < TypedStruct.new(
121
129
  [:user_a_id, Integer],
122
130
  [:user_b_name, String],
123
- [:status_id, Integer],
131
+ [:tweet_id, Integer],
124
132
  [:please_flag, Integer],
125
133
  [:text, String]
126
134
  )
@@ -133,7 +141,7 @@ module Wuclan::Twitter::Model
133
141
  end
134
142
  # Key on retweeting_user-user-tweet_id
135
143
  def num_key_fields() 3 end
136
- def numeric_id_fields() [:user_a_id, :status_id] ; end
144
+ def numeric_id_fields() [:user_a_id, :tweet_id] ; end
137
145
  #
138
146
  # If there's no user we'll assume this
139
147
  # is a retweet and not an rtwhore.
@@ -146,7 +154,7 @@ module Wuclan::Twitter::Model
146
154
  class ARetweetsBId < TypedStruct.new(
147
155
  [:user_a_id, Integer],
148
156
  [:user_b_id, Integer],
149
- [:status_id, Integer],
157
+ [:tweet_id, Integer],
150
158
  [:please_flag, Integer],
151
159
  [:text, String]
152
160
  )
@@ -160,7 +168,7 @@ module Wuclan::Twitter::Model
160
168
 
161
169
  # Key on retweeting_user-user-tweet_id
162
170
  def num_key_fields() 3 end
163
- def numeric_id_fields() [:user_a_id, :user_b_id, :status_id] ; end
171
+ def numeric_id_fields() [:user_a_id, :user_b_id, :tweet_id] ; end
164
172
 
165
173
  #
166
174
  # If there's no user we'll assume this
@@ -31,6 +31,13 @@ module Wuclan::Twitter::Model
31
31
  def numeric_id_fields() [:id, :twitter_user_id, :in_reply_to_status_id, :in_reply_to_user_id] ; end
32
32
  end
33
33
 
34
+ class DeleteTweet < TypedStruct.new(
35
+ [:id, Integer ],
36
+ [:created_at, Bignum ],
37
+ [:twitter_user_id, Integer ]
38
+ )
39
+ include ModelCommon
40
+ end
34
41
 
35
42
  #
36
43
  # SearchTweet
@@ -30,6 +30,8 @@ module Wuclan::Twitter::Model
30
30
 
31
31
  end
32
32
 
33
+
34
+
33
35
  #
34
36
  # Fundamental information on a user.
35
37
  #
@@ -57,6 +59,9 @@ module Wuclan::Twitter::Model
57
59
  def tweets_per_day() tweets_count.to_i / days_since_created end
58
60
  end
59
61
 
62
+
63
+
64
+
60
65
  #
61
66
  # Outside of a users/show page, when a user is mentioned
62
67
  # only this subset of fields appear.
@@ -0,0 +1,3 @@
1
+ require 'monkeyshines';
2
+ require 'wuclan/twitter/scrape/twitter_search_request';
3
+ require 'wuclan/twitter/parse/twitter_search_parse';
@@ -15,6 +15,7 @@ module Wuclan
15
15
  # Parse
16
16
  #
17
17
  def parse *args, &block
18
+ return unless items
18
19
  items.each do |item|
19
20
  self.encode_and_sanitize!(item)
20
21
  tweet = tweet_from_parse(item)
@@ -14,8 +14,10 @@ module Wuclan
14
14
  autoload :TwitterFriendsIdsRequest, 'wuclan/twitter/scrape/twitter_ff_ids_request'
15
15
  autoload :TwitterUserTimelineRequest, 'wuclan/twitter/scrape/twitter_timeline_request'
16
16
  autoload :TwitterPublicTimelineRequest, 'wuclan/twitter/scrape/twitter_timeline_request'
17
+ autoload :TwitterStreamRequest, 'wuclan/twitter/scrape/twitter_stream_request'
17
18
  autoload :JsonUserWithTweet, 'wuclan/twitter/scrape/twitter_json_response'
18
19
  autoload :JsonTweetWithUser, 'wuclan/twitter/scrape/twitter_json_response'
20
+ autoload :JsonDeleteTweet, 'wuclan/twitter/scrape/twitter_json_response'
19
21
 
20
22
  end
21
23
  end
@@ -13,8 +13,7 @@ module Wuclan::Twitter::Scrape
13
13
 
14
14
  def parse *args, &block
15
15
  handle_special_cases!(*args, &block) or return
16
- # super *args
17
- yield self
16
+ super *args
18
17
  end
19
18
 
20
19
  def handle_special_cases! *args, &block
@@ -26,10 +25,14 @@ module Wuclan::Twitter::Scrape
26
25
  end
27
26
  end
28
27
 
29
- class Followers < TwitterFollowersRequest ; include OldSkoolRequest ; end
30
- class Friends < TwitterFriendsRequest ; include OldSkoolRequest ; end
31
- class Favorites < TwitterFavoritesRequest ; include OldSkoolRequest ; end
28
+ class User < TwitterUserRequest ; include OldSkoolRequest ; end
29
+ class Followers < TwitterFollowersRequest ; include OldSkoolRequest ; end
30
+ class Friends < TwitterFriendsRequest ; include OldSkoolRequest ; end
31
+ class FollowersIds < TwitterFollowersIdsRequest ; include OldSkoolRequest ; end
32
+ class FriendsIds < TwitterFriendsIdsRequest ; include OldSkoolRequest ; end
33
+ class Favorites < TwitterFavoritesRequest ; include OldSkoolRequest ; end
32
34
  class UserTimeline < TwitterUserTimelineRequest ; include OldSkoolRequest ; end
35
+
33
36
  class Bogus < BadRecord ;
34
37
  def parse suffix=nil, *args
35
38
  errors = suffix.split('-')
@@ -27,10 +27,11 @@ module Wuclan
27
27
  # unpacks the raw API response, yielding all the relationships.
28
28
  #
29
29
  def parse *args, &block
30
+ return unless healthy?
30
31
  parsed_contents.each do |user_b_id|
31
32
  user_b_id = "%010d"%user_b_id.to_i
32
33
  # B is a follower: B follows user.
33
- yield AFollowsB.new(user_b_id, user_a_id)
34
+ yield AFollowsB.new(user_b_id, twitter_user_id)
34
35
  end
35
36
  end
36
37
  end
@@ -62,10 +63,11 @@ module Wuclan
62
63
  # unpacks the raw API response, yielding all the relationships.
63
64
  #
64
65
  def parse *args, &block
66
+ return unless healthy?
65
67
  parsed_contents.each do |user_b_id|
66
68
  user_b_id = "%010d"%user_b_id.to_i
67
69
  # B is a friend: user follows B
68
- yield AFollowsB.new(user_a_id, user_b_id)
70
+ yield AFollowsB.new(twitter_user_id, user_b_id)
69
71
  end
70
72
  end
71
73
  end
@@ -20,6 +20,7 @@ module Wuclan::Twitter::Scrape
20
20
  # generate all the contained TwitterXXX objects
21
21
  #
22
22
  def each
23
+ return unless healthy?
23
24
  if is_partial?
24
25
  yield user
25
26
  else
@@ -38,10 +39,10 @@ module Wuclan::Twitter::Scrape
38
39
  # This method tries to guess, based on the fields in the raw_user, which it has.
39
40
  #
40
41
  def is_partial?
42
+ p(raw) if !raw_user
41
43
  not raw_user.include?('friends_count')
42
44
  end
43
45
 
44
-
45
46
  def tweet
46
47
  Tweet.from_hash raw_tweet if raw_tweet
47
48
  end
@@ -66,7 +67,7 @@ module Wuclan::Twitter::Scrape
66
67
  #
67
68
  def fix_raw_user!
68
69
  return unless raw_user
69
- raw_user['scraped_at'] = self.moreinfo['scraped_at']
70
+ raw_user['scraped_at'] = ModelCommon.flatten_date(self.moreinfo['scraped_at'])
70
71
  raw_user['created_at'] = ModelCommon.flatten_date(raw_user['created_at'])
71
72
  raw_user['id'] = ModelCommon.zeropad_id( raw_user['id'])
72
73
  raw_user['protected'] = ModelCommon.unbooleanize(raw_user['protected'])
@@ -88,7 +89,7 @@ module Wuclan::Twitter::Scrape
88
89
  raw_tweet['created_at'] = ModelCommon.flatten_date(raw_tweet['created_at'])
89
90
  raw_tweet['favorited'] = ModelCommon.unbooleanize(raw_tweet['favorited'])
90
91
  raw_tweet['truncated'] = ModelCommon.unbooleanize(raw_tweet['truncated'])
91
- raw_tweet['twitter_user_id'] = ModelCommon.zeropad_id( raw_tweet['twitter_user_id'] )
92
+ raw_tweet['twitter_user_id'] = ModelCommon.zeropad_id( raw_user['id'] )
92
93
  raw_tweet['in_reply_to_user_id'] = ModelCommon.zeropad_id( raw_tweet['in_reply_to_user_id']) unless raw_tweet['in_reply_to_user_id'].blank? || (raw_tweet['in_reply_to_user_id'].to_i == 0)
93
94
  raw_tweet['in_reply_to_status_id'] = ModelCommon.zeropad_id( raw_tweet['in_reply_to_status_id']) unless raw_tweet['in_reply_to_status_id'].blank? || (raw_tweet['in_reply_to_status_id'].to_i == 0)
94
95
  Wukong.encode_components raw_tweet, 'text', 'in_reply_to_screen_name'
@@ -96,9 +97,7 @@ module Wuclan::Twitter::Scrape
96
97
  end
97
98
  end
98
99
 
99
-
100
100
  class JsonUserWithTweet < JsonUserTweetPair
101
-
102
101
  def raw_tweet
103
102
  return @raw_tweet if @raw_tweet
104
103
  @raw_tweet = raw['status']
@@ -112,7 +111,6 @@ end
112
111
 
113
112
 
114
113
  class JsonTweetWithUser < JsonUserTweetPair
115
-
116
114
  def raw_tweet
117
115
  @raw_tweet ||= raw
118
116
  end
@@ -122,3 +120,38 @@ class JsonTweetWithUser < JsonUserTweetPair
122
120
  @raw_user
123
121
  end
124
122
  end
123
+
124
+
125
+
126
+ class JsonDeleteTweet
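+ # Wraps a streaming-api "delete" record; judging from the accessors below, raw
+ # looks roughly like {"delete" => {"status" => {"id" => ..., "user_id" => ...}}}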
127
+ attr_accessor :raw, :moreinfo, :scraped_at
128
+ def initialize raw, moreinfo={}
129
+ self.raw = raw
130
+ self.moreinfo = moreinfo
131
+ self.scraped_at = nil # TODO -- extract this from neighbors
132
+ end
133
+
134
+ # Extracted JSON should be a hash
135
+ def healthy?()
136
+ raw && raw.is_a?(Hash)
137
+ end
138
+
139
+ def delete_tweet
140
+ Wuclan::Twitter::Model::DeleteTweet.new(
141
+ raw['delete']['status']['id'],
142
+ self.scraped_at,
143
+ raw['delete']['status']['user_id']
144
+ ) rescue nil
145
+ end
146
+
147
+ def each *args, &block
148
+ return unless healthy?
149
+ yield delete_tweet
150
+ end
151
+
152
+ # true if this model looks like it will parse the given JSON
153
+ def self.parses? hsh
154
+ # KLUDGE
155
+ hsh =~ /"delete":\{/
156
+ end
157
+ end
@@ -21,6 +21,7 @@ class TwitterRequestStream < Monkeyshines::RequestStream::SimpleRequestStream
21
21
  # can be a screen_name, but we need the numeric ID for followers_request's, etc.
22
22
  def each_request twitter_user_id, *args
23
23
  user_req = TwitterUserRequest.new(twitter_user_id)
24
+ # this performs the request in-place: req holds the fulfilled response
24
25
  yield(user_req)
25
26
  return unless user_req.healthy?
26
27
  twitter_user_id = user_req.parsed_contents['id'].to_i if (user_req.parsed_contents['id'].to_i > 0)
@@ -0,0 +1,44 @@
1
+ module Wuclan
2
+ module Twitter
3
+ module Scrape
4
+
5
+ class TwitterStreamRequest < Struct.new(:contents)
6
+ # Contents are JSON
7
+ include Monkeyshines::RawJsonContents
8
+
9
+ # self.hard_request_limit = 1
10
+ # def make_url() "http://stream.twitter.com/1/statuses/sample.json" end
11
+
12
+ # Extracted JSON should be a hash
13
+ def healthy?()
14
+ parsed_contents && parsed_contents.is_a?(Hash)
15
+ end
16
+
17
+ def parsed_as_delete_tweet *args, &block
18
+ p parsed_contents
19
+ json_obj = JsonDeleteTweet.new(parsed_contents)
20
+ json_obj.each(&block)
21
+ end
22
+
23
+ # Extract user and tweet
24
+ def parsed_as_tweet *args, &block
25
+ json_obj = JsonTweetWithUser.new(
26
+ parsed_contents, 'scraped_at' => parsed_contents['created_at'])
27
+ json_obj.each(&block)
28
+ end
29
+
30
+ #
31
+ # unpacks the raw API response, yielding all the interesting objects
32
+ # and relationships within.
33
+ #
34
+ def parse *args, &block
35
+ return unless healthy?
36
+ return parsed_as_delete_tweet(*args, &block) if JsonDeleteTweet.parses?(contents)
37
+ # else
38
+ parsed_as_tweet(*args, &block)
39
+ end
40
+ end
41
+
42
+ end
43
+ end
44
+ end
@@ -5,11 +5,11 @@
5
5
 
6
6
  Gem::Specification.new do |s|
7
7
  s.name = %q{wuclan}
8
- s.version = "0.2.0"
8
+ s.version = "0.2.1"
9
9
 
10
10
  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
11
11
  s.authors = ["Philip (flip) Kromer"]
12
- s.date = %q{2009-10-12}
12
+ s.date = %q{2009-11-02}
13
13
  s.description = %q{Massive-scale social network analysis. Nothing to f with.}
14
14
  s.email = %q{flip@infochimps.org}
15
15
  s.extra_rdoc_files = [
@@ -35,6 +35,7 @@ Gem::Specification.new do |s|
35
35
  "examples/twitter/old/scrape_twitter_trending.rb",
36
36
  "examples/twitter/parse/parse_twitter_requests.rb",
37
37
  "examples/twitter/parse/parse_twitter_search_requests.rb",
38
+ "examples/twitter/parse/parse_twitter_stream_requests.rb",
38
39
  "examples/twitter/scrape_twitter_api/scrape_twitter_api.rb",
39
40
  "examples/twitter/scrape_twitter_api/seed.tsv",
40
41
  "examples/twitter/scrape_twitter_api/start_cache_twitter.sh",
@@ -49,9 +50,13 @@ Gem::Specification.new do |s|
49
50
  "examples/twitter/scrape_twitter_hosebird/scrape_twitter_hosebird.rb",
50
51
  "examples/twitter/scrape_twitter_hosebird/test_spewer.rb",
51
52
  "examples/twitter/scrape_twitter_hosebird/twitter_hosebird_god.yaml",
53
+ "examples/twitter/scrape_twitter_search/README.textile",
54
+ "examples/twitter/scrape_twitter_search/README.textile",
52
55
  "examples/twitter/scrape_twitter_search/dump_twitter_search_jobs.rb",
56
+ "examples/twitter/scrape_twitter_search/edamame_global_config-template.yaml",
53
57
  "examples/twitter/scrape_twitter_search/load_twitter_search_jobs.rb",
54
58
  "examples/twitter/scrape_twitter_search/scrape_twitter_search.rb",
59
+ "examples/twitter/scrape_twitter_search/seed.tsv",
55
60
  "examples/twitter/scrape_twitter_search/twitter_search_daemons.god",
56
61
  "lib/old/twitter_api.rb",
57
62
  "lib/wuclan.rb",
@@ -102,6 +107,7 @@ Gem::Specification.new do |s|
102
107
  "lib/wuclan/twitter/model/tweet/tweet_token.rb",
103
108
  "lib/wuclan/twitter/model/twitter_user.rb",
104
109
  "lib/wuclan/twitter/model/twitter_user/style/color_to_hsv.rb",
110
+ "lib/wuclan/twitter/parse.rb",
105
111
  "lib/wuclan/twitter/parse/ff_ids_parser.rb",
106
112
  "lib/wuclan/twitter/parse/friends_followers_parser.rb",
107
113
  "lib/wuclan/twitter/parse/generic_json_parser.rb",
@@ -123,6 +129,7 @@ Gem::Specification.new do |s|
123
129
  "lib/wuclan/twitter/scrape/twitter_search_job.rb",
124
130
  "lib/wuclan/twitter/scrape/twitter_search_request.rb",
125
131
  "lib/wuclan/twitter/scrape/twitter_search_request_stream.rb",
132
+ "lib/wuclan/twitter/scrape/twitter_stream_request.rb",
126
133
  "lib/wuclan/twitter/scrape/twitter_timeline_request.rb",
127
134
  "lib/wuclan/twitter/scrape/twitter_user_request.rb",
128
135
  "spec/spec_helper.rb",
@@ -151,6 +158,7 @@ Gem::Specification.new do |s|
151
158
  "examples/twitter/old/scrape_twitter_trending.rb",
152
159
  "examples/twitter/parse/parse_twitter_requests.rb",
153
160
  "examples/twitter/parse/parse_twitter_search_requests.rb",
161
+ "examples/twitter/parse/parse_twitter_stream_requests.rb",
154
162
  "examples/twitter/scrape_twitter_api/scrape_twitter_api.rb",
155
163
  "examples/twitter/scrape_twitter_api/support/make_request_stats.rb",
156
164
  "examples/twitter/scrape_twitter_api/support/make_requests_by_id_and_date_1.rb",
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: wuclan
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.0
4
+ version: 0.2.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Philip (flip) Kromer
@@ -9,7 +9,7 @@ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
11
 
12
- date: 2009-10-12 00:00:00 -05:00
12
+ date: 2009-11-02 00:00:00 -06:00
13
13
  default_executable:
14
14
  dependencies:
15
15
  - !ruby/object:Gem::Dependency
@@ -70,6 +70,7 @@ files:
70
70
  - examples/twitter/old/scrape_twitter_trending.rb
71
71
  - examples/twitter/parse/parse_twitter_requests.rb
72
72
  - examples/twitter/parse/parse_twitter_search_requests.rb
73
+ - examples/twitter/parse/parse_twitter_stream_requests.rb
73
74
  - examples/twitter/scrape_twitter_api/scrape_twitter_api.rb
74
75
  - examples/twitter/scrape_twitter_api/seed.tsv
75
76
  - examples/twitter/scrape_twitter_api/start_cache_twitter.sh
@@ -84,9 +85,12 @@ files:
84
85
  - examples/twitter/scrape_twitter_hosebird/scrape_twitter_hosebird.rb
85
86
  - examples/twitter/scrape_twitter_hosebird/test_spewer.rb
86
87
  - examples/twitter/scrape_twitter_hosebird/twitter_hosebird_god.yaml
88
+ - examples/twitter/scrape_twitter_search/README.textile
87
89
  - examples/twitter/scrape_twitter_search/dump_twitter_search_jobs.rb
90
+ - examples/twitter/scrape_twitter_search/edamame_global_config-template.yaml
88
91
  - examples/twitter/scrape_twitter_search/load_twitter_search_jobs.rb
89
92
  - examples/twitter/scrape_twitter_search/scrape_twitter_search.rb
93
+ - examples/twitter/scrape_twitter_search/seed.tsv
90
94
  - examples/twitter/scrape_twitter_search/twitter_search_daemons.god
91
95
  - lib/old/twitter_api.rb
92
96
  - lib/wuclan.rb
@@ -136,6 +140,7 @@ files:
136
140
  - lib/wuclan/twitter/model/tweet/tweet_token.rb
137
141
  - lib/wuclan/twitter/model/twitter_user.rb
138
142
  - lib/wuclan/twitter/model/twitter_user/style/color_to_hsv.rb
143
+ - lib/wuclan/twitter/parse.rb
139
144
  - lib/wuclan/twitter/parse/ff_ids_parser.rb
140
145
  - lib/wuclan/twitter/parse/friends_followers_parser.rb
141
146
  - lib/wuclan/twitter/parse/generic_json_parser.rb
@@ -157,6 +162,7 @@ files:
157
162
  - lib/wuclan/twitter/scrape/twitter_search_job.rb
158
163
  - lib/wuclan/twitter/scrape/twitter_search_request.rb
159
164
  - lib/wuclan/twitter/scrape/twitter_search_request_stream.rb
165
+ - lib/wuclan/twitter/scrape/twitter_stream_request.rb
160
166
  - lib/wuclan/twitter/scrape/twitter_timeline_request.rb
161
167
  - lib/wuclan/twitter/scrape/twitter_user_request.rb
162
168
  - spec/spec_helper.rb
@@ -207,6 +213,7 @@ test_files:
207
213
  - examples/twitter/old/scrape_twitter_trending.rb
208
214
  - examples/twitter/parse/parse_twitter_requests.rb
209
215
  - examples/twitter/parse/parse_twitter_search_requests.rb
216
+ - examples/twitter/parse/parse_twitter_stream_requests.rb
210
217
  - examples/twitter/scrape_twitter_api/scrape_twitter_api.rb
211
218
  - examples/twitter/scrape_twitter_api/support/make_request_stats.rb
212
219
  - examples/twitter/scrape_twitter_api/support/make_requests_by_id_and_date_1.rb