DRMacIver-reddilicious 0.0.1

data/LICENSE ADDED
@@ -0,0 +1,25 @@
+ Copyright (c) 2009, David R. MacIver
+ All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions are met:
+ * Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in the
+ documentation and/or other materials provided with the distribution.
+ * Neither the name of the reddilicious nor the
+ names of its contributors may be used to endorse or promote products
+ derived from this software without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY David R. MacIver 'AS IS' AND ANY
+ EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL David R. MacIver BE LIABLE FOR ANY
+ DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
data/README.markdown ADDED
@@ -0,0 +1,26 @@
+ ## Reddilicious
+
+ Reddilicious automatically imports links you've upvoted on reddit and other social bookmarking sites into delicious. You simply provide it with your account details for each site and set up a cron job to run it regularly. It takes care of the rest.
+
+ ### Notes on behaviour:
+
+ * Despite the name, reddilicious actually imports from a bunch of different sites. Currently only twitter, reddit and stumbleupon, but that's purely a function of the fact that those are the ones I use at the moment.
+ * Links are tagged with via:source, plus any tags that can be obtained from the source site.
+ * The date on the link will be set to the time at which the URL was initially posted on the source site, not the time you imported it (which plays badly with historical data) or the time you upvoted it (which doesn't seem to be available everywhere).
+ * It will import your entire history from each site, so the initial run will take a while.
+ * This is very slow. There's a mixture of reasons for this: it's written in Ruby, it generates a fair bit of HTTP traffic and it deliberately rate limits itself in a lot of cases. The single biggest reason, though, is that I don't particularly care and I haven't optimised it. It's intended to be run a couple of times an hour by an automated task, and you have to hit pretty damn heavy traffic before it's too slow for that.
+ * Twitter support will pull in any URLs mentioned on your friends timeline, automatically tagging them based on delicious suggestions and tagging them with information about who they were to and from.
+
+ ### Some general comments:
+
+ * The code is currently a bit grim in places. Some of this is inevitable - site scraping is never going to look pretty - and some of it will probably be cleaned up at various points.
+ * Patches are *exceedingly* welcome. There are a pile of sites this could reasonably import from, and I don't use so much as a tenth of them. If you want this to handle your favourite social bookmarking or similar site, please feel free to submit a patch.
+ * I'm currently changing the internal format on a semi-regular basis as I figure things out. Once I've got an actual release out I'll have a proper versioning system for upgrades, etc., but I don't have one yet.
+
+ ### Dependencies
+
+ * [json](http://json.rubyforge.org/)
+ * [nokogiri](http://github.com/tenderlove/nokogiri/tree/master)
+ * [httparty](http://github.com/jnunemaker/httparty/tree/master)
+ * [mechanize](http://mechanize.rubyforge.org/mechanize/) (for stumbleupon)
data/Rakefile ADDED
@@ -0,0 +1,13 @@
+ require 'rubygems'
+ require 'rake'
+
+ require 'jeweler'
+ Jeweler::Tasks.new do |gem|
+   gem.name = "reddilicious"
+   gem.summary = "reddilicious is a tool for automatically importing links into delicious"
+   gem.email = "david.maciver@gmail.com"
+   gem.homepage = "http://github.com/DRMacIver/reddilicious"
+   gem.authors = ["David R. MacIver"]
+
+   # gem is a Gem::Specification... see http://www.rubygems.org/read/chapter/20 for additional settings
+ end
data/VERSION ADDED
@@ -0,0 +1 @@
+ 0.0.2
data/bin/reddilicious ADDED
@@ -0,0 +1,51 @@
+ #!/usr/bin/env ruby
+ $: << File.join(File.dirname(__FILE__), "../lib")
+
+ require "reddilicious.rb"
+
+ def usage
+   STDERR.puts <<-USAGE
+ Usage:
+   reddilicious update # Update a reddilicious instance, posting the new bookmarks to delicious
+   reddilicious undo [site] # deletes all imported posts (from a specific site, or all if not specified)
+   USAGE
+   Reddilicious.site_names.each do |site|
+     STDERR.puts "  reddilicious #{site} # set your #{site} user"
+   end
+
+   exit(1)
+ end
+
+ dir = ENV["REDDILICIOUS_HOME"] || File.join(ENV["HOME"], ".reddilicious")
+ reddilicious = Reddilicious.new(dir)
+
+ if !File.directory?(dir)
+   puts "no such directory #{dir}. Creating..."
+   Dir.mkdir(dir)
+
+   puts "Delicious user name:"
+   delicious = STDIN.gets.strip
+   delicious = nil if delicious == ""
+   puts "Delicious password:"
+   delicious_password = STDIN.gets.strip
+   reddilicious.create!(delicious, delicious_password)
+ end
+
+ case ARGV[0]
+ when "update"
+   File.open("#{dir}/reddilicious.log", "a") do |log|
+     log.sync = true
+     $stdout = log
+     $stderr = log
+     reddilicious.transfer_to_delicious
+   end
+ when *Reddilicious.site_names
+   reddilicious.site_for(ARGV[0]).ask_for_credentials
+ when "undo"
+   sites = ARGV[1..-1].empty? ? Reddilicious.site_names : ARGV[1..-1]
+   puts "undo import for sites #{sites.inspect}: are you sure? (y/n)"
+   if STDIN.gets.strip.downcase == 'y'
+     sites.each { |s| reddilicious.site_for(s).undo_import! }
+   end
+ else usage
+ end
data/lib/blacklist.rb ADDED
@@ -0,0 +1,44 @@
+ class Blacklist
+   def initialize(blacklist)
+     @blacklist = Hash.new{|h, k| h[k] = [] }
+
+     blacklist.each { |list|
+       list.each { |tag|
+         @blacklist[tag] << list
+       }
+     }
+   end
+
+   def self.from_file(file)
+     return nil if !File.exists?(file)
+     Blacklist.new(IO.read(file).split("\n").map{|l| l.split})
+   end
+
+   def blacklisted?(tags)
+     if tags.is_a? String
+       tags = tags.split
+     end
+
+     if !tags.is_a? Array
+       raise "unrecognised argument #{tags.inspect}"
+     end
+
+     shallow_flatten(tags.
+       map{|t| @blacklist[t]}.
+       compact).
+       any?{|set|
+         !set.empty? && set.all?{|t|
+           tags.include?(t)
+         }
+       }
+   end
+
+   private
+
+   def shallow_flatten(enum)
+     it = []
+     enum.each{|x| it += x }
+     it
+   end
+ end
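For reference, the blacklist file this class reads is one rule per line, each rule a space-separated group of tags; a post is blocked when it carries every tag of some rule. A self-contained sketch of that behaviour (the class name and the tags here are hypothetical examples, not part of the library):

```ruby
# Minimal re-statement of the Blacklist semantics above, with example rules.
# A blacklist of [["nsfw"], ["politics", "rant"]] means: block anything
# tagged "nsfw", or anything tagged with BOTH "politics" and "rant".
class ExampleBlacklist
  def initialize(rules)
    @by_tag = Hash.new { |h, k| h[k] = [] }
    rules.each { |rule| rule.each { |tag| @by_tag[tag] << rule } }
  end

  def blacklisted?(tags)
    tags = tags.split if tags.is_a?(String)
    # Only rules sharing at least one tag with the post can possibly match.
    tags.flat_map { |t| @by_tag[t] }
        .any? { |rule| !rule.empty? && rule.all? { |t| tags.include?(t) } }
  end
end

bl = ExampleBlacklist.new([["nsfw"], ["politics", "rant"]])
bl.blacklisted?("nsfw cats")     # => true
bl.blacklisted?("politics")      # => false (the rule needs both tags)
bl.blacklisted?("politics rant") # => true
```

Indexing rules by tag means a lookup only inspects rules that share at least one tag with the post, rather than scanning the whole blacklist.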
data/lib/delicious.rb ADDED
@@ -0,0 +1,15 @@
+ require "rubygems"
+ require "httparty"
+
+ module Delicious
+   include HTTParty
+   base_uri "https://api.del.icio.us/v1"
+
+   # Please set your User-Agent to something identifiable.
+   # The default identifiers like "Java/1.4.3" or "lwp-perl" etc tend to get banned from time to time.
+   headers 'User-Agent' => 'reddilicious (0.1)'
+
+   format :xml
+ end
data/lib/post.rb ADDED
@@ -0,0 +1,90 @@
+ require "ostruct"
+ require "set"
+ require "rubygems"
+ require "nokogiri"
+ require "open-uri"
+
+ class Post
+   NEW_MARKER = "imported_by:reddilicious"
+
+   Data = [:url, :dt, :description, :extended, :tags]
+
+   attr_accessor(*Data)
+
+   def initialize(hash=nil)
+     yield self if block_given?
+     if hash
+       hash.each do |key, value|
+         instance_variable_set("@" + key, value)
+       end
+     end
+     raise "all posts must have a URL" if !self.url
+   end
+
+   def to_h
+     it = {}
+     Data.each{|d| it[d] = instance_variable_get("@" + d.to_s)}
+     it
+   end
+
+   def auto_imported?
+     tag_set.include?(NEW_MARKER)
+   end
+
+   def tag_set
+     if !self.tags then Set.new else Set[*self.tags.split] end
+   end
+
+   def fetch_metadata!(suggest_tags=true)
+     self.description ||= begin
+       Nokogiri::HTML(open(url)).xpath("//title").text.gsub("\n", " ").gsub(/ +/, " ").strip
+     rescue Exception => e
+       puts "WARNING: #{e}"
+       url
+     end
+
+     self.description = url if description.empty?
+
+     if suggest_tags
+       suggest = Delicious.get("/posts/suggest", :query => {:url => url})
+       if suggest['suggest']
+         suggested_tags = suggest["suggest"]["popular"] || []
+         self.tags = suggested_tags.is_a?(Array) ? suggested_tags.join(" ") : suggested_tags
+       end
+       sleep 1
+     end
+   end
+
+   def merge(that)
+     return self if !that
+     raise "cannot merge posts with different URLs: #{self.url} != #{that.url}" if self.url != that.url
+
+     result = Post.new{|p|
+       p.url = self.url
+       p.description = that.description
+
+       if self.extended && !that.extended
+         p.extended = self.extended
+       else
+         p.extended = that.extended
+       end
+
+       old_tags = that.tag_set
+       new_tags = self.tag_set - old_tags
+
+       # remove the new marker unless it was already present in the old post
+       new_tags -= [NEW_MARKER] unless that.auto_imported?
+
+       p.tags = if !new_tags.empty?
+         new_tags.to_a.join(" ") + " " + that.tags
+       else
+         that.tags
+       end
+
+       p.dt = [self.dt, that.dt].compact.min
+     }
+
+     result = nil if result == that # FIXME
+     result
+   end
+ end
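The tag handling inside Post#merge reduces to set arithmetic. A dependency-free sketch of just that step (merge_tags and the sample tags are illustrative helpers, not part of the library):

```ruby
require "set"

NEW_MARKER = "imported_by:reddilicious"

# Combine a freshly scraped post's tags with those already on delicious:
# keep every old tag, prepend only genuinely new ones, and drop the import
# marker unless the existing post had already been auto-imported.
def merge_tags(new_tags, old_tags, old_was_auto_imported)
  old = Set[*old_tags.split]
  new = Set[*new_tags.split] - old
  new -= [NEW_MARKER] unless old_was_auto_imported
  new.empty? ? old_tags : (new.to_a.join(" ") + " " + old_tags)
end

merge_tags("via:reddit #{NEW_MARKER} ruby", "ruby programming", false)
# => "via:reddit ruby programming"
```

Keeping the old tag string untouched at the tail means a re-import never reorders or duplicates tags the user already curated on delicious.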
data/lib/reddilicious.rb ADDED
@@ -0,0 +1,195 @@
+ require "site"
+ require "httparty"
+ require "delicious"
+ require "json"
+ require 'net/http'
+ require 'uri'
+ require "blacklist"
+
+ class Reddilicious
+   # lambdas to get lazy loading
+   SitesToClasses = {
+     "reddit" => lambda{
+       require "reddit"
+       Reddit::Liked
+     },
+     "stumbleupon" => lambda{
+       require "stumbleupon"
+       StumbleUpon::Favourites
+     },
+     "twitter" => lambda{
+       require "twitter"
+       Twitter::FriendsTimeline
+     }
+   }
+
+   attr_accessor :dir
+
+   def initialize(dir)
+     @dir = dir
+     @blacklist = Blacklist.from_file(File.join(dir, "blacklist"))
+     @untiny_cache = if File.exists?(untiny_cache_file)
+       JSON.parse(IO.read(untiny_cache_file))
+     else
+       {}
+     end
+
+     if File.exists?(details_file)
+       Delicious.basic_auth(details["delicious_user"], details["delicious_password"])
+     end
+   end
+
+   def sites
+     @sites ||= Dir["#{@dir}/*"].map{|x| site_for(File.basename(x))}.compact
+   end
+
+   def site_for(x)
+     c = SitesToClasses[x]
+     c && c.call.new(self)
+   end
+
+   def create!(delicious, delicious_password)
+     File.open(details_file, "w"){|o|
+       o.puts({:delicious_user => delicious, :delicious_password => delicious_password}.to_json)
+     }
+   end
+
+   def details
+     @details ||= JSON.parse(IO.read(details_file))
+   end
+
+   def update_time
+     puts "Checking the server for last update time"
+     sleep 1
+     time = Delicious.get("/posts/update")["update"]["time"]
+     puts "last updated at #{time}"
+     time
+   end
+
+   def update_untiny_cache
+     File.open(untiny_cache_file, 'w') { |f| f << JSON.pretty_generate(@untiny_cache) }
+   end
+
+   def untiny_url(url, n=0)
+     @untiny_cache[url] ||= begin
+       puts "untiny #{url}"
+       resp = Net::HTTP.get_response(URI.parse(url))
+       url = resp['location'] || url
+       if [301, 302].include?(resp.code.to_i) && n <= 3 && resp['location']
+         untiny_url(resp['location'], n + 1)
+       else
+         url
+       end
+     rescue Exception => e
+       puts "WARNING: #{e}"
+       url
+     end
+   end
+
+   def bookmark_for(url, suggest_tags=true)
+     url = untiny_url(url)
+     Post.new do |post|
+       post.url = url
+     end
+   end
+
+   def delicious_posts
+     puts "Reading existing delicious bookmarks"
+     delicious_post_file = File.join(@dir, "bookmarks.json")
+
+     @existing_posts = nil
+
+     if File.exist? delicious_post_file
+       @existing_posts ||= JSON.parse(IO.read(delicious_post_file))
+     end
+
+     if !@existing_posts || (update_time > @existing_posts["updated"])
+       puts "Bookmarks out of date. Fetching from server"
+       posts = Delicious.get("/posts/all")["posts"]
+       raise "error fetching all posts" unless posts
+
+       @existing_posts = {"updated" => posts["update"], "posts" => posts["post"] || []}
+       File.open(delicious_post_file, "w") do |o|
+         o.puts JSON.pretty_generate(@existing_posts)
+       end
+     else
+       puts "Nothing to do here: Our existing bookmarks are up to date"
+     end
+
+     posts = (@existing_posts["posts"] || []).map{|x| Post.new(x){|p| p.url = x["href"]; p.tags = x["tag"]}}
+     puts "found #{posts.length} existing bookmarks"
+     posts
+   end
+
+   def transfer_to_delicious
+     puts "Beginning import at #{Time.now}"
+     puts "Found importers for #{sites.join(", ")}"
+     new_updates = sites.map{|x| x.update!}.flatten
+     update_untiny_cache
+
+     puts "#{new_updates.length} urls to import"
+
+     return if new_updates.empty?
+
+     urls_to_posts = {}
+
+     new_updates.each do |post|
+       urls_to_posts[post.url] = post.merge(urls_to_posts[post.url])
+     end
+
+     puts "checking for existing bookmarks"
+     delicious_posts.each do |post|
+       update = urls_to_posts[post.url]
+       if update
+         puts "merging existing post for #{post.url}"
+         urls_to_posts[post.url] = update.merge post
+       end
+     end
+
+     new_updates = urls_to_posts.values.compact
+
+     puts "#{new_updates.length} urls after merging"
+
+     new_updates.each do |update|
+       blacklisted = @blacklist && @blacklist.blacklisted?(update.tags)
+
+       if blacklisted
+         puts "ignoring #{update.description} (#{update.url}) due to blacklisting (tags #{update.tags})"
+       else
+         puts "importing #{update.description} (#{update.url})"
+
+         update.fetch_metadata!(false)
+
+         res = Delicious.post("/posts/add", :query => update.to_h)
+         if !res['result'] || res['result']['code'] != 'done'
+           puts "error importing post: #{res.inspect}"
+         end
+
+         sleep(1)
+       end
+     end
+     puts "Saving data to storage"
+     sites.each{|x| x.save!}
+
+     puts "Import complete"
+
+     nil
+   end
+
+   def self.site_names
+     SitesToClasses.keys
+   end
+
+   private
+
+   def details_file
+     File.join(dir, "details.json")
+   end
+
+   def untiny_cache_file
+     File.join(dir, "untiny.json")
+   end
+ end
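untiny_url above expands shortened links by following up to three redirects and memoising the result so repeated runs stay cheap. A network-free sketch of the same recursion, where the fetch lambda stands in for Net::HTTP.get_response (the URLs are made up):

```ruby
# Follow HTTP redirects to a bounded depth, caching the final URL per input.
# fetch.call(url) returns [status_code, location_or_nil], standing in for a
# real HTTP request.
def untiny(url, fetch, cache = {}, depth = 0)
  cache[url] ||= begin
    status, location = fetch.call(url)
    if [301, 302].include?(status) && location && depth < 3
      untiny(location, fetch, cache, depth + 1)
    else
      url
    end
  end
end

hops  = { "http://tiny/a" => [301, "http://tiny/b"],
          "http://tiny/b" => [302, "http://example.com/full"] }
fetch = ->(u) { hops.fetch(u, [200, nil]) }
untiny("http://tiny/a", fetch)  # => "http://example.com/full"
```

Bounding the depth guards against redirect loops between two shorteners, which would otherwise recurse forever.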
data/lib/reddit.rb ADDED
@@ -0,0 +1,60 @@
+ require "site"
+ require "httparty"
+
+ # Fetches the "liked" history of a reddit user as JSON
+
+ module Reddit
+   include HTTParty
+   base_uri "http://reddit.com"
+   format :json
+
+   class Liked < Site
+     def name
+       "reddit"
+     end
+
+     def update!
+       puts "Updating Reddit"
+       balance
+       results = []
+       new_results = nil
+       after = nil
+       i = 0
+       while !(new_results = merge_results(Reddit.get("/user/#{credentials["user"].strip}/liked/.json", :query => {"after" => after})["data"]["children"].map{|x| x["data"]})).empty?
+         puts "fetching reddit page #{i}"
+         results += new_results
+         after = new_results[-1]["name"]
+         i += 1
+       end
+
+       results.map{ |update| to_post(update) }
+     end
+
+     def to_post(data)
+       Post.new({
+         "url" => data["url"].gsub("&amp;", "&"), # TODO: Better unescaping
+         "description" => data["title"],
+         "tags" => ["via:reddit", data["subreddit"]].join(" "),
+         "replace" => "yes",
+         "dt" => Time.at(data["created_utc"]).strftime("%Y-%m-%dT%H:%M:%SZ")
+       })
+     end
+
+     def date(post)
+       post["created_utc"]
+     end
+
+     private
+
+     def merge_results(results)
+       return [] if !results
+       new_results = results.select do |x|
+         !@ids.include?(identifier(x))
+       end
+       @posts += new_results
+       @ids.merge(results.map{|x| x["name"]})
+       new_results
+     end
+   end
+ end
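The paging loop in Reddit::Liked#update! is cursor-based: the last item's "name" (reddit's fullname) from each response is fed back as the after parameter until an empty page comes back. A network-free sketch of that loop, where fetch_page stands in for Reddit.get and the post data is made up:

```ruby
# Walk a cursor-paginated feed until it is exhausted.
# fetch_page.call(after) returns the next batch of posts; each post's
# "name" field is the cursor for the following request.
def fetch_all(fetch_page)
  results = []
  after = nil
  until (batch = fetch_page.call(after)).empty?
    results += batch
    after = batch.last["name"]
  end
  results
end

pages = { nil    => [{ "name" => "t3_a" }, { "name" => "t3_b" }],
          "t3_b" => [{ "name" => "t3_c" }] }
fetch = ->(after) { pages.fetch(after, []) }
fetch_all(fetch).map { |p| p["name"] }  # => ["t3_a", "t3_b", "t3_c"]
```

Cursor paging (rather than numbered pages) is why merge_results must dedup against already-seen ids: the feed can shift between requests.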
data/lib/site.rb ADDED
@@ -0,0 +1,90 @@
+ require "post"
+ require "rubygems"
+ require "json"
+ require "set"
+
+ class Site
+   attr_accessor :posts
+
+   def initialize(reddilicious)
+     @reddilicious = reddilicious
+     @dir = File.join(reddilicious.dir, name)
+     @ids = Set.new
+     Dir.mkdir(@dir) if !File.exists? @dir
+     yield(self) if block_given?
+   end
+
+   def credentials=(credentials)
+     @credentials = credentials
+     File.open(credentials_file, "w"){|o| o.puts JSON.pretty_generate(@credentials)}
+   end
+
+   def credentials
+     @credentials ||= if File.exists? credentials_file then JSON.parse(IO.read(credentials_file)) else {} end
+   end
+
+   def posts
+     @posts ||= if File.exists? posts_file then JSON.parse(IO.read(posts_file)) else [] end
+   end
+
+   def balance
+     @ids.merge(posts.map{|p| identifier(p)})
+     posts.sort!{|x, y| date(y) <=> date(x)}
+   end
+
+   def save!
+     balance
+     File.open(posts_file, "w"){|o| o.puts JSON.pretty_generate(@posts)} if @posts
+   end
+
+   def to_post(data)
+     Post.new(data)
+   end
+
+   def identifier(post)
+     post["url"]
+   end
+
+   # Not required to return an actual date, only something which compares
+   # in the correct order to be read as such
+   def date(post)
+     post["dt"]
+   end
+
+   def to_s
+     name
+   end
+
+   def imported_links
+     Delicious.get("/posts/all", :query=>{:tag=>["via:#{name}", Post::NEW_MARKER].join(' ')})['posts']['post'] || []
+   end
+
+   def undo_import!
+     posts = imported_links
+     posts = [posts] if posts.is_a?(Hash)
+     puts "undo imports for #{self}" unless posts.empty?
+
+     posts.each do |p|
+       puts "deleting #{p['description']}"
+       res = Delicious.delete('/posts/delete', :query=>{:url=>p['href']})
+       if !res['result'] || res['result']['code'] != 'done'
+         puts "delete failed: #{res.inspect}"
+       end
+     end
+   end
+
+   def ask_for_credentials
+     puts "#{name} user name:"
+     self.credentials = {"username" => STDIN.gets.strip}
+   end
+
+   private
+
+   def posts_file
+     File.join(@dir, "posts.json")
+   end
+
+   def credentials_file
+     File.join(@dir, "credentials.json")
+   end
+ end
data/lib/stumbleupon.rb ADDED
@@ -0,0 +1,106 @@
+ require "rubygems"
+ require "mechanize"
+ require "json"
+ require "set"
+ require "site"
+
+ module StumbleUpon
+
+   class Favourites < Site
+     def name
+       "stumbleupon"
+     end
+
+     def update!
+       puts "Updating Stumbleupon"
+       balance
+       i = 0
+
+       results = []
+       new_results = nil
+
+       while !new_results || !new_results.empty?
+         puts "fetching stumbleupon page #{i}"
+         new_results = StumbleUpon.fetch_page(credentials["user"], i).select{|x| !@ids.include? identifier(x)}
+
+         results += new_results
+         i += 1
+       end
+
+       @posts += results
+       balance
+       results.map{|u| to_post(u)}
+     end
+   end
+
+   def self.fetch_page(user, page_number=nil)
+     url = "http://www.stumbleupon.com/stumbler/#{user}/favorites/"
+     if page_number && (page_number > 0)
+       url += (page_number * 10).to_s
+       url += "/"
+     end
+
+     mechanize = WWW::Mechanize.new{|agent| agent.user_agent_alias = "Linux Mozilla"}
+
+     list_view = mechanize.get(url).search("a").select{|x| x["href"] =~ /viewmode=list/}[0]
+
+     mechanize.click(list_view) if list_view
+
+     mechanize.page.search("dl.dlBlog").map{|post| parse_review(post)}.compact
+   end
+
+   private
+
+   def self.parse_review(review)
+     title_elem = review.search("dt")[0]
+     title = title_elem.text
+
+     href = nil
+     # stumbleupon seems to be doing some mangling which means
+     # I'm seeing different results here than in firefox.
+     # For the moment the following is our "best guess" as to
+     # what the URL should be.
+
+     urls = review.search("a").map{|x| x["href"]}
+
+     urls.reject!{|x| x !~ /^http:\/\//} # remove javascript-only and relative links
+     urls.reject!{|x| x =~ /^http:\/\/www.stumbleupon.com/} # remove internal links
+
+     if urls.length == 1
+       href = urls[0]
+     else
+       STDERR.puts "I didn't know what to do with the post #{title}. Its URLs made no sense"
+       return
+     end
+
+     tags = review.search("a").map{|x| x["href"]}.grep(/\/tag\//).map{|h| h.gsub("/tag/", "").gsub("/", "")}
+
+     tags << "via:stumbleupon"
+
+     contents = review.search("dd").select{|x| x["id"] =~ /blog_contents/}
+     if !contents.empty?
+       contents = contents[0].children.select{|x| x.text?}.join("\n")
+     else
+       contents = nil
+     end
+
+     datetime_string = review.search(".stats")[0].text.gsub(/(am|pm).*$/){$1}.strip.gsub(",", " ").gsub(/ +/, " ")
+
+     # Try and parse something useful out of the SU date string.
+     time_re = /([0-9:]+(?:am|pm))/
+     date_string = datetime_string.gsub(time_re, "").strip
+     date = (if date_string == "" then DateTime.now else DateTime.parse(date_string) end)
+     time = DateTime.parse(datetime_string.scan(time_re)[0][0])
+     dt = Time.utc(date.year, date.month, date.day, time.hour, time.min)
+
+     it = {"url" => href, "description" => title, "tags" => tags.join(" "), "dt" => dt && dt.strftime("%Y-%m-%dT%H:%M:%SZ")}
+
+     it["extended"] = contents if contents
+
+     it
+   end
+ end
data/lib/twitter.rb ADDED
@@ -0,0 +1,96 @@
+ require "rubygems"
+ require "httparty"
+ require "reddilicious"
+
+ module Twitter
+   include HTTParty
+   base_uri "http://twitter.com"
+   format :json
+
+   class FriendsTimeline < Site
+     def name
+       "twitter"
+     end
+
+     def update!
+       puts "Updating twitter"
+       balance
+
+       last_post_id = posts[0] && posts[0]["id"]
+
+       results = []
+       new_tweets = nil
+
+       query = {:count => 200}
+       query[:since_id] = last_post_id if last_post_id
+
+       while !(new_tweets = get_tweets(query)).empty?
+         results += new_tweets
+         puts "importing twitter page #{query[:page] || 0}"
+         query[:page] = (query[:page] || 0) + 1
+       end
+
+       @posts += results
+
+       results.map do |res|
+         urls = res["text"].scan(/(http:\/\/[^,()" ]+)/).flatten
+         ats = res["text"].scan(/@([[:alnum:]]+)/).flatten
+         hashtags = res["text"].scan(/#([[:alnum:]]+)/).flatten
+         retweet = res["text"] =~ /RT[^a-zA-Z]/ || res["text"] =~ /\(via @[^)]+\)/
+
+         urls.map do |url|
+           post = @reddilicious.bookmark_for(url)
+           post.tags = [
+             "via:twitter",
+             Post::NEW_MARKER,
+             ats.map{|a| "to:" + a}.sort,
+             "from:#{res["user"]["screen_name"]}",
+             hashtags,
+             ("retweet" if retweet),
+             post.tags
+           ].compact.flatten.join(" ").strip
+
+           post.extended = "Imported from http://twitter.com/#{res["user"]["screen_name"]}/status/#{res["id"]}\n\n#{res["text"]}"
+           post.dt = date(res).strftime("%Y-%m-%dT%H:%M:%SZ")
+
+           post
+         end
+       end.flatten
+     end
+
+     def get_tweets(query)
+       res = Twitter.get("/statuses/friends_timeline.json", :query => query, :basic_auth => {:username => credentials["username"], :password => credentials["password"]})
+       raise "Error fetching timeline: '#{res['error']}'" if res.is_a?(Hash) && res['error']
+       res
+     rescue Crack::ParseError
+       raise # TODO
+     end
+
+     def identifier(post)
+       post["id"]
+     end
+
+     def date(post)
+       DateTime.parse(post["created_at"])
+     end
+
+     def ask_for_credentials
+       puts "#{name} user name:"
+       user = STDIN.gets.strip
+       puts "#{name} password:"
+       pass = STDIN.gets.strip
+       self.credentials = {"username" => user, "password" => pass}
+     end
+   end
+ end
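The three scans in update! above pull URLs, @mentions and #hashtags out of the raw tweet text, plus a heuristic for spotting retweets. Applied to a made-up tweet, they behave like this:

```ruby
# Sample tweet text (hypothetical); the regexes are the ones used above.
text = "RT @alice check http://example.com/post #ruby (via @bob)"

urls     = text.scan(/(http:\/\/[^,()" ]+)/).flatten  # => ["http://example.com/post"]
mentions = text.scan(/@([[:alnum:]]+)/).flatten       # => ["alice", "bob"]
hashtags = text.scan(/#([[:alnum:]]+)/).flatten       # => ["ruby"]
retweet  = !!(text =~ /RT[^a-zA-Z]/ || text =~ /\(via @[^)]+\)/)  # => true
```

Note the URL regex stops at commas, parentheses, quotes and spaces rather than attempting full RFC 3986 parsing; for tweet scraping that rough cut is usually enough.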
data/reddilicious.gemspec ADDED
@@ -0,0 +1,48 @@
+ # -*- encoding: utf-8 -*-
+
+ Gem::Specification.new do |s|
+   s.name = %q{reddilicious}
+   s.version = "0.0.1"
+
+   s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
+   s.authors = ["David R. MacIver"]
+   s.date = %q{2009-07-16}
+   s.default_executable = %q{reddilicious}
+   s.email = %q{david.maciver@gmail.com}
+   s.executables = ["reddilicious"]
+   s.extra_rdoc_files = [
+     "LICENSE",
+     "README.markdown"
+   ]
+   s.files = [
+     "LICENSE",
+     "README.markdown",
+     "Rakefile",
+     "VERSION",
+     "bin/reddilicious",
+     "lib/blacklist.rb",
+     "lib/delicious.rb",
+     "lib/post.rb",
+     "lib/reddilicious.rb",
+     "lib/reddit.rb",
+     "lib/site.rb",
+     "lib/stumbleupon.rb",
+     "lib/twitter.rb",
+     "reddilicious.gemspec"
+   ]
+   s.homepage = %q{http://github.com/DRMacIver/reddilicious}
+   s.rdoc_options = ["--charset=UTF-8"]
+   s.require_paths = ["lib"]
+   s.rubygems_version = %q{1.3.4}
+   s.summary = %q{reddilicious is a tool for automatically importing links into delicious}
+
+   if s.respond_to? :specification_version then
+     current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
+     s.specification_version = 3
+
+     if Gem::Version.new(Gem::RubyGemsVersion) >= Gem::Version.new('1.2.0') then
+     else
+     end
+   else
+   end
+ end
metadata ADDED
@@ -0,0 +1,67 @@
+ --- !ruby/object:Gem::Specification
+ name: DRMacIver-reddilicious
+ version: !ruby/object:Gem::Version
+   version: 0.0.1
+ platform: ruby
+ authors:
+ - David R. MacIver
+ autorequire:
+ bindir: bin
+ cert_chain: []
+
+ date: 2009-07-16 00:00:00 -07:00
+ default_executable: reddilicious
+ dependencies: []
+
+ description:
+ email: david.maciver@gmail.com
+ executables:
+ - reddilicious
+ extensions: []
+
+ extra_rdoc_files:
+ - LICENSE
+ - README.markdown
+ files:
+ - LICENSE
+ - README.markdown
+ - Rakefile
+ - VERSION
+ - bin/reddilicious
+ - lib/blacklist.rb
+ - lib/delicious.rb
+ - lib/post.rb
+ - lib/reddilicious.rb
+ - lib/reddit.rb
+ - lib/site.rb
+ - lib/stumbleupon.rb
+ - lib/twitter.rb
+ - reddilicious.gemspec
+ has_rdoc: false
+ homepage: http://github.com/DRMacIver/reddilicious
+ post_install_message:
+ rdoc_options:
+ - --charset=UTF-8
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: "0"
+   version:
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: "0"
+   version:
+ requirements: []
+
+ rubyforge_project:
+ rubygems_version: 1.2.0
+ signing_key:
+ specification_version: 3
+ summary: reddilicious is a tool for automatically importing links into delicious
+ test_files: []
+