ratr 0.2.0 → 1.0.0

checksums.yaml CHANGED
@@ -1,15 +1,15 @@
  ---
  !binary "U0hBMQ==":
  metadata.gz: !binary |-
- ZTJjODU4ZWJjMTU5MmVlMzEyOGNkMDY1MDA1NThlODM4MjhlNzNkYw==
+ ZTcyMWMwNDQwMzZlMjVkYTQyM2UxYWNlNzE2ZTNkNTczYTZkMmIxMA==
  data.tar.gz: !binary |-
- ZDIyZWMzZGMwNzc5NDk4YjY4NWRlNWQ2M2U0Yzk1YjU2NGJmM2Q3NA==
+ ZjJiOThiYzQxNTU4NDkyMmQ5MGVkMjkwMDQxNGM2YjQ4ZmI5OTIwMQ==
  SHA512:
  metadata.gz: !binary |-
- Y2FmMDVlOTJmYTc1ZjViNDJiMTcwNjY2NGYwYWYwZDdhYjc0ZDk2MGFlNDdj
- Y2Q5OTg1MjU1NjU4Y2YwZWZkMWNkMjBjODkzYjhjZjI4NmVlNDQ3YzZkN2U2
- ZTAwYjk2YTg3NjJjMGMzNmNhNWRlZDQ3NDkzZTZlZGQ0YmNhYTY=
+ Yzc0ZTkwOTVjOGMxZDQxZDBlZjY0ODBkYzM3YmFmNWY5MWFlZGY1NmRiODI1
+ MjI2MGNmNzNlNmVhYzRlNjIzNDQ5NjM0MmM0NzkyYTMzYmM2M2EzMGNhNzA2
+ YTljMzZjYmRjMjQxZjdlODc2ODdiYWFjMmMwNWE2Y2I5OGMzNDU=
  data.tar.gz: !binary |-
- ZDM3YzRlNGMyODQwNmZiMjAxZjljYWNjNzhkYzkwMzE1NjcwMWJlZWEzNTAz
- NTdmMWRjOTkwZDM0OTYwMjE4NTY4MWNhYmJmM2RlYmEzODRmOWZkN2Y3ZmNi
- NjY1M2UyZmZlZmFhZTU5NDkzYTQ5M2I3NzQ5ZTJjZDk4MjliNTI=
+ MTUzZTA3Y2VlZmY0ZmU0ZTkwNDFhOGIyODY5ZjhkMDI4OWFmNDdhZjE3YTcw
+ MWFhMzY5YzgzZWZjNTM2MzIyYmI5MGJlOTc4NmMzYmQxOGM5NmNkNGI2ZTlk
+ YmIzNDNjYjhjZThkZjhmMmQzZjhiNzlmMjYyNzUyMDFkNzg5Yjc=
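The checksum values above are YAML `!binary` scalars, which are base64 text. Decoding the key `U0hBMQ==` gives `SHA1`, and decoding a value gives the hex digest it encodes. A minimal sketch that decodes these values with Ruby core (`String#unpack1('m')`) and compares one against a SHA1 computed with the `digest` standard library; the helper names `decode_binary` and `sha1_matches?` are hypothetical, and obtaining the gem's extracted `metadata.gz` bytes is left out.

```ruby
require 'digest'

# YAML !binary scalars are base64; unpack1('m') decodes base64 without
# needing the separate base64 library.
def decode_binary(value)
  value.unpack1('m')
end

# Hypothetical check: does the SHA1 of some bytes (e.g. an extracted
# metadata.gz) match a !binary-encoded hex digest from checksums.yaml?
def sha1_matches?(bytes, encoded_hex_digest)
  Digest::SHA1.hexdigest(bytes) == decode_binary(encoded_hex_digest)
end

puts decode_binary('U0hBMQ==') # prints SHA1
# The 1.0.0 metadata.gz SHA1 from the diff above:
puts decode_binary('ZTcyMWMwNDQwMzZlMjVkYTQyM2UxYWNlNzE2ZTNkNTczYTZkMmIxMA==')
# prints e721c044036e25da423e1ace716e3d573a6d2b10
```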
@@ -0,0 +1 @@
+ *.gem
data/.rspec ADDED
@@ -0,0 +1,2 @@
+ --color
+ --require spec_helper
@@ -1,7 +1,7 @@
  PATH
  remote: .
  specs:
- ratr (0.2.0)
+ ratr (1.0.0)
  faraday (~> 0.9)
  faraday_middleware (~> 0.9)
  typhoeus (~> 0.7)
@@ -9,6 +9,10 @@ PATH
  GEM
  remote: https://rubygems.org/
  specs:
+ addressable (2.3.6)
+ crack (0.4.2)
+ safe_yaml (~> 1.0.0)
+ diff-lcs (1.2.5)
  ethon (0.7.3)
  ffi (>= 1.3.0)
  faraday (0.9.1)
@@ -17,11 +21,32 @@ GEM
  faraday (>= 0.7.4, < 0.10)
  ffi (1.9.8)
  multipart-post (2.0.0)
+ rspec (3.2.0)
+ rspec-core (~> 3.2.0)
+ rspec-expectations (~> 3.2.0)
+ rspec-mocks (~> 3.2.0)
+ rspec-core (3.2.3)
+ rspec-support (~> 3.2.0)
+ rspec-expectations (3.2.1)
+ diff-lcs (>= 1.2.0, < 2.0)
+ rspec-support (~> 3.2.0)
+ rspec-mocks (3.2.1)
+ diff-lcs (>= 1.2.0, < 2.0)
+ rspec-support (~> 3.2.0)
+ rspec-support (3.2.2)
+ safe_yaml (1.0.4)
  typhoeus (0.7.1)
  ethon (>= 0.7.1)
+ vcr (2.9.3)
+ webmock (1.11.0)
+ addressable (>= 2.2.7)
+ crack (>= 0.3.2)
 
  PLATFORMS
  ruby
 
  DEPENDENCIES
  ratr!
+ rspec (~> 3.1)
+ vcr (~> 2.9)
+ webmock (~> 1.11)
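Most of the new lockfile constraints use RubyGems' pessimistic operator (`~>`), which lets the last listed version segment increase while the segments before it stay fixed: `~> 3.1` means at least 3.1 but below 4.0, and `~> 3.2.0` means at least 3.2.0 but below 3.3. A small sketch that checks a few constraints from the lockfile above against the versions they resolved to, using RubyGems' `Gem::Requirement` and `Gem::Version`; the `satisfies?` helper is hypothetical.

```ruby
require 'rubygems'

# True when `version` satisfies every constraint string given.
def satisfies?(version, *constraints)
  Gem::Requirement.new(*constraints).satisfied_by?(Gem::Version.new(version))
end

puts satisfies?('3.2.0', '~> 3.1')             # rspec in DEPENDENCIES: true
puts satisfies?('3.2.3', '~> 3.2.0')           # rspec-core for rspec 3.2.0: true
puts satisfies?('0.9.1', '>= 0.7.4', '< 0.10') # faraday for faraday_middleware: true
puts satisfies?('4.0.0', '~> 3.1')             # next major is excluded: false
puts satisfies?('3.3.0', '~> 3.2.0')           # next minor is excluded: false
```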
data/README.md CHANGED
@@ -12,39 +12,108 @@ USAGE
  gem install ratr
  ratr
 
- 20 Feet from Stardom 2013 99 82 9.05
- 12 Years a Slave 2013 8.1 96 90 8.9
- 7th Heaven 1927 7.9 100 87 8.866666666666665
- 7 Faces of Dr. Lao 1964 7.3 100 78 8.366666666666667
- 8 Mile 2002 6.9 76 54 6.633333333333333
+ 20 Feet from Stardom (2013): 9.1
+ 12 Years a Slave (2013): 8.9
+ 7th Heaven (1927): 8.0
+ 7 Faces of Dr. Lao (1964): 8.4
+ 8 Mile (2002): 6.6
  ```
 
- TODOs
- -----
+ DEVELOPMENT
+ -----------
+ To run the test suite:
+
+ ```sh
+ git clone git@github.com:patricksrobertson/ratr.git
+ cd ratr
+ bundle
+
+ rspec
+ ```
+
+ To throw out the existing VCR cassettes:
+
+ ```sh
+ rm -rf spec/vcr
+ rspec
+ ```
+
+ (Note: the VCR cassettes don't seem to handle retries quite as well as the real code does,
+ so there may be non-deterministic failures as a result of the recording. For the sake of
+ this example, I suggest keeping the cassettes.)
+
+ PERFORMANCE
+ -----------
+
+ The application should finish within the bounds set by the Rotten Tomatoes (RT) rate
+ limit, which I calculate at roughly 3m45s to 7m50s. The idea behind the calculation is
+ that RT limits requests on a per-second basis, so in the best case you can complete
+ 5 requests per second (0.2 sec/req), but in a failure-ridden case you are probably
+ doubling the time per request (0.4 sec/req). I'm running two parallel requests at a
+ time against the OMDB API, as it tends to respond more slowly than the RT API and
+ seems comfortable with that level of concurrency.
+
+ The application runs both sources in independent threads and then blocks until both
+ have finished. Earlier versions queried the sources serially, which resulted in
+ unacceptable runtimes.
+
+ The way to speed this up further would be to use multiple RT API keys and split the
+ movies collection into N chunks, one per key. That is probably not what RT had in
+ mind when they introduced rate limiting.
+
+ DURABILITY
+ ----------
+ Neither API documents its error codes particularly well. The only error code I ran into
+ (frequently) was a 403 from the RT API when the rate limit was exceeded -- which should
+ have been a 429 (Too Many Requests), or at least a 503.
+
+ The OMDB API scraper handles the 404 case well -- so well that I didn't realize some of
+ the movies weren't found on OMDB until I looked at the VCR cassettes.
+
+ Since I couldn't find a maintenance response from either API, I deferred implementing
+ specific error handling for that case. The last case to consider was timeouts caused by
+ the application's network being unavailable; I ran out of time to add that handling.
+
+ I would normally do that.
+
+ EXTENSIBILITY
+ -------------
+
+ The bulk of my effort went into making this slightly more extensible. I constructed a Source
+ class that combines an HTTP processor (serial or concurrent) with an HTTP wrapper, which tells the
+ source how to fetch an individual movie and how to scrub the response down to the pertinent scores.
+ The thread manager can take any number of sources, so to add a new source you would:
+
+ * Write an HTTP wrapper class that defines the #get, #scrub, and #adapter interface.
+ * Add the source (indicating whether it is serial or concurrent) to the Application class.
+ * Add the source to the sources array passed into the thread manager.
+
+ I got a little hand-wavy in testing these pieces. With a light time allotment, I aimed for a
+ 'refactor away a spike' technique: I wrote acceptance tests first, and once those passed I
+ started moving code around. As a result the unit-level tests are a little light, and I avoided
+ test doubles for concrete objects entirely. To be able to test with doubles, I'd probably:
 
- * Thread management in a real-world example is going to need to be a lil better
- than my off-the-cuff example.
+ * Add tests for the specific interfaces/roles (HttpProcessor, HttpWrapper).
+ * Share those tests across the concrete implementations.
+ * Use doubles with much more confidence: if the interface or a concrete implementation breaks,
+ the shared role tests catch it as reliably as tests against the implementation would.
 
- * handle exceptional errors with appropriate measures
- ** 500 level responses are probably going to be universal -- should stop activity
+ To be fair, I'm also not happy with the HTTP wrappers: the #scrub method smells of doing too much.
+ I'd keep an eye on it and break it out if adding another source makes the problem worse.
 
- * TATFT
+ I didn't make the movies collection a real object, so in places the code accesses it like a raw
+ data structure instead of sending it messages. A class that implements Enumerable (plus some
+ finder methods) would have been preferable. The same goes for the collection of results that
+ the Manager returns.
 
- * Make this OO instead of procedural
+ TIME SPENT
+ ----------
 
- My personal notes:
- * One API source can have multiple ratings (i.e. Rotten Tomatoes has two scores).
- * An individual rating source seems to have two to three important dimensions.
- ** Scalar (how to get this to a ten point scale)
- ** N/A (stored as nil, N/A, -1)
- ** Weight -- should I apply the same weight to audience ratings as critic ratings?
+ Spike 1 (serial access) [branch pr-spike] - 45 minutes. Goal: understand the APIs and the
+ worst-case runtime.
 
- * From best case to worst case in terms of RT runtime: 3.7 - 7.7 minutes (.2 sec per req
- to doubling request time due to timeouts at .4 seconds per req). This solution
- actually runs fairly comfortably in that timeframe. From what I can tell, it looks like
- RT may have a short lived cache, where those hits don't count against your rate limit. Makes
- subsequent runs faster if done quickly after first.
- * For the sake of this exercise, I think client side caching of the requests is probably
- a no-no in terms of runtime, but will need to be captured on the testing end to avoid
- quickly hitting my request limit on RT's API.
+ Spike 2 (parallel access) [branch pr-multi_spike] - 1 hour. Goal: bring the runtime down to
+ the time it takes to pull results from the RT API.
 
+ Production implementation [master / pr-refactor_from_spike] - 3-4 hours. I dabbled in this throughout
+ an evening and did a little work the following morning. Goal: write tests that can re-run without
+ hitting the APIs (rate limit rules everything around me) and bring the application up to an MVP.
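The runtime bounds in the PERFORMANCE section come from multiplying the number of RT requests by the time per request: about 0.2 s at RT's 5 requests per second, and about 0.4 s when failures double it. A minimal sketch of that arithmetic, assuming one RT request per movie; the `estimated_runtime` method and the 1,125-movie count are hypothetical, chosen only to illustrate the formula.

```ruby
# Rough RT runtime: requests x seconds per request, formatted as "XmYYs".
# Assumes one RT request per movie and ignores OMDB, which runs concurrently.
def estimated_runtime(request_count, seconds_per_request)
  total = (request_count * seconds_per_request).round
  format('%dm%02ds', total / 60, total % 60)
end

movies = 1125 # hypothetical list size
puts estimated_runtime(movies, 0.2) # best case, 5 req/s: 3m45s
puts estimated_runtime(movies, 0.4) # failure-ridden, doubled: 7m30s
```

With this hypothetical count the best case lands on the README's 3m45s lower bound; the upper bound scales the same way with the real list size.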
@@ -0,0 +1 @@
+ require "bundler/gem_tasks"
data/bin/ratr CHANGED
@@ -1,113 +1,6 @@
  #!/usr/bin/env ruby
 
- require 'csv'
- require 'ostruct'
- require 'faraday'
- require 'faraday_middleware'
- require 'typhoeus'
- require 'typhoeus/adapters/faraday'
+ require 'ratr'
 
- # RT is really the problem child when it comes to requesting
- # multiple searches at once. It will return a 403 (forbidden)
- # code on rate limit exceeded which is a pain in the bum.
- #
- # Faraday has ERR::Timeout retries built in, but this is
- # unfortunatley not able to tap into it. The best I can do
- # is to roll my own custom retry mechanism. The idea is
- # to try and retry sooner than the 0.02 under most circumstances.
- def tomato_request(movie, http)
- retry_count = 3
- http.get do |request|
- request.url '/api/public/v1.0/movies.json'
- request.params['q'] = "#{movie.title} #{movie.year}"
- request.params['apikey'] = "ww8qgxbhjbqudvupbr8sqd7x"
- request.params['page_limit'] = 1
- end
- rescue Faraday::Error::ClientError
- case retry_count
- when 3
- sleep 0.05
- when 2
- sleep 0.1
- when 1
- sleep 0.2
- end
 
- retry_count -=1
- retry
- end
-
- # read incoming csv file
- movies = []
- CSV.foreach("examples/movies_small.csv") do |row|
- #assign each movie a title and year. Note: year is not guaranteed
- movies << OpenStruct.new(title: row[0], year: row[1])
- end
-
- manager = Typhoeus::Hydra.new(max_concurrency: 2)
- omdb = Faraday.new(url: 'http://www.omdbapi.com') do |faraday|
- faraday.request :json # form-encode POST params
- faraday.response :json # log requests to STDOUT
- faraday.adapter :typhoeus # make requests with Net::HTTP
- end
-
- tomatoes = Faraday.new(url: 'http://api.rottentomatoes.com') do |faraday|
- faraday.request :json
- faraday.response :json
- faraday.response :raise_error
- faraday.adapter :typhoeus
- end
-
- # Fork a thread out for the OMDB requests
- omdb_responses = {}
- t1 = Thread.new {
- omdb.in_parallel(manager) do
- movies.each_with_index do |movie, index|
- omdb_responses[index] = omdb.get do |request|
- request.url '/'
- request.params['t'] = movie.title
- request.params['y'] = movie.year
- end
- end
- end
- omdb_responses
- }
-
- # Fork out a thread for RT requests
- tomato_responses = {}
- t2 = Thread.new {
- movies.each_with_index do |movie, index|
- tomato_responses[index] = tomato_request(movie, tomatoes)
- end
- }
-
- # block until the threads are complete
- t1.join
- t2.join
-
- movies.each_with_index do |movie, index|
- movie.imdb_rating = omdb_responses[index].body['imdbRating']
- tomato_ratings = tomato_responses[index].body["movies"][0] && tomato_responses[index].body["movies"][0]["ratings"]
-
- if tomato_ratings
- movie.tomatoes_critic_rating = tomato_ratings["critics_score"]
- movie.tomatoes_audience_rating = tomato_ratings["audience_score"]
- end
-
- scores = []
- scores << movie.imdb_rating.to_f unless movie.imdb_rating == nil || movie.imdb_rate == 'N/A'
- scores << (movie.tomatoes_critic_rating.to_f / 10.0) unless movie.tomatoes_critic_rating == nil || movie.tomatoes_critic_rating == '-1'
- scores << (movie.tomatoes_audience_rating.to_f / 10.0) unless movie.tomatoes_audience_rating == nil || movie.tomatoes_audience_rating == '-1'
-
- if scores.size > 0
- movie.average = scores.reduce(0) {|m,v| m+=v} / scores.size if scores.size > 0
- else
- movie.average = 0
- end
- end
-
-
- # sort the movies collection by average rating -- then display.
- movies.sort {|a,b| b.average <=> a.average}.each do |movie|
- puts "#{movie.title} (#{movie.year}): #{movie.average}"
- end
+ puts Ratr::Application.new("examples/movies.csv").call
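In the `tomato_request` helper removed above, `retry_count = 3` sits inside the method body that Ruby's `retry` re-runs, so the counter is reset on every attempt: the helper always sleeps 0.05 s and never gives up on a persistent error. A minimal sketch of the same escalating-backoff idea with a counter that survives `retry` and a final re-raise; `TransientError` is a hypothetical stand-in for `Faraday::Error::ClientError`, and `with_retries` is not part of ratr.

```ruby
# Hypothetical stand-in for Faraday::Error::ClientError (e.g. an RT 403).
class TransientError < StandardError; end

# Runs the block, retrying on the given errors. Each retry sleeps for the next
# delay in the list; once the delays run out, the last error is re-raised.
def with_retries(delays: [0.05, 0.1, 0.2], on: [TransientError])
  attempt = 0 # initialised outside begin, so `retry` does not reset it
  begin
    yield
  rescue *on
    raise if attempt >= delays.size
    sleep delays[attempt]
    attempt += 1
    retry
  end
end

# Usage: a fake request that is rate limited twice, then succeeds.
calls = 0
result = with_retries(delays: [0.01, 0.01]) do
  calls += 1
  raise TransientError, 'rate limited' if calls < 3
  :ok
end
puts "#{result} after #{calls} calls" # prints "ok after 3 calls"
```

Passing the delays in as a list keeps the 0.05/0.1/0.2 schedule from the original comment while making the number of attempts explicit.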
@@ -3,23 +3,3 @@
  8 Mile,2002
  12 Years a Slave,2013
  20 Feet from Stardom,2013
- 2001: A Space Odyssey,1968
- 20000 Leagues Under the Sea,1954
- The Abyss,1989
- The Accidental Tourist,1988
- The Accountant,2001
- The Accused,1988
- Adaptation,2002
- Adventures of Don Juan,1948
- "The Adventures of Priscilla, Queen of the Desert",1994
- The Adventures of Robin Hood,1938
- Affliction,1997
- The African Queen,1951
- The Age of Innocence,1993
- Air Force,1943
- Airport,1970
- Aladdin,1992
- The Alamo,1960
- The Alaskan Eskimo,1953
- Albert Schweitzer,1957
- Alexander's Ragtime Band,1938
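The removed rows above include `"The Adventures of Priscilla, Queen of the Desert",1994`, whose title contains a comma, and the code notes that the year is not guaranteed. Ruby's `csv` library handles both: quoted fields keep their commas, and a missing year comes back as `nil`. A minimal sketch that mirrors how `Application#call` keys movies by row index; the `Movie` struct, the `read_movies` method and the "Untitled Project" row are hypothetical (ratr itself uses `OpenStruct` and `CSV.foreach` on a file).

```ruby
require 'csv'

# Hypothetical value object; ratr itself uses OpenStruct.
Movie = Struct.new(:title, :year)

# Parses title,year rows into a hash keyed by row index, like Application#call.
def read_movies(csv_text)
  movies = {}
  CSV.parse(csv_text).each_with_index do |row, index|
    movies[index] = Movie.new(row[0], row[1]) # row[1] is nil when the year is missing
  end
  movies
end

sample = <<~ROWS
  8 Mile,2002
  "The Adventures of Priscilla, Queen of the Desert",1994
  Untitled Project
ROWS

read_movies(sample).each do |index, movie|
  puts "#{index}: #{movie.title} (#{movie.year.inspect})"
end
```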
@@ -0,0 +1,12 @@
+ module Ratr
+ autoload :Application, 'ratr/application'
+ autoload :ParallelHttpProcessor, 'ratr/parallel_http_processor'
+ autoload :SerialHttpProcessor, 'ratr/serial_http_processor'
+ autoload :OmdbHttpWrapper, 'ratr/omdb_http_wrapper'
+ autoload :RottenTomatoesHttpWrapper, 'ratr/rotten_tomatoes_http_wrapper'
+ autoload :Source, 'ratr/source'
+ autoload :PoolManager, 'ratr/pool_manager'
+ autoload :ArrayOfHashes, 'ratr/utils/array_of_hashes'
+ autoload :AverageCalculator, 'ratr/utils/average_calculator'
+ autoload :MetaRating, 'ratr/meta_rating'
+ end
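The README's EXTENSIBILITY section describes a Source as an HTTP processor (serial or concurrent) paired with an HTTP wrapper that defines `#get`, `#scrub` and `#adapter`. The wrapper and processor files are not part of this diff, so the sketch below only illustrates that role under stated assumptions: `FakeCriticsWrapper` returns canned data instead of making an HTTP call, `SketchSerialProcessor` is a hypothetical stand-in for `Ratr::SerialHttpProcessor`, and `plays_wrapper_role?` is one hypothetical way to express the shared role tests the README suggests.

```ruby
# Methods the README says an HTTP wrapper must define.
HTTP_WRAPPER_ROLE = %i[get scrub adapter].freeze

def plays_wrapper_role?(object)
  HTTP_WRAPPER_ROLE.all? { |name| object.respond_to?(name) }
end

# Hypothetical wrapper for a new source; canned data replaces real HTTP.
class FakeCriticsWrapper
  RESPONSES = { 'Aladdin' => { 'score' => '81' } }.freeze

  # A real wrapper would return an HTTP connection here (e.g. a Faraday one).
  def adapter
    :canned
  end

  # Fetches the raw response for one movie (empty hash when unknown).
  def get(movie)
    RESPONSES.fetch(movie[:title], {})
  end

  # Scrubs a response down to scores on a 10-point scale, or nil if absent.
  def scrub(response)
    score = response['score']
    score && [score.to_f / 10.0]
  end
end

# Hypothetical serial processor: fetch and scrub each movie in turn,
# keeping results keyed by the same index as the movies hash.
class SketchSerialProcessor
  def initialize(wrapper)
    @wrapper = wrapper
  end

  def call(movies)
    movies.each_with_object({}) do |(index, movie), results|
      results[index] = @wrapper.scrub(@wrapper.get(movie))
    end
  end
end

wrapper = FakeCriticsWrapper.new
movies = { 0 => { title: 'Aladdin' }, 1 => { title: 'Airport' } }
p plays_wrapper_role?(wrapper) # true
p SketchSerialProcessor.new(wrapper).call(movies)
```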
@@ -0,0 +1,58 @@
+ require 'csv'
+ require 'ostruct'
+ require 'faraday'
+ require 'faraday_middleware'
+ require 'typhoeus'
+ require 'typhoeus/adapters/faraday'
+
+ module Ratr
+ class Application
+ attr_reader :file
+ def initialize(file = "examples/movies_small.csv")
+ @file = file
+ end
+
+ def call
+ # read incoming csv file
+ movies = {}
+ movie_key = 0
+ CSV.foreach(file) do |row|
+ #assign each movie a title and year. Note: year is not guaranteed
+ movies[movie_key] = OpenStruct.new(title: row[0], year: row[1])
+ movie_key += 1
+ end
+
+ manager = Ratr::PoolManager.new(sources)
+ manager.process(movies)
+ manager.join
+
+ movies.each do |index, movie|
+ scores = manager.merged_results[index].compact
+
+ movie.average = Ratr::AverageCalculator.call(scores)
+ end
+
+ # sort the movies collection by average rating -- then display.
+ output = []
+ movies.values.sort {|a,b| b.average <=> a.average}.each do |movie|
+ output << "#{movie.title} (#{movie.year}): #{movie.average}"
+ end
+
+ return output
+ end
+
+ private
+
+ def sources
+ @sources ||= [omdb, rotten_tomatoes]
+ end
+
+ def omdb
+ @omdb ||= Ratr::Source.new(Ratr::ParallelHttpProcessor, Ratr::OmdbHttpWrapper.new)
+ end
+
+ def rotten_tomatoes
+ @rotten_tomatoes = Ratr::Source.new(Ratr::SerialHttpProcessor, Ratr::RottenTomatoesHttpWrapper.new)
+ end
+ end
+ end
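`Application#call` relies on `PoolManager` to run every source at once, block until all have finished, and then expose `merged_results[index]` as the scores for one movie, which it compacts and averages. `pool_manager.rb` and `average_calculator.rb` are not shown in this diff, so the following is only a sketch of that flow with plain threads: `SketchPoolManager`, `average` and the lambda sources are hypothetical, and returning 0 for a movie with no scores follows the removed bin/ratr script.

```ruby
# Hypothetical pool manager: one thread per source, then block on all of them.
# Each source is any callable that takes the movies hash and returns
# { movie_index => array_of_scores_or_nil }.
class SketchPoolManager
  def initialize(sources)
    @sources = sources
  end

  def process(movies)
    @threads = @sources.map { |source| Thread.new { source.call(movies) } }
    self
  end

  # Thread#value waits for each thread and returns what its block returned.
  def join
    @results = @threads.map(&:value)
    self
  end

  # Merges every source's scores for the same movie index into one array;
  # Array(nil) is [], so a source with no score for a movie adds nothing.
  def merged_results
    @results.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |per_source, merged|
      per_source.each { |index, scores| merged[index].concat(Array(scores)) }
    end
  end
end

# Plain mean; 0 when no source had a score (as the removed bin/ratr did).
def average(scores)
  scores.empty? ? 0 : scores.sum / scores.size
end

imdb = ->(_movies) { { 0 => [8.1], 1 => nil } }
rotten_tomatoes = ->(_movies) { { 0 => [9.6, 9.0], 1 => nil } }

manager = SketchPoolManager.new([imdb, rotten_tomatoes]).process({}).join
merged = manager.merged_results
p merged[0]          # [8.1, 9.6, 9.0]
p average(merged[1]) # 0
```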
@@ -0,0 +1,26 @@
+ module Ratr
+ class MetaRating
+ include Comparable
+
+ attr_reader :score, :no_score
+ def initialize(score, no_score = 'pity party')
+ @score, @no_score = score, no_score
+ end
+
+ def <=>(other)
+ return -1 if self.not_comparable?
+ return 1 if other.not_comparable?
+
+ self.score <=> other.score
+ end
+
+ def to_s
+ score.to_s
+ end
+
+ protected
+ def not_comparable?
+ score == no_score
+ end
+ end
+ end
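`MetaRating#<=>` above pushes ratings whose score equals the `no_score` sentinel to the bottom of a sort. One edge case: when both ratings lack a score, `a <=> b` and `b <=> a` both return -1, so `Comparable` reports `a < b` and `b < a` as both true and two missing ratings are never equal. A minimal variant that returns 0 in that case; the `Rating` class name and the `'N/A'` sentinel are hypothetical (MetaRating defaults its sentinel to `'pity party'`).

```ruby
# Variant of the MetaRating idea: a rating whose score equals NO_SCORE sorts
# below every real score, and two missing scores compare as equal.
class Rating
  include Comparable

  NO_SCORE = 'N/A' # hypothetical sentinel

  attr_reader :score

  def initialize(score)
    @score = score
  end

  def missing?
    score == NO_SCORE
  end

  def <=>(other)
    return 0 if missing? && other.missing?
    return -1 if missing?
    return 1 if other.missing?

    score <=> other.score
  end
end

ratings = [Rating.new(7.5), Rating.new('N/A'), Rating.new(9.1), Rating.new('N/A')]
p ratings.sort.reverse.map(&:score) # [9.1, 7.5, "N/A", "N/A"]
```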