ratr 0.2.0 → 1.0.0

checksums.yaml CHANGED
@@ -1,15 +1,15 @@
1
1
  ---
2
2
  !binary "U0hBMQ==":
3
3
  metadata.gz: !binary |-
4
- ZTJjODU4ZWJjMTU5MmVlMzEyOGNkMDY1MDA1NThlODM4MjhlNzNkYw==
4
+ ZTcyMWMwNDQwMzZlMjVkYTQyM2UxYWNlNzE2ZTNkNTczYTZkMmIxMA==
5
5
  data.tar.gz: !binary |-
6
- ZDIyZWMzZGMwNzc5NDk4YjY4NWRlNWQ2M2U0Yzk1YjU2NGJmM2Q3NA==
6
+ ZjJiOThiYzQxNTU4NDkyMmQ5MGVkMjkwMDQxNGM2YjQ4ZmI5OTIwMQ==
7
7
  SHA512:
8
8
  metadata.gz: !binary |-
9
- Y2FmMDVlOTJmYTc1ZjViNDJiMTcwNjY2NGYwYWYwZDdhYjc0ZDk2MGFlNDdj
10
- Y2Q5OTg1MjU1NjU4Y2YwZWZkMWNkMjBjODkzYjhjZjI4NmVlNDQ3YzZkN2U2
11
- ZTAwYjk2YTg3NjJjMGMzNmNhNWRlZDQ3NDkzZTZlZGQ0YmNhYTY=
9
+ Yzc0ZTkwOTVjOGMxZDQxZDBlZjY0ODBkYzM3YmFmNWY5MWFlZGY1NmRiODI1
10
+ MjI2MGNmNzNlNmVhYzRlNjIzNDQ5NjM0MmM0NzkyYTMzYmM2M2EzMGNhNzA2
11
+ YTljMzZjYmRjMjQxZjdlODc2ODdiYWFjMmMwNWE2Y2I5OGMzNDU=
12
12
  data.tar.gz: !binary |-
13
- ZDM3YzRlNGMyODQwNmZiMjAxZjljYWNjNzhkYzkwMzE1NjcwMWJlZWEzNTAz
14
- NTdmMWRjOTkwZDM0OTYwMjE4NTY4MWNhYmJmM2RlYmEzODRmOWZkN2Y3ZmNi
15
- NjY1M2UyZmZlZmFhZTU5NDkzYTQ5M2I3NzQ5ZTJjZDk4MjliNTI=
13
+ MTUzZTA3Y2VlZmY0ZmU0ZTkwNDFhOGIyODY5ZjhkMDI4OWFmNDdhZjE3YTcw
14
+ MWFhMzY5YzgzZWZjNTM2MzIyYmI5MGJlOTc4NmMzYmQxOGM5NmNkNGI2ZTlk
15
+ YmIzNDNjYjhjZThkZjhmMmQzZjhiNzlmMjYyNzUyMDFkNzg5Yjc=
@@ -0,0 +1 @@
1
+ *.gem
data/.rspec ADDED
@@ -0,0 +1,2 @@
1
+ --color
2
+ --require spec_helper
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- ratr (0.2.0)
4
+ ratr (1.0.0)
5
5
  faraday (~> 0.9)
6
6
  faraday_middleware (~> 0.9)
7
7
  typhoeus (~> 0.7)
@@ -9,6 +9,10 @@ PATH
9
9
  GEM
10
10
  remote: https://rubygems.org/
11
11
  specs:
12
+ addressable (2.3.6)
13
+ crack (0.4.2)
14
+ safe_yaml (~> 1.0.0)
15
+ diff-lcs (1.2.5)
12
16
  ethon (0.7.3)
13
17
  ffi (>= 1.3.0)
14
18
  faraday (0.9.1)
@@ -17,11 +21,32 @@ GEM
17
21
  faraday (>= 0.7.4, < 0.10)
18
22
  ffi (1.9.8)
19
23
  multipart-post (2.0.0)
24
+ rspec (3.2.0)
25
+ rspec-core (~> 3.2.0)
26
+ rspec-expectations (~> 3.2.0)
27
+ rspec-mocks (~> 3.2.0)
28
+ rspec-core (3.2.3)
29
+ rspec-support (~> 3.2.0)
30
+ rspec-expectations (3.2.1)
31
+ diff-lcs (>= 1.2.0, < 2.0)
32
+ rspec-support (~> 3.2.0)
33
+ rspec-mocks (3.2.1)
34
+ diff-lcs (>= 1.2.0, < 2.0)
35
+ rspec-support (~> 3.2.0)
36
+ rspec-support (3.2.2)
37
+ safe_yaml (1.0.4)
20
38
  typhoeus (0.7.1)
21
39
  ethon (>= 0.7.1)
40
+ vcr (2.9.3)
41
+ webmock (1.11.0)
42
+ addressable (>= 2.2.7)
43
+ crack (>= 0.3.2)
22
44
 
23
45
  PLATFORMS
24
46
  ruby
25
47
 
26
48
  DEPENDENCIES
27
49
  ratr!
50
+ rspec (~> 3.1)
51
+ vcr (~> 2.9)
52
+ webmock (~> 1.11)
data/README.md CHANGED
@@ -12,39 +12,108 @@ USAGE
12
12
  gem install ratr
13
13
  ratr
14
14
 
15
- 20 Feet from Stardom 2013 99 82 9.05
16
- 12 Years a Slave 2013 8.1 96 90 8.9
17
- 7th Heaven 1927 7.9 100 87 8.866666666666665
18
- 7 Faces of Dr. Lao 1964 7.3 100 78 8.366666666666667
19
- 8 Mile 2002 6.9 76 54 6.633333333333333
15
+ 20 Feet from Stardom (2013): 9.1
16
+ 12 Years a Slave (2013): 8.9
17
+ 7th Heaven (1927): 8.0
18
+ 7 Faces of Dr. Lao (1964): 8.4
19
+ 8 Mile (2002): 6.6
20
20
  ```
21
21
 
22
- TODOs
23
- -----
22
+ DEVELOPMENT
23
+ -----------
24
+ To run the test suite:
25
+
26
+ ```sh
27
+ git clone git@github.com:patricksrobertson/ratr.git
28
+ cd ratr
29
+ bundle
30
+
31
+ rspec
32
+ ```
33
+
34
+ To throw out the existing VCR cassettes:
35
+
36
+ ```sh
37
+ rm -rf spec/vcr
38
+ rspec
39
+ ```
40
+
41
+ (note the VCR cassette doesn't seem to retry quite as well as the real code,
42
+ so there may be non-deterministic failures as a result of the recording. I
43
+ suggest for the sake of this example that you keep the cassettes)
44
+
45
+ PERFORMANCE
46
+ -----------
47
+
48
+ The application should run within the bounds of the Rotten Tomatoes (RT) expected
49
+ response time which I calculate at roughly between 3m45s and 7m50s or so. The
50
+ idea behind the calculation is that RT actually limits on a per second basis,
51
+ so you can at best case complete 5 requests per second (.2 sec/req) but in a failure
52
+ ridden case you are probably doubling the time per request (.4 sec/req). I'm running
53
+ two parallel requests at a time on the OMDB API, as it tends to respond more slowly
54
+ than the RT API and seems comfortable with that level of concurrency.
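The estimate above is simple arithmetic: total time is roughly the number of RT requests multiplied by the seconds spent per request. A minimal sketch, assuming a hypothetical request count of 1125 (the real size of the movie list is not stated here) and the two per-request figures from the paragraph:

```ruby
# Rough runtime estimate for a serial, rate-limited API.
# REQUESTS is a hypothetical figure chosen for illustration only.
REQUESTS = 1125

# Formats a number of seconds as "XmYs".
def format_duration(seconds)
  minutes, secs = seconds.round.divmod(60)
  "#{minutes}m#{secs}s"
end

# Total time = number of requests * seconds per request.
def estimated_runtime(requests, seconds_per_request)
  requests * seconds_per_request
end

best  = estimated_runtime(REQUESTS, 0.2) # 5 requests per second
worst = estimated_runtime(REQUESTS, 0.4) # every request takes twice as long

puts "best case:  #{format_duration(best)}"
puts "worst case: #{format_duration(worst)}"
```

With these assumed inputs the sketch reports 3m45s and 7m30s. The upper bound quoted in the README is a little higher, which is consistent with it being a rough figure.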
55
+
56
+ The application pools both sources into independent threads and then blocks until
57
+ both operations are complete. Earlier versions ran this in serial, resulting in
58
+ unacceptable runtimes.
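A minimal sketch of the fork-then-join pattern described above, using plain Ruby threads. The two lambdas are hypothetical stand-ins for the OMDB and RT sources, and the sleeps only simulate latency; the gem's own PoolManager is not reproduced here.

```ruby
# Two stand-in "sources"; the sleeps simulate network latency.
# These lambdas are hypothetical placeholders, not the gem's Source class.
slow_source = -> { sleep 0.2; :omdb_done }
fast_source = -> { sleep 0.1; :rt_done }

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

# Start one thread per source, then block until each one has finished.
threads = [slow_source, fast_source].map { |source| Thread.new { source.call } }
results = threads.map(&:value) # Thread#value joins and returns the block's result

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts results.inspect
puts format("elapsed: %.2fs", elapsed)
```

Because both threads sleep at the same time, the elapsed time should be close to the slower source rather than the sum of the two, which is the point of running the sources side by side.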
59
+
60
+ The way to speed this up would be to have multiple RT API keys, and split the movies
61
+ collection up into N components. Probably not what RT had in mind while doing rate
62
+ limiting.
63
+
64
+ DURABILITY
65
+ ----------
66
+ Neither API particularly documents its error codes. The only error code I (frequently)
67
+ ran into was a 403 on the RT API -- which should've been a 503 for rate limit exceeded.
68
+
69
+ The OMDB API scraper handles the 404 case well -- so well that I didn't realize some of
70
+ the movies weren't found on OMDB until I looked at the VCR cassettes.
71
+
72
+ Given that I wasn't able to find a maintenance code response from either API -- I deferred
73
+ implementing specific error handling in that event. The last case to consider was timeouts
74
+ caused by network problems on the application's side. I ran out of time to add that handling at the end.
75
+
76
+ I would normally do that.
77
+
78
+ EXTENSIBILITY
79
+ -------------
80
+
81
+ The bulk of my effort was devoted to making this slightly more extensible. I constructed a Source
82
+ class that combines an HTTP processor (serial or concurrent) and an HTTP wrapper that tells the source
83
+ how to fetch an individual movie and then scrub the responses down to pertinent scores. The thread manager
84
+ can take N number of sources so in order to add a new source you'd do the following:
85
+
86
+ * Write HTTP wrapper class, defining the interface of #get, #scrub, #adapter.
87
+ * Add the source (indicating whether it is serial or concurrent) to the Application class
88
+ * Add the source into the sources array passed into the thread manager.
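The first step in the list above could look roughly like the following. Only the method names #get, #scrub and #adapter come from the README; the argument lists, the return values, the ExampleHttpWrapper name and the FakeConnection stand-in are all assumptions made so the sketch runs without a network.

```ruby
# Hypothetical wrapper for an imaginary ratings API. Only the method names
# #get, #scrub and #adapter come from the README; everything else is assumed.
Movie = Struct.new(:title, :year)

class ExampleHttpWrapper
  Response = Struct.new(:body)

  # Stand-in for a real HTTP connection so the sketch runs offline.
  class FakeConnection
    def get(_path, _params)
      Response.new({ 'rating' => '7.5' })
    end
  end

  # The connection used to talk to the API.
  def adapter
    @adapter ||= FakeConnection.new
  end

  # Fetch the raw response for a single movie.
  def get(movie)
    adapter.get('/', { 't' => movie.title, 'y' => movie.year })
  end

  # Reduce a raw response to a list of scores on a ten-point scale.
  # A missing or 'N/A' rating becomes nil so callers can compact it away.
  def scrub(response)
    rating = response.body['rating']
    [rating.nil? || rating == 'N/A' ? nil : rating.to_f]
  end
end

wrapper = ExampleHttpWrapper.new
scores  = wrapper.scrub(wrapper.get(Movie.new('8 Mile', '2002')))
puts scores.inspect
```

Returning nil for an unusable rating fits the way the application calls `compact` on the merged results before averaging.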
89
+
90
+ I got a little hand-wavy on my testing of these things. With time allotment light, I aimed for a
91
+ 'refactor away a spike' technique where I wrote acceptance tests, then once that was complete
92
+ started moving code around. So on the individual basis the tests are a little light, and I totally
93
+ avoided doubling concrete objects. In order to be able to test with doubles, I'd probably do this:
24
94
 
25
- * Thread management in a real-world example is going to need to be a lil better
26
- than my off-the-cuff example.
95
+ * Add tests for the specific interfaces/roles (HttpProcessor, HttpWrapper)
96
+ * Share said tests on the concrete examples
97
+ * Feel much more confident about doubling -- now if the interface or a concrete example fails
98
+ the test suite can catch that at the same rate of the implementation.
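One way to picture the role tests from the list above is a single check that every concrete wrapper must pass. In an RSpec suite this would likely live in shared examples; the plain-Ruby sketch below avoids that dependency. The class names and the `missing_role_methods` helper are hypothetical.

```ruby
# Methods every HTTP wrapper is expected to respond to (from the README).
HTTP_WRAPPER_ROLE = %i[get scrub adapter].freeze

# Returns the role methods that the given object is missing.
def missing_role_methods(object, role = HTTP_WRAPPER_ROLE)
  role.reject { |name| object.respond_to?(name) }
end

# Two hypothetical wrappers: one complete, one that forgot #scrub.
class CompleteWrapper
  def get(movie); end
  def scrub(response); end
  def adapter; end
end

class IncompleteWrapper
  def get(movie); end
  def adapter; end
end

[CompleteWrapper, IncompleteWrapper].each do |klass|
  missing = missing_role_methods(klass.new)
  status  = missing.empty? ? 'plays the role' : "missing #{missing.join(', ')}"
  puts "#{klass}: #{status}"
end
```

Running the same check against a test double and against each concrete class is what makes doubling safer: if the role changes, both sides fail together.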
27
99
 
28
- * handle exceptional errors with approriate measures
29
- ** 500 level responses are probably going to be universal -- should stop activity
100
+ To also be fair, I'm not happy about the HTTP wrappers. The scrub method smells of doing too much.
101
+ I'd look at that with caution and potentially break it out if another source makes this look bad.
30
102
 
31
- * TATFT
103
+ I didn't make the movies collection a real object -- there are some smells of accessing the movies
104
+ collection like a data structure instead of sending messages to it. Making something that implements
105
+ Enumerable would've been preferable (in addition to doing find methods). The same can be said for
106
+ the collection of results that the Manager returns.
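A minimal sketch of what such a collection object might look like. The Movies class, its finder and the `ranked` method are illustrative names, not part of the gem; the only firm requirement from Ruby is that a class including Enumerable defines #each.

```ruby
# Hypothetical Movies collection; class and method names are illustrative.
Movie = Struct.new(:title, :year, :average)

class Movies
  include Enumerable

  def initialize(movies = [])
    @movies = movies
  end

  # Enumerable only requires #each; map, select, sort_by etc. come for free.
  def each(&block)
    return enum_for(:each) unless block_given?

    @movies.each(&block)
    self
  end

  # Example finder: first movie with a matching title, or nil.
  def find_by_title(title)
    find { |movie| movie.title == title }
  end

  # Highest average first.
  def ranked
    sort_by { |movie| -movie.average }
  end
end

movies = Movies.new([
  Movie.new('8 Mile', '2002', 6.6),
  Movie.new('12 Years a Slave', '2013', 8.9)
])
puts movies.ranked.map(&:title).inspect
puts movies.find_by_title('8 Mile').year
```

Callers then send messages such as `ranked` instead of reaching into a hash by index, which addresses the smell described above.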
32
107
 
33
- * Make this OO instead of proceedural
108
+ TIME SPENT
109
+ ----------
34
110
 
35
- My personal notes:
36
- * One API source can have multiple ratings (ie rotten tomatoes has two scores).
37
- * An individual rating source seems to have two to three important dimensions.
38
- ** Scalar (how to get this to a ten point scale)
39
- ** N/A (stored as nil, N/A, -1)
40
- ** Weight -- should I apply the same weight to audience ratings as critic ratings?
111
+ Spike 1 (serial access) [branch pr-spike] - 45 minutes. Goal was to gain understanding of APIs,
112
+ worst case runtime.
41
113
 
42
- * From best case to worst case in terms of RT runtime: 3.7 - 7.7 minutes (.2 sec per req
43
- to doubling request time due to timeouts at .4 seconds per req). This solution
44
- actually runs fairly comfortably in that timeframe. From what I can tell, it looks like
45
- RT may have a short lived cache, where those hits don't count against your rate limit. Makes
46
- subsequent runs faster if done quickly after first.
47
- * For the sake of this exercise, I think client side caching of the requests is probably
48
- a no-no in terms of runtime, but will need to be captured on the testing end to avoid
49
- quickly hitting my request limit on RT's API.
114
+ Spike 2 (parallel access) [branch pr-multi_spike] - 1 hour. Goal was to bring runtime down to
115
+ the time it takes to pull results from the RT API.
50
116
 
117
+ Production implementation [master / pr-refactor_from_spike] - 3-4 hours. I dabbled in this throughout
118
+ an evening and did a little work the following morning. Goal was to write tests that could re-run without
119
+ hitting the APIs (rate limit rules everything around me) and bring the required application into an MVP.
@@ -0,0 +1 @@
1
+ require "bundler/gem_tasks"
data/bin/ratr CHANGED
@@ -1,113 +1,6 @@
1
1
  #!/usr/bin/env ruby
2
2
 
3
- require 'csv'
4
- require 'ostruct'
5
- require 'faraday'
6
- require 'faraday_middleware'
7
- require 'typhoeus'
8
- require 'typhoeus/adapters/faraday'
3
+ require 'ratr'
9
4
 
10
- # RT is really the problem child when it comes to requesting
11
- # multiple searches at once. It will return a 403 (forbidden)
12
- # code on rate limit exceeded which is a pain in the bum.
13
- #
14
- # Faraday has ERR::Timeout retries built in, but this is
15
- # unfortunatley not able to tap into it. The best I can do
16
- # is to roll my own custom retry mechanism. The idea is
17
- # to try and retry sooner than the 0.02 under most circumstances.
18
- def tomato_request(movie, http)
19
- retry_count = 3
20
- http.get do |request|
21
- request.url '/api/public/v1.0/movies.json'
22
- request.params['q'] = "#{movie.title} #{movie.year}"
23
- request.params['apikey'] = "ww8qgxbhjbqudvupbr8sqd7x"
24
- request.params['page_limit'] = 1
25
- end
26
- rescue Faraday::Error::ClientError
27
- case retry_count
28
- when 3
29
- sleep 0.05
30
- when 2
31
- sleep 0.1
32
- when 1
33
- sleep 0.2
34
- end
35
5
 
36
- retry_count -=1
37
- retry
38
- end
39
-
40
- # read incoming csv file
41
- movies = []
42
- CSV.foreach("examples/movies_small.csv") do |row|
43
- #assign each movie a title and year. Note: year is not guaranteed
44
- movies << OpenStruct.new(title: row[0], year: row[1])
45
- end
46
-
47
- manager = Typhoeus::Hydra.new(max_concurrency: 2)
48
- omdb = Faraday.new(url: 'http://www.omdbapi.com') do |faraday|
49
- faraday.request :json # form-encode POST params
50
- faraday.response :json # log requests to STDOUT
51
- faraday.adapter :typhoeus # make requests with Net::HTTP
52
- end
53
-
54
- tomatoes = Faraday.new(url: 'http://api.rottentomatoes.com') do |faraday|
55
- faraday.request :json
56
- faraday.response :json
57
- faraday.response :raise_error
58
- faraday.adapter :typhoeus
59
- end
60
-
61
- # Fork a thread out for the OMDB requests
62
- omdb_responses = {}
63
- t1 = Thread.new {
64
- omdb.in_parallel(manager) do
65
- movies.each_with_index do |movie, index|
66
- omdb_responses[index] = omdb.get do |request|
67
- request.url '/'
68
- request.params['t'] = movie.title
69
- request.params['y'] = movie.year
70
- end
71
- end
72
- end
73
- omdb_responses
74
- }
75
-
76
- # Fork out a thread for RT requests
77
- tomato_responses = {}
78
- t2 = Thread.new {
79
- movies.each_with_index do |movie, index|
80
- tomato_responses[index] = tomato_request(movie, tomatoes)
81
- end
82
- }
83
-
84
- # block until the threads are complete
85
- t1.join
86
- t2.join
87
-
88
- movies.each_with_index do |movie, index|
89
- movie.imdb_rating = omdb_responses[index].body['imdbRating']
90
- tomato_ratings = tomato_responses[index].body["movies"][0] && tomato_responses[index].body["movies"][0]["ratings"]
91
-
92
- if tomato_ratings
93
- movie.tomatoes_critic_rating = tomato_ratings["critics_score"]
94
- movie.tomatoes_audience_rating = tomato_ratings["audience_score"]
95
- end
96
-
97
- scores = []
98
- scores << movie.imdb_rating.to_f unless movie.imdb_rating == nil || movie.imdb_rate == 'N/A'
99
- scores << (movie.tomatoes_critic_rating.to_f / 10.0) unless movie.tomatoes_critic_rating == nil || movie.tomatoes_critic_rating == '-1'
100
- scores << (movie.tomatoes_audience_rating.to_f / 10.0) unless movie.tomatoes_audience_rating == nil || movie.tomatoes_audience_rating == '-1'
101
-
102
- if scores.size > 0
103
- movie.average = scores.reduce(0) {|m,v| m+=v} / scores.size if scores.size > 0
104
- else
105
- movie.average = 0
106
- end
107
- end
108
-
109
-
110
- # sort the movies collection by average rating -- then display.
111
- movies.sort {|a,b| b.average <=> a.average}.each do |movie|
112
- puts "#{movie.title} (#{movie.year}): #{movie.average}"
113
- end
6
+ puts Ratr::Application.new("examples/movies.csv").call
@@ -3,23 +3,3 @@
3
3
  8 Mile,2002
4
4
  12 Years a Slave,2013
5
5
  20 Feet from Stardom,2013
6
- 2001: A Space Odyssey,1968
7
- 20000 Leagues Under the Sea,1954
8
- The Abyss,1989
9
- The Accidental Tourist,1988
10
- The Accountant,2001
11
- The Accused,1988
12
- Adaptation,2002
13
- Adventures of Don Juan,1948
14
- "The Adventures of Priscilla, Queen of the Desert",1994
15
- The Adventures of Robin Hood,1938
16
- Affliction,1997
17
- The African Queen,1951
18
- The Age of Innocence,1993
19
- Air Force,1943
20
- Airport,1970
21
- Aladdin,1992
22
- The Alamo,1960
23
- The Alaskan Eskimo,1953
24
- Albert Schweitzer,1957
25
- Alexander's Ragtime Band,1938
@@ -0,0 +1,12 @@
1
+ module Ratr
2
+ autoload :Application, 'ratr/application'
3
+ autoload :ParallelHttpProcessor, 'ratr/parallel_http_processor'
4
+ autoload :SerialHttpProcessor, 'ratr/serial_http_processor'
5
+ autoload :OmdbHttpWrapper, 'ratr/omdb_http_wrapper'
6
+ autoload :RottenTomatoesHttpWrapper, 'ratr/rotten_tomatoes_http_wrapper'
7
+ autoload :Source, 'ratr/source'
8
+ autoload :PoolManager, 'ratr/pool_manager'
9
+ autoload :ArrayOfHashes, 'ratr/utils/array_of_hashes'
10
+ autoload :AverageCalculator, 'ratr/utils/average_calculator'
11
+ autoload :MetaRating, 'ratr/meta_rating'
12
+ end
@@ -0,0 +1,58 @@
1
+ require 'csv'
2
+ require 'ostruct'
3
+ require 'faraday'
4
+ require 'faraday_middleware'
5
+ require 'typhoeus'
6
+ require 'typhoeus/adapters/faraday'
7
+
8
+ module Ratr
9
+ class Application
10
+ attr_reader :file
11
+ def initialize(file = "examples/movies_small.csv")
12
+ @file = file
13
+ end
14
+
15
+ def call
16
+ # read incoming csv file
17
+ movies = {}
18
+ movie_key = 0
19
+ CSV.foreach(file) do |row|
20
+ #assign each movie a title and year. Note: year is not guaranteed
21
+ movies[movie_key] = OpenStruct.new(title: row[0], year: row[1])
22
+ movie_key += 1
23
+ end
24
+
25
+ manager = Ratr::PoolManager.new(sources)
26
+ manager.process(movies)
27
+ manager.join
28
+
29
+ movies.each do |index, movie|
30
+ scores = manager.merged_results[index].compact
31
+
32
+ movie.average = Ratr::AverageCalculator.call(scores)
33
+ end
34
+
35
+ # sort the movies collection by average rating -- then display.
36
+ output = []
37
+ movies.values.sort {|a,b| b.average <=> a.average}.each do |movie|
38
+ output << "#{movie.title} (#{movie.year}): #{movie.average}"
39
+ end
40
+
41
+ return output
42
+ end
43
+
44
+ private
45
+
46
+ def sources
47
+ @sources ||= [omdb, rotten_tomatoes]
48
+ end
49
+
50
+ def omdb
51
+ @omdb ||= Ratr::Source.new(Ratr::ParallelHttpProcessor, Ratr::OmdbHttpWrapper.new)
52
+ end
53
+
54
+ def rotten_tomatoes
55
+ @rotten_tomatoes ||= Ratr::Source.new(Ratr::SerialHttpProcessor, Ratr::RottenTomatoesHttpWrapper.new)
56
+ end
57
+ end
58
+ end
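The application above delegates the averaging to `Ratr::AverageCalculator`, whose source is not part of this diff. A hypothetical stand-in, assuming it keeps the behaviour of the removed script (mean of the available scores, 0 when there are none) and rounds to one decimal place as the new README output suggests:

```ruby
# Hypothetical stand-in for Ratr::AverageCalculator; the real class is not
# shown in this diff. Assumes scores are already on a ten-point scale.
module AverageCalculatorSketch
  # Returns the mean of the scores rounded to one decimal place,
  # or 0 when there are no scores.
  def self.call(scores)
    return 0 if scores.empty?

    (scores.sum.to_f / scores.size).round(1)
  end
end

puts AverageCalculatorSketch.call([8.1, 9.6, 9.0])
puts AverageCalculatorSketch.call([])
```

The first call uses the three "12 Years a Slave" scores from the old README output (8.1, 96 and 90, with the percentages divided by ten).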
@@ -0,0 +1,26 @@
1
+ module Ratr
2
+ class MetaRating
3
+ include Comparable
4
+
5
+ attr_reader :score, :no_score
6
+ def initialize(score, no_score = 'pity party')
7
+ @score, @no_score = score, no_score
8
+ end
9
+
10
+ def <=>(other)
11
+ return -1 if self.not_comparable?
12
+ return 1 if other.not_comparable?
13
+
14
+ self.score <=> other.score
15
+ end
16
+
17
+ def to_s
18
+ score.to_s
19
+ end
20
+
21
+ protected
22
+ def not_comparable?
23
+ score == no_score
24
+ end
25
+ end
26
+ end
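A short usage sketch of the comparison rules in MetaRating. The class body is a condensed copy of the one in the diff above (without the Ratr namespace) so the example runs on its own; the sample scores are made up.

```ruby
# Condensed copy of MetaRating from the diff above, so the sketch is standalone.
class MetaRating
  include Comparable

  attr_reader :score, :no_score

  def initialize(score, no_score = 'pity party')
    @score, @no_score = score, no_score
  end

  # A rating without a score sorts below every rating that has one.
  def <=>(other)
    return -1 if not_comparable?
    return 1 if other.not_comparable?

    score <=> other.score
  end

  def to_s
    score.to_s
  end

  protected

  def not_comparable?
    score == no_score
  end
end

rated   = MetaRating.new(8.9)
lower   = MetaRating.new(6.6)
unrated = MetaRating.new('pity party')

puts rated > lower
puts unrated < lower
puts [rated, unrated, lower].max
```

One consequence of the first guard clause is that two unscored ratings never compare as equal: the left-hand side always reports itself as smaller.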