royw-imdb 0.0.17 → 0.0.19

Files changed (31)
  1. data/History.txt +90 -0
  2. data/README +1 -1
  3. data/Rakefile +96 -0
  4. data/VERSION.yml +4 -0
  5. data/lib/imdb/imdb_profile.rb +277 -0
  6. data/spec/cache_extensions.rb +91 -0
  7. data/spec/imdb_image_spec.rb +23 -0
  8. data/spec/imdb_movie_spec.rb +232 -0
  9. data/spec/imdb_profile_spec.rb +182 -0
  10. data/spec/imdb_search_spec.rb +191 -0
  11. data/spec/samples/sample_image.html +409 -0
  12. data/spec/samples/sample_incomplete_movie.html +588 -0
  13. data/spec/samples/sample_jet_pilot.html +929 -0
  14. data/spec/samples/sample_meltdown.html +528 -0
  15. data/spec/samples/sample_movie.html +1135 -0
  16. data/spec/samples/sample_open_season.html +524 -0
  17. data/spec/samples/sample_search.html +446 -0
  18. data/spec/samples/sample_spanish_search.html +390 -0
  19. data/spec/samples/sample_who_am_i_search.html +517 -0
  20. data/spec/samples/www.imdb.com/find?q=Meltdown;s=tt +528 -0
  21. data/spec/samples/www.imdb.com/find?q=National+Treasure%3A+Book+of+Secrets;s=tt +1036 -0
  22. data/spec/samples/www.imdb.com/find?q=National+Treasure+2;s=tt +1036 -0
  23. data/spec/samples/www.imdb.com/find?q=Open+Season;s=tt +524 -0
  24. data/spec/samples/www.imdb.com/find?q=The+Alamo;s=tt +531 -0
  25. data/spec/samples/www.imdb.com/find?q=Who+Am+I%3F;s=tt +517 -0
  26. data/spec/samples/www.imdb.com/find?q=indiana+jones;s=tt +517 -0
  27. data/spec/samples/www.imdb.com/find?q=some+extremely+specific+search+for+indiana+jones;s=tt +1135 -0
  28. data/spec/samples/www.imdb.com/rg/action-box-title/primary-photo/media/rm1203608832/tt0097576 +409 -0
  29. data/spec/spec_helper.rb +5 -0
  30. data/spec/string_extensions_spec.rb +25 -0
  31. metadata +53 -37
data/History.txt ADDED
@@ -0,0 +1,90 @@
+ 0.0.7 royw-imdb
+ added Also Known As (aka) title matching
+ added ImdbSearch.find_id
+ added ImdbMovie.release_year
+ added ImdbMovie.raw_title
+ added ImdbMovie.video_game?
+ added ImdbMovie.also_known_as
+ added ImdbMovie.mpaa
+ added ImdbMovie.certifications
+
+ cleaned up merges, spec now passes
+ Merge branch 'master' of git://github.com/caleb/imdb
+ merged jasonrudolph branch
+ merged kieranj branch
+ forked from porras-imdb
+
+ 0.0.6
+ Merge patch from different fork
+ Correctly parse movies with only one writer
+ Add CHANGELOG and bump version to v0.0.6
+ Add support for searches that result in an exact match
+ Add method to retrieve movie year ...
+ - We could already retrieve the release date, but not all movies
+ have a release date (e.g., http://www.imdb.com/title/tt0464826/).
+ - All movies seem to have a year.
+ Strip duplicate movies from the search results
+ updated gemspec
+ changed poster url
+ Updated README
+ Added method to retrieve user rating
+ ImdbImage
+ now returns movie title instead of page title
+ now fetches large poster from different url
+ now works for multiple directors
+ Revert "Revert "Revert "Revert "default rake task""""
+
+ This reverts commit ed13a2fc544a0b9bbb2276a6ab38bc6a049c5972.
+ Revert "Revert "Revert "default rake task"""
+
+ This reverts commit b2114e2b342e11d3ad9bf8e857d2179d254ecf4c.
+ Revert "Revert "default rake task""
+
+ This reverts commit c662da7f4de5ad510ce706d38de1606f08b211d3.
+ Revert "default rake task"
+
+ This reverts commit 25dcceb4ba63f028b9683f7841b831b96db4d3f1.
+ default rake task
+ Dependencies on gemspec
+ Gitignoring *.gem
+ Uses IMDB to search because of partial matches (which Google doesn't find)
+ Doc updated + Gem version increment
+ Testing encoding issues
+ ImdbMovie#aspect_ratio
+ ImdbMovie#tagline
+ Rake task for incrementing the gem version
+ Small refactorization & added ImdbMovie#get_data
+ Using Google for searching
+ Included GemSpec
+ Updated README
+ Fixed encoding problems
+ String Extensions to unescape & recode HTML
+ - Testing against incomplete movie
+ - Handling of unknown data
+ - Some specs failing because of encodings
+ ImdbSearch#movies spec
+ Small refactoring
+ File reorganization
+ File reorganization
+ Small spec refactorization
+ ImdbMovie#plot
+ ImdbMovie#release_date
+ ImdbMovie#length
+ README with specs
+ ImdbMovie#company + ImdbMovie#photos
+ ImdbMovie#color
+ ImdbMovie#languages
+ ImdbMovie#countries + small fixes
+ ImdbMovie#genres
+ ImdbMovie#writers
+ ImdbMovie#cast_members
+ Specified ImdbMovie (and some implementation, but there are pending issues)
+ Some fixes
+ Git ignores
+ Updated paths
+ Directory organization
+ Specs refactoring
+ Specified ImdbSearch
+ Some tests & fixes
+ Hpricot searching
+ First version
data/README CHANGED
@@ -110,6 +110,6 @@ ImdbProfile
  - should load from a file if a :filespec option is passed and the file exists
  - should not load from a file if a :filespec option is passed and the file does not exists
 
- Finished in 9.20481 seconds
+ Finished in 9.302841 seconds
 
  85 examples, 0 failures
data/Rakefile ADDED
@@ -0,0 +1,96 @@
+ require 'rubygems'
+ require 'rake'
+ require 'spec/rake/spectask'
+
+ desc "Run all specs"
+ task :default => :spec
+
+ #desc "Run all specs"
+ #Spec::Rake::SpecTask.new('spec') do |t|
+ # t.spec_files = FileList['spec/**/*.rb']
+ #end
+ #
+ #desc "Run all specs and generate HTML report"
+ #Spec::Rake::SpecTask.new('spec:html') do |t|
+ # t.spec_files = FileList['spec/**/*.rb']
+ # t.spec_opts = ["--format", "html:spec.html"]
+ #end
+ #
+ #desc "Run all specs and dump the result to README"
+ #Spec::Rake::SpecTask.new('spec:readme') do |t|
+ # t.spec_files = FileList['spec/**/*.rb']
+ # t.spec_opts = ["--format", "specdoc:README"]
+ #end
+
+
+ Spec::Rake::SpecTask.new(:spec) do |spec|
+   spec.libs << 'lib' << 'spec'
+   spec.spec_files = FileList['spec/**/*_spec.rb']
+ end
+
+ Spec::Rake::SpecTask.new('spec:html') do |spec|
+   spec.libs << 'lib' << 'spec'
+   spec.spec_files = FileList['spec/**/*_spec.rb']
+   spec.spec_opts = ["--format", "html:spec.html"]
+ end
+
+ Spec::Rake::SpecTask.new('spec:readme') do |spec|
+   spec.libs << 'lib' << 'spec'
+   spec.spec_files = FileList['spec/**/*_spec.rb']
+   spec.spec_opts = ["--format", "specdoc:README"]
+ end
+
+ Spec::Rake::SpecTask.new(:rcov) do |spec|
+   spec.libs << 'lib' << 'spec'
+   spec.pattern = 'spec/**/*_spec.rb'
+   spec.rcov = true
+ end
+
+
+ #namespace :gem do
+ # desc "Increments the Gem version in imdb.gemspec"
+ # task :increment do
+ # lines = File.new('imdb.gemspec').readlines
+ # lines.each do |line|
+ # next unless line =~ /version = '\d+\.\d+\.(\d+)'/
+ # line.gsub!(/\d+'/, "#{$1.to_i + 1}'")
+ # end
+ # File.open('imdb.gemspec', 'w') do |f|
+ # lines.each do |line|
+ # f.write(line)
+ # end
+ # end
+ # end
+ #end
+
+
+ begin
+   require 'jeweler'
+   Jeweler::Tasks.new do |gem|
+     gem.name = "imdb"
+     gem.summary = %Q{TODO}
+     gem.email = "roy@wright.org"
+     gem.homepage = "http://github.com/royw/imdb"
+     gem.authors = ["Roy Wright"]
+     gem.files.reject! { |fn| File.basename(fn) =~ /^tt\d+\.html/}
+
+     # gem is a Gem::Specification... see http://www.rubygems.org/read/chapter/20 for additional settings
+   end
+ rescue LoadError
+   puts "Jeweler not available. Install it with: sudo gem install technicalpickles-jeweler -s http://gems.github.com"
+ end
+
+ require 'rake/rdoctask'
+ Rake::RDocTask.new do |rdoc|
+   if File.exist?('VERSION.yml')
+     config = YAML.load(File.read('VERSION.yml'))
+     version = "#{config[:major]}.#{config[:minor]}.#{config[:patch]}"
+   else
+     version = ""
+   end
+
+   rdoc.rdoc_dir = 'rdoc'
+   rdoc.title = "imdb #{version}"
+   rdoc.rdoc_files.include('README*')
+   rdoc.rdoc_files.include('lib/**/*.rb')
+ end
data/VERSION.yml ADDED
@@ -0,0 +1,4 @@
+ ---
+ :minor: 0
+ :patch: 19
+ :major: 0
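The Rakefile's RDoc task builds its title from the three symbol keys in VERSION.yml. The following is a minimal sketch of that lookup, not code from the gem: `version_string` and `VERSION_TEXT` are hypothetical names, and `YAML.safe_load` with `permitted_classes: [Symbol]` is swapped in for the Rakefile's plain `YAML.load`, on the assumption of a recent Ruby where symbol keys have to be permitted explicitly.

```ruby
require 'yaml'

# Same content as the VERSION.yml hunk above.
VERSION_TEXT = <<~EOS
  ---
  :minor: 0
  :patch: 19
  :major: 0
EOS

# Hypothetical helper: joins major, minor and patch into a dotted string.
def version_string(yaml_text)
  # The keys are Ruby symbols, so Symbol must be permitted for safe_load.
  config = YAML.safe_load(yaml_text, permitted_classes: [Symbol])
  [config[:major], config[:minor], config[:patch]].join('.')
end

puts version_string(VERSION_TEXT)
```

The key order in the file does not matter because the values are looked up by name.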
data/lib/imdb/imdb_profile.rb ADDED
@@ -0,0 +1,277 @@
+ # This is the model for the IMDB profile which is used
+ # to find ImdbMovie meta data from either online or from
+ # a cached file.
+ #
+ # Usage:
+ #
+ #   profiles = ImdbProfile.all(:titles => ['The Alamo'])
+ #
+ #   profile = ImdbProfile.first(:imdb_id => 'tt0123456')
+ # or
+ #   profile = ImdbProfile.first(:titles => ['movie title 1', 'movie title 2',...],
+ #                               :media_years => ['2000'],
+ #                               :production_years => ['1999'],
+ #                               :released_years => ['2002', '2008'],
+ #                               :filespec => media.path_to(:imdb_xml))
+ #   puts profile.movie['key'].first
+ #   puts profile.to_xml
+ #   puts profile.imdb_id
+ #
+
+ # An optional logger.
+ # If initialized with a logger instance, uses the logger
+ # otherwise doesn't do anything.
+ # Basically trying to not require a particular logger class.
+ class OptionalLogger
+   # logger may be nil or a logger instance
+   def initialize(logger)
+     @logger = logger
+   end
+
+   # debug {...}
+   def debug(&blk)
+     @logger.debug(blk.call) unless @logger.nil?
+   end
+
+   # info {...}
+   def info(&blk)
+     @logger.info(blk.call) unless @logger.nil?
+   end
+
+   # warn {...}
+   def warn(&blk)
+     @logger.warn(blk.call) unless @logger.nil?
+   end
+
+   # error {...}
+   def error(&blk)
+     @logger.error(blk.call) unless @logger.nil?
+   end
+ end
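As a reading aid for the OptionalLogger class above, here is a small usage sketch. It copies only the `debug` method so it runs on its own; the stdlib `Logger` writing to a `StringIO` and the helper name `demo_optional_logger` are assumptions made for the illustration, not part of the gem.

```ruby
require 'logger'
require 'stringio'

# Trimmed copy of the class from the diff (debug only) so the sketch is standalone.
class OptionalLogger
  def initialize(logger)
    @logger = logger
  end

  def debug(&blk)
    @logger.debug(blk.call) unless @logger.nil?
  end
end

# Hypothetical demo: one wrapper with a real logger, one with nil.
def demo_optional_logger
  sink = StringIO.new
  OptionalLogger.new(Logger.new(sink)).debug { "cache miss" }

  evaluated = 0
  # With a nil logger the block is never called, so the message is never built.
  OptionalLogger.new(nil).debug { evaluated += 1; "never built" }

  [sink.string, evaluated]
end

logged, evaluated = demo_optional_logger
puts logged
puts "blocks evaluated without a logger: #{evaluated}"
```

Note that when a logger is present the block is always evaluated before the call, even if the logger's level would discard the message.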
+
+ class ImdbProfile
+
+   # options:
+   #   :imdb_id => String containing the IMDB ID (ex: 'tt0465234')
+   #               Note, the leading 'tt' is optional.
+   #   :titles => titles,
+   #   :media_years => Array of integer, 4 digit years (ex: [1998]).
+   #                   Should be the year(s) from the media file name.
+   #                   This lets the user say what year when they name
+   #                   the file.
+   #   :production_years => Array of integer, 4 digit years (ex: [1997,1998]).
+   #                        Should be the year(s) the movie was made.
+   #                        Note, some databases differ on the production year.
+   #   :released_years => Array of integer, 4 digit years (ex: [1998, 2008])
+   #                      Should be the year(s) the movie was released.
+   #   :logger => Logger instance
+   # returns Array of ImdbProfile instances
+   def self.all(options={})
+     @class_logger = OptionalLogger.new(options[:logger])
+     @class_logger.debug {"ImdbProfile.all(#{options.inspect})"} unless options[:logger].nil?
+     result = []
+     if has_option?(options, :imdb_id) || (has_option?(options, :filespec) && File.exist?(options[:filespec]))
+       result << ImdbProfile.new(options[:imdb_id], options[:filespec], options[:logger])
+     elsif has_option?(options, :titles)
+       result += self.lookup(options[:titles],
+                             options[:media_years],
+                             options[:production_years],
+                             options[:released_years]
+                            ).collect{|ident| ImdbProfile.new(ident, options[:filespec], options[:logger])}
+     end
+     result
+   end
+
+   # see ImdbProfile.all for options description
+   def self.first(options={})
+     self.all(options).first
+   end
+
+   protected
+
+   def self.has_option?(options, key)
+     options.has_key?(key) && !options[key].blank?
+   end
+
+   def initialize(ident, filespec, logger)
+     @imdb_id = ident
+
+     @filespec = filespec
+     @logger = OptionalLogger.new(logger)
+     load
+   end
+
+   public
+
+   # returns the IMDB ID String
+   attr_reader :imdb_id
+
+   # returns a Hash with the movie's meta data generated from ImdbMovie.to_hash.
+   # See ImdbMovie for details.
+   attr_reader :movie
+
+   # return the xml as a String
+   def to_xml
+     xml = ''
+     unless @movie.blank?
+       @movie.delete_if { |key, value| value.nil? }
+       xml = XmlSimple.xml_out(@movie, 'NoAttr' => true, 'RootName' => 'movie')
+     end
+     xml
+   end
+
+   protected
+
+   # @movie keys => [:title, :directors, :poster_url, :tiny_poster_url, :poster,
+   #                 :rating, :cast_members, :writers, :year, :genres, :plot,
+   #                 :tagline, :aspect_ratio, :length, :release_date, :countries,
+   #                 :languages, :color, :company, :photos, :raw_title,
+   #                 :release_year, :also_known_as, :mpaa, :certifications]
+   # returns Hash or nil
+   def load
+     if !@filespec.blank? && File.exist?(@filespec)
+       @logger.debug { "loading movie filespec=> #{@filespec.inspect}" }
+       @movie = from_xml(open(@filespec).read)
+     elsif !@imdb_id.blank?
+       @logger.debug { "loading movie from imdb.com, filespec=> #{@filespec.inspect}" }
+       @movie = ImdbMovie.new(@imdb_id.gsub(/^tt/, '')).to_hash
+       @movie['id'] = 'tt' + @imdb_id.gsub(/^tt/, '') unless @movie.blank?
+       save(@filespec) unless @filespec.blank?
+     end
+     unless @movie.blank?
+       @imdb_id = @movie['id']
+       @imdb_id = @imdb_id.first if @imdb_id.respond_to?('[]') && @imdb_id.length == 1
+     else
+       @movie = nil
+     end
+   end
+
+   def from_xml(xml)
+     begin
+       movie = XmlSimple.xml_in(xml)
+     rescue Exception => e
+       @logger.warn { "Error converting from xml: #{e.to_s}" }
+       movie = nil
+     end
+     movie
+   end
+
+   def save(filespec)
+     begin
+       xml = self.to_xml
+       unless xml.blank?
+         @logger.debug { "saving #{filespec}" }
+         save_to_file(filespec, xml)
+       end
+     rescue Exception => e
+       @logger.error {"Unable to save imdb profile to #{filespec} - #{e.to_s}"}
+     end
+   end
+
+   def save_to_file(filespec, data)
+     new_filespec = filespec + '.new'
+     File.open(new_filespec, "w") do |file|
+       file.puts(data)
+     end
+     backup_filespec = filespec + '~'
+     File.delete(backup_filespec) if File.exist?(backup_filespec)
+     File.rename(filespec, backup_filespec) if File.exist?(filespec)
+     File.rename(new_filespec, filespec)
+     File.delete(new_filespec) if File.exist?(new_filespec)
+   end
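The write-then-rename sequence in `save_to_file` above can be tried in isolation. This is a sketch under stated assumptions: the method is copied as a top-level function (without the final delete, which has nothing left to remove once the rename succeeds), and `demo_save`, the temporary directory and the file name are hypothetical.

```ruby
require 'tmpdir'

# Standalone copy of the sequence: write to "<name>.new", move any existing
# file to "<name>~", then rename the new file into place.
def save_to_file(filespec, data)
  new_filespec = filespec + '.new'
  File.open(new_filespec, 'w') { |file| file.puts(data) }
  backup_filespec = filespec + '~'
  File.delete(backup_filespec) if File.exist?(backup_filespec)
  File.rename(filespec, backup_filespec) if File.exist?(filespec)
  File.rename(new_filespec, filespec)
end

# Hypothetical demo: save twice and inspect what is left on disk.
def demo_save
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'movie.imdb.xml')
    save_to_file(path, '<movie>v1</movie>')
    save_to_file(path, '<movie>v2</movie>')
    [File.read(path), File.read(path + '~'), File.exist?(path + '.new')]
  end
end

current, backup, leftover = demo_save
puts "current:  #{current}"
puts "backup:   #{backup}"
puts "leftover: #{leftover}"
```

After the second save the target holds the newest data, the `~` file holds the previous version, and no `.new` file remains.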
+
+   # lookup IMDB title using years as the secondary search key
+   # the titles should behave as an Array, the intent here is to be
+   # able to try to find the exact title from DVD Profiler and if that
+   # fails, to try to find the title pattern
+   # The search order is:
+   #   1) media_years should be from media filename
+   #   2) production years
+   #   3) production years plus/minus a year
+   #   4) released years
+   #   5) released years plus/minus a year
+   #   6) no years
+   def self.lookup(titles, media_years, production_years, released_years)
+     idents = []
+     year_sets = []
+     year_sets << media_years unless media_years.blank?
+     year_sets << fuzzy_years(production_years, 0) unless production_years.blank?
+     year_sets << fuzzy_years(production_years, -1..1) unless production_years.blank?
+     year_sets << fuzzy_years(released_years, 0) unless released_years.blank?
+     year_sets << fuzzy_years(released_years, -1..1) unless released_years.blank?
+     year_sets << [] if media_years.blank?
+
+     titles.flatten.uniq.compact.each do |title|
+       [false, true].each do |search_akas|
+         @class_logger.debug { (search_akas ? 'Search AKAs' : 'Do not search AKAs') }
+         imdb_search = ImdbSearch.new(title, search_akas)
+         @cache ||= {}
+         imdb_search.set_cache(@cache)
+
+         if year_sets.flatten.uniq.compact.empty?
+           idents = imdb_search.movies.collect{|m| m.id.to_s}.uniq.compact
+         else
+           year_sets.each do |years|
+             new_idents = find_id(imdb_search, title, years, search_akas)
+             @class_logger.debug { "new_idents => #{new_idents.inspect}" }
+             idents += new_idents
+             break unless new_idents.blank?
+           end
+         end
+         break unless idents.blank?
+       end
+       break unless idents.blank?
+     end
+     idents.uniq.compact
+   end
+
+   # Different databases seem to mix up released versus production years.
+   # So we combine both into an Array of integer years.
+   # fuzzy is an integer range, basically expand each known year by the fuzzy range
+   # i.e., let production and released year both be 2000 and fuzzy=-1..1,
+   # then the returned years would be [1999, 2000, 2001]
+   def self.fuzzy_years(source_years, fuzzy)
+     years = []
+     unless source_years.blank?
+       years = [source_years].flatten.collect do |date|
+         a = []
+         if date.to_s =~ /(\d{4})/
+           y = $1.to_i
+           a = [*fuzzy].collect do
+             |f| y.to_i + f
+           end
+         end
+         a
+       end
+     end
+     result = years.flatten.uniq.compact.sort
+     result
+   end
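The year expansion described in the comment on `fuzzy_years` can be sketched without the rest of the class. This version is a standalone rewrite for illustration, not the gem's method: it avoids `blank?` (which appears to come from an extension library the diff does not show) and uses `Array()` in place of the splat.

```ruby
# Expand every 4-digit year found in source_years by the offsets in fuzzy.
# fuzzy may be a single Integer (0) or a Range (-1..1).
def fuzzy_years(source_years, fuzzy)
  offsets = Array(fuzzy) # Array(0) => [0]; Array(-1..1) => [-1, 0, 1]
  [source_years].flatten.compact.flat_map do |date|
    year = date.to_s[/\d{4}/] # first run of four digits, or nil
    year ? offsets.map { |f| year.to_i + f } : []
  end.uniq.sort
end

p fuzzy_years(['2000'], -1..1)
p fuzzy_years([1999, '2000-05-01'], 0)
p fuzzy_years(nil, 0)
```

Entries without a 4-digit year contribute nothing, and overlapping expansions are collapsed by `uniq` before sorting.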
+
+   # try to find the imdb id for the movie
+   def self.find_id(imdb_search, title, years, search_akas)
+     idents = []
+
+     @class_logger.info { "Searching IMDB for \"#{title}\" (#{years.join(", ")})" }
+     unless title.blank?
+       begin
+         movies = imdb_search.movies
+         @class_logger.debug { "movies => (#{movies.collect{|m| [m.id, m.year, m.title]}.inspect})"}
+         if movies.size == 1
+           idents = [movies.first.id.to_s]
+         elsif movies.size > 1
+           @class_logger.debug { "years => #{years.inspect}"}
+           same_year_movies = movies.select{ |m| !m.year.blank? && years.include?(m.year.to_i) }
+           idents = same_year_movies.collect{|m| m.id.to_s}
+           @class_logger.debug { "same_year_movies => (#{same_year_movies.collect{|m| [m.id, m.year, m.title]}.inspect})"}
+         end
+       rescue Exception => e
+         @class_logger.error { "Error searching IMDB - " + e.to_s }
+         @class_logger.error { e.backtrace.join("\n") }
+       end
+     end
+     @class_logger.debug { "IMDB id => #{idents.join(', ')}" } unless idents.blank?
+     idents
+   end
+
+ end
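The selection rule in `find_id`, combined with the stop-at-first-hit loop over year sets in `lookup`, can be illustrated with plain data. This is a sketch only: the `Movie` struct, the helper names `select_ids` and `first_match`, and the placeholder ids are hypothetical, and no IMDB search is performed.

```ruby
# Hypothetical stand-in for the search results used by find_id.
Movie = Struct.new(:id, :year, :title)

# A single result is accepted regardless of year; otherwise keep only
# results whose year is in the given list of integer years.
def select_ids(movies, years)
  return [movies.first.id.to_s] if movies.size == 1
  movies.select { |m| !m.year.to_s.empty? && years.include?(m.year.to_i) }
        .map { |m| m.id.to_s }
end

# Try each year set in order and stop at the first one that matches.
def first_match(movies, year_sets)
  year_sets.each do |years|
    ids = select_ids(movies, years)
    return ids unless ids.empty?
  end
  []
end

movies = [
  Movie.new('id-a', '1960', 'The Alamo'), # placeholder ids, not real IMDB ids
  Movie.new('id-b', '2004', 'The Alamo')
]

p first_match(movies, [[1999], [2003, 2004, 2005]])
p select_ids([movies.first], [1900])
```

Because the years are compared as integers, a year list holding strings such as `['2004']` would not match in this sketch.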