royw-imdb 0.0.17 → 0.0.19

Files changed (31)
  1. data/History.txt +90 -0
  2. data/README +1 -1
  3. data/Rakefile +96 -0
  4. data/VERSION.yml +4 -0
  5. data/lib/imdb/imdb_profile.rb +277 -0
  6. data/spec/cache_extensions.rb +91 -0
  7. data/spec/imdb_image_spec.rb +23 -0
  8. data/spec/imdb_movie_spec.rb +232 -0
  9. data/spec/imdb_profile_spec.rb +182 -0
  10. data/spec/imdb_search_spec.rb +191 -0
  11. data/spec/samples/sample_image.html +409 -0
  12. data/spec/samples/sample_incomplete_movie.html +588 -0
  13. data/spec/samples/sample_jet_pilot.html +929 -0
  14. data/spec/samples/sample_meltdown.html +528 -0
  15. data/spec/samples/sample_movie.html +1135 -0
  16. data/spec/samples/sample_open_season.html +524 -0
  17. data/spec/samples/sample_search.html +446 -0
  18. data/spec/samples/sample_spanish_search.html +390 -0
  19. data/spec/samples/sample_who_am_i_search.html +517 -0
  20. data/spec/samples/www.imdb.com/find?q=Meltdown;s=tt +528 -0
  21. data/spec/samples/www.imdb.com/find?q=National+Treasure%3A+Book+of+Secrets;s=tt +1036 -0
  22. data/spec/samples/www.imdb.com/find?q=National+Treasure+2;s=tt +1036 -0
  23. data/spec/samples/www.imdb.com/find?q=Open+Season;s=tt +524 -0
  24. data/spec/samples/www.imdb.com/find?q=The+Alamo;s=tt +531 -0
  25. data/spec/samples/www.imdb.com/find?q=Who+Am+I%3F;s=tt +517 -0
  26. data/spec/samples/www.imdb.com/find?q=indiana+jones;s=tt +517 -0
  27. data/spec/samples/www.imdb.com/find?q=some+extremely+specific+search+for+indiana+jones;s=tt +1135 -0
  28. data/spec/samples/www.imdb.com/rg/action-box-title/primary-photo/media/rm1203608832/tt0097576 +409 -0
  29. data/spec/spec_helper.rb +5 -0
  30. data/spec/string_extensions_spec.rb +25 -0
  31. metadata +53 -37
@@ -0,0 +1,90 @@
+ 0.0.7 royw-imdb
+ added Also Known As (aka) title matching
+ added ImdbSearch.find_id
+ added ImdbMovie.release_year
+ added ImdbMovie.raw_title
+ added ImdbMovie.video_game?
+ added ImdbMovie.also_known_as
+ added ImdbMovie.mpaa
+ added ImdbMovie.certifications
+
+ cleaned up merges, spec now passes
+ Merge branch 'master' of git://github.com/caleb/imdb
+ merged jasonrudolph branch
+ merged kieranj branch
+ forked from porras-imdb
+
+ 0.0.6
+ Merge patch from different fork
+ Correctly parse movies with only one writer
+ Add CHANGELOG and bump version to v0.0.6
+ Add support for searches that result in an exact match
+ Add method to retrieve movie year ...
+ - We could already retrieve the release date, but not all movies
+ have a release date (e.g., http://www.imdb.com/title/tt0464826/).
+ - All movies seem to have a year.
+ Strip duplicate movies from the search results
+ updated gemspec
+ changed poster url
+ Updated README
+ Added method to retrieve user rating
+ ImdbImage
+ now returns movie title instead of page title
+ now fetches large poster from different url
+ now works for multiple directors
+ Revert "Revert "Revert "Revert "default rake task""""
+
+ This reverts commit ed13a2fc544a0b9bbb2276a6ab38bc6a049c5972.
+ Revert "Revert "Revert "default rake task"""
+
+ This reverts commit b2114e2b342e11d3ad9bf8e857d2179d254ecf4c.
+ Revert "Revert "default rake task""
+
+ This reverts commit c662da7f4de5ad510ce706d38de1606f08b211d3.
+ Revert "default rake task"
+
+ This reverts commit 25dcceb4ba63f028b9683f7841b831b96db4d3f1.
+ default rake task
+ Dependencies on gemspec
+ Gitignoring *.gem
+ Uses IMDB to search because of partial matches (which Google doesn't find)
+ Doc updated + Gem version increment
+ Testing encoding issues
+ ImdbMovie#aspect_ratio
+ ImdbMovie#tagline
+ Rake task for incrementing the gem version
+ Small refactorization & added ImdbMovie#get_data
+ Using Google for searching
+ Included GemSpec
+ Updated README
+ Fixed encoding problems
+ String Extensions to unescape & recode HTML
+ - Testing against incomplete movie
+ - Handling of unkwnown data
+ - Some specs failing because of encodings
+ ImdbSearch#movies spec
+ Small refactoring
+ File reorganization
+ File reorganization
+ Small spec refactorization
+ ImdbMovie#plot
+ ImdbMovie#release_date
+ ImdbMovie#length
+ README with specs
+ ImdbMovie#company + ImdbMovie#photos
+ ImdbMovie#color
+ ImdbMovie#languages
+ ImdbMovie#countries + small fixes
+ ImdbMovie#genres
+ ImdbMovie#writers
+ ImdbMovie#cast_members
+ Specified ImdbMovie (and some implementation, but there are pending issues)
+ Some fixes
+ Git ignores
+ Updated paths
+ Directory organization
+ Specs refactoring
+ Specified ImdbSearch
+ Some tests & fixes
+ Hpricot searching
+ First version
data/README CHANGED
@@ -110,6 +110,6 @@ ImdbProfile
  - should load from a file if a :filespec option is passed and the file exists
  - should not load from a file if a :filespec option is passed and the file does not exists
 
- Finished in 9.20481 seconds
+ Finished in 9.302841 seconds
 
  85 examples, 0 failures
@@ -0,0 +1,96 @@
+ require 'rubygems'
+ require 'rake'
+ require 'spec/rake/spectask'
+
+ desc "Run all specs"
+ task :default => :spec
+
+ #desc "Run all specs"
+ #Spec::Rake::SpecTask.new('spec') do |t|
+ #  t.spec_files = FileList['spec/**/*.rb']
+ #end
+ #
+ #desc "Run all specs and generate HTML report"
+ #Spec::Rake::SpecTask.new('spec:html') do |t|
+ #  t.spec_files = FileList['spec/**/*.rb']
+ #  t.spec_opts = ["--format", "html:spec.html"]
+ #end
+ #
+ #desc "Run all specs and dump the result to README"
+ #Spec::Rake::SpecTask.new('spec:readme') do |t|
+ #  t.spec_files = FileList['spec/**/*.rb']
+ #  t.spec_opts = ["--format", "specdoc:README"]
+ #end
+
+
+ Spec::Rake::SpecTask.new(:spec) do |spec|
+   spec.libs << 'lib' << 'spec'
+   spec.spec_files = FileList['spec/**/*_spec.rb']
+ end
+
+ Spec::Rake::SpecTask.new('spec:html') do |spec|
+   spec.libs << 'lib' << 'spec'
+   spec.spec_files = FileList['spec/**/*_spec.rb']
+   spec.spec_opts = ["--format", "html:spec.html"]
+ end
+
+ Spec::Rake::SpecTask.new('spec:readme') do |spec|
+   spec.libs << 'lib' << 'spec'
+   spec.spec_files = FileList['spec/**/*_spec.rb']
+   spec.spec_opts = ["--format", "specdoc:README"]
+ end
+
+ Spec::Rake::SpecTask.new(:rcov) do |spec|
+   spec.libs << 'lib' << 'spec'
+   spec.pattern = 'spec/**/*_spec.rb'
+   spec.rcov = true
+ end
+
+
+ #namespace :gem do
+ #  desc "Increments the Gem version in imdb.gemspec"
+ #  task :increment do
+ #    lines = File.new('imdb.gemspec').readlines
+ #    lines.each do |line|
+ #      next unless line =~ /version = '\d+\.\d+\.(\d+)'/
+ #      line.gsub!(/\d+'/, "#{$1.to_i + 1}'")
+ #    end
+ #    File.open('imdb.gemspec', 'w') do |f|
+ #      lines.each do |line|
+ #        f.write(line)
+ #      end
+ #    end
+ #  end
+ #end
+
+
+ begin
+   require 'jeweler'
+   Jeweler::Tasks.new do |gem|
+     gem.name = "imdb"
+     gem.summary = %Q{TODO}
+     gem.email = "roy@wright.org"
+     gem.homepage = "http://github.com/royw/imdb"
+     gem.authors = ["Roy Wright"]
+     gem.files.reject! { |fn| File.basename(fn) =~ /^tt\d+\.html/}
+
+     # gem is a Gem::Specification... see http://www.rubygems.org/read/chapter/20 for additional settings
+   end
+ rescue LoadError
+   puts "Jeweler not available. Install it with: sudo gem install technicalpickles-jeweler -s http://gems.github.com"
+ end
+
+ require 'rake/rdoctask'
+ Rake::RDocTask.new do |rdoc|
+   if File.exist?('VERSION.yml')
+     config = YAML.load(File.read('VERSION.yml'))
+     version = "#{config[:major]}.#{config[:minor]}.#{config[:patch]}"
+   else
+     version = ""
+   end
+
+   rdoc.rdoc_dir = 'rdoc'
+   rdoc.title = "imdb #{version}"
+   rdoc.rdoc_files.include('README*')
+   rdoc.rdoc_files.include('lib/**/*.rb')
+ end
@@ -0,0 +1,4 @@
+ ---
+ :minor: 0
+ :patch: 19
+ :major: 0
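The rdoc task in the Rakefile above builds its title from the `:major`, `:minor` and `:patch` keys of VERSION.yml. The following is a minimal sketch of that composition, not the gem's own code: `VERSION_YAML` and `version_string` are hypothetical names introduced here, the YAML is inlined instead of being read from disk, and `YAML.safe_load` with `permitted_classes: [Symbol]` is substituted for the Rakefile's `YAML.load` on the assumption of a recent Ruby, where symbol keys are not loaded by default.

```ruby
require 'yaml'

# Hypothetical inline copy of the VERSION.yml content shown in the hunk above.
VERSION_YAML = <<~YAML
  ---
  :minor: 0
  :patch: 19
  :major: 0
YAML

# Hypothetical helper: compose "major.minor.patch" from the symbol-keyed hash.
# Symbol must be permitted explicitly because the keys are written as :major etc.
def version_string(yaml_text)
  config = YAML.safe_load(yaml_text, permitted_classes: [Symbol])
  "#{config[:major]}.#{config[:minor]}.#{config[:patch]}"
end

puts version_string(VERSION_YAML) # prints 0.0.19
```

The key order in the file does not matter, since the values are looked up by name.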
@@ -0,0 +1,277 @@
+ # This is the model for the IMDB profile which is used
+ # to find ImdbMovie meta data from either online or from
+ # a cached file.
+ #
+ # Usage:
+ #
+ # profiles = ImdbProfile.all(:titles => ['The Alamo'])
+ #
+ # profile = ImdbProfile.first(:imdb_id => 'tt0123456')
+ # or
+ # profile = ImdbProfile.first(:titles => ['movie title 1', 'movie title 2',...],
+ #                             :media_years => ['2000'],
+ #                             :production_years => ['1999'],
+ #                             :released_years => ['2002', '2008'],
+ #                             :filespec => media.path_to(:imdb_xml))
+ # puts profile.movie['key'].first
+ # puts profile.to_xml
+ # puts profile.imdb_id
+ #
+
+ # An optional logger.
+ # If initialized with a logger instance, uses the logger
+ # otherwise doesn't do anything.
+ # Basically trying to not require a particular logger class.
+ class OptionalLogger
+   # logger may be nil or a logger instance
+   def initialize(logger)
+     @logger = logger
+   end
+
+   # debug {...}
+   def debug(&blk)
+     @logger.debug(blk.call) unless @logger.nil?
+   end
+
+   # info {...}
+   def info(&blk)
+     @logger.info(blk.call) unless @logger.nil?
+   end
+
+   # warn {...}
+   def warn(&blk)
+     @logger.warn(blk.call) unless @logger.nil?
+   end
+
+   # error {...}
+   def error(&blk)
+     @logger.error(blk.call) unless @logger.nil?
+   end
+ end
+
+ class ImdbProfile
+
+   # options:
+   # :imdb_id => String containing the IMDB ID (ex: 'tt0465234')
+   #   Note, the leading 'tt' is optional.
+   # :titles => titles,
+   # :media_years => Array of integer, 4 digit years (ex: [1998]).
+   #   Should be the year(s) from the media file name.
+   #   This lets the user say what year when they name
+   #   the file.
+   # :production_years => Array of integer, 4 digit years (ex: [1997,1998]).
+   #   Should be the year(s) the movie was made.
+   #   Note, some databases differ on the production year.
+   # :released_years => Array of integer, 4 digit years (ex: [1998, 2008])
+   #   Should be the year(s) the movie was released.
+   # :logger => Logger instance
+   # returns Array of ImdbProfile instances
+   def self.all(options={})
+     @class_logger = OptionalLogger.new(options[:logger])
+     @class_logger.debug {"ImdbProfile.all(#{options.inspect})"} unless options[:logger].nil?
+     result = []
+     if has_option?(options, :imdb_id) || (has_option?(options, :filespec) && File.exist?(options[:filespec]))
+       result << ImdbProfile.new(options[:imdb_id], options[:filespec], options[:logger])
+     elsif has_option?(options, :titles)
+       result += self.lookup(options[:titles],
+                             options[:media_years],
+                             options[:production_years],
+                             options[:released_years]
+                            ).collect{|ident| ImdbProfile.new(ident, options[:filespec], options[:logger])}
+     end
+     result
+   end
+
+   # see ImdbProfile.all for options description
+   def self.first(options={})
+     self.all(options).first
+   end
+
+   protected
+
+   def self.has_option?(options, key)
+     options.has_key?(key) && !options[key].blank?
+   end
+
+   def initialize(ident, filespec, logger)
+     @imdb_id = ident
+
+     @filespec = filespec
+     @logger = OptionalLogger.new(logger)
+     load
+   end
+
+   public
+
+   # returns the IMDB ID String
+   attr_reader :imdb_id
+
+   # returns a Hash with the movie's meta data generated from ImdbMovie.to_hash.
+   # See ImdbMovie for details.
+   attr_reader :movie
+
+   # return the xml as a String
+   def to_xml
+     xml = ''
+     unless @movie.blank?
+       @movie.delete_if { |key, value| value.nil? }
+       xml = XmlSimple.xml_out(@movie, 'NoAttr' => true, 'RootName' => 'movie')
+     end
+     xml
+   end
+
+   protected
+
+   # @movie keys => [:title, :directors, :poster_url, :tiny_poster_url, :poster,
+   #                 :rating, :cast_members, :writers, :year, :genres, :plot,
+   #                 :tagline, :aspect_ratio, :length, :release_date, :countries,
+   #                 :languages, :color, :company, :photos, :raw_title,
+   #                 :release_year, :also_known_as, :mpaa, :certifications]
+   # returns Hash or nil
+   def load
+     if !@filespec.blank? && File.exist?(@filespec)
+       @logger.debug { "loading movie filespec=> #{@filespec.inspect}" }
+       @movie = from_xml(open(@filespec).read)
+     elsif !@imdb_id.blank?
+       @logger.debug { "loading movie from imdb.com, filespec=> #{@filespec.inspect}" }
+       @movie = ImdbMovie.new(@imdb_id.gsub(/^tt/, '')).to_hash
+       @movie['id'] = 'tt' + @imdb_id.gsub(/^tt/, '') unless @movie.blank?
+       save(@filespec) unless @filespec.blank?
+     end
+     unless @movie.blank?
+       @imdb_id = @movie['id']
+       @imdb_id = @imdb_id.first if @imdb_id.respond_to?('[]') && @imdb_id.length == 1
+     else
+       @movie = nil
+     end
+   end
+
+   def from_xml(xml)
+     begin
+       movie = XmlSimple.xml_in(xml)
+     rescue Exception => e
+       @logger.warn { "Error converting from xml: #{e.to_s}" }
+       movie = nil
+     end
+     movie
+   end
+
+   def save(filespec)
+     begin
+       xml = self.to_xml
+       unless xml.blank?
+         @logger.debug { "saving #{filespec}" }
+         save_to_file(filespec, xml)
+       end
+     rescue Exception => e
+       @logger.error {"Unable to save imdb profile to #{filespec} - #{e.to_s}"}
+     end
+   end
+
+   def save_to_file(filespec, data)
+     new_filespec = filespec + '.new'
+     File.open(new_filespec, "w") do |file|
+       file.puts(data)
+     end
+     backup_filespec = filespec + '~'
+     File.delete(backup_filespec) if File.exist?(backup_filespec)
+     File.rename(filespec, backup_filespec) if File.exist?(filespec)
+     File.rename(new_filespec, filespec)
+     File.delete(new_filespec) if File.exist?(new_filespec)
+   end
+
+   # lookup IMDB title using years as the secondary search key
+   # the titles should behave as an Array, the intent here is to be
+   # able to try to find the exact title from DVD Profiler and if that
+   # fails, to try to find the title pattern
+   # The search order is:
+   # 1) media_years should be from media filename
+   # 2) production years
+   # 3) production years plus/minus a year
+   # 4) released years
+   # 5) released years plus/minus a year
+   # 6) no years
+   def self.lookup(titles, media_years, production_years, released_years)
+     idents = []
+     year_sets = []
+     year_sets << media_years unless media_years.blank?
+     year_sets << fuzzy_years(production_years, 0) unless production_years.blank?
+     year_sets << fuzzy_years(production_years, -1..1) unless production_years.blank?
+     year_sets << fuzzy_years(released_years, 0) unless released_years.blank?
+     year_sets << fuzzy_years(released_years, -1..1) unless released_years.blank?
+     year_sets << [] if media_years.blank?
+
+     titles.flatten.uniq.compact.each do |title|
+       [false, true].each do |search_akas|
+         @class_logger.debug { (search_akas ? 'Search AKAs' : 'Do not search AKAs') }
+         imdb_search = ImdbSearch.new(title, search_akas)
+         @cache ||= {}
+         imdb_search.set_cache(@cache)
+
+         if year_sets.flatten.uniq.compact.empty?
+           idents = imdb_search.movies.collect{|m| m.id.to_s}.uniq.compact
+         else
+           year_sets.each do |years|
+             new_idents = find_id(imdb_search, title, years, search_akas)
+             @class_logger.debug { "new_idents => #{new_idents.inspect}" }
+             idents += new_idents
+             break unless new_idents.blank?
+           end
+         end
+         break unless idents.blank?
+       end
+       break unless idents.blank?
+     end
+     idents.uniq.compact
+   end
+
+   # Different databases seem to mix up released versus production years.
+   # So we combine both into an Array of integer years.
+   # fuzzy is an integer range, basically expand each known year by the fuzzy range
+   # i.e., let production and released year both be 2000 and fuzzy=-1..1,
+   # then the returned years would be [1999, 2000, 2001]
+   def self.fuzzy_years(source_years, fuzzy)
+     years = []
+     unless source_years.blank?
+       years = [source_years].flatten.collect do |date|
+         a = []
+         if date.to_s =~ /(\d{4})/
+           y = $1.to_i
+           a = [*fuzzy].collect do
+             |f| y.to_i + f
+           end
+         end
+         a
+       end
+     end
+     result = years.flatten.uniq.compact.sort
+     result
+   end
+
+   # try to find the imdb id for the movie
+   def self.find_id(imdb_search, title, years, search_akas)
+     idents = []
+
+     @class_logger.info { "Searching IMDB for \"#{title}\" (#{years.join(", ")})" }
+     unless title.blank?
+       begin
+         movies = imdb_search.movies
+         @class_logger.debug { "movies => (#{movies.collect{|m| [m.id, m.year, m.title]}.inspect})"}
+         if movies.size == 1
+           idents = [movies.first.id.to_s]
+         elsif movies.size > 1
+           @class_logger.debug { "years => #{years.inspect}"}
+           same_year_movies = movies.select{ |m| !m.year.blank? && years.include?(m.year.to_i) }
+           idents = same_year_movies.collect{|m| m.id.to_s}
+           @class_logger.debug { "same_year_movies => (#{same_year_movies.collect{|m| [m.id, m.year, m.title]}.inspect})"}
+         end
+       rescue Exception => e
+         @class_logger.error { "Error searching IMDB - " + e.to_s }
+         @class_logger.error { e.backtrace.join("\n") }
+       end
+     end
+     @class_logger.debug { "IMDB id => #{idents.join(', ')}" } unless idents.blank?
+     idents
+   end
+
+ end
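The year handling in `ImdbProfile.lookup` and `ImdbProfile.fuzzy_years` can be sketched in isolation. This is a standalone illustration under stated assumptions, not the gem's code: the top-level methods `fuzzy_years` and `year_sets` and the `present` lambda are hypothetical stand-ins, and plain `nil?`/`empty?` checks replace `blank?`, which is not part of core Ruby. As in the hunk above, the empty "no years" set is only appended when no media years were given.

```ruby
# Expand every four-digit year found in source_years by each offset in fuzzy.
# fuzzy may be a single Integer (0) or a Range (-1..1), as in the diff above.
def fuzzy_years(source_years, fuzzy)
  offsets = fuzzy.is_a?(Range) ? fuzzy.to_a : [fuzzy]
  Array(source_years).flatten.flat_map do |date|
    match = date.to_s[/\d{4}/]
    match ? offsets.map { |offset| match.to_i + offset } : []
  end.uniq.sort
end

# Build the ordered list of year sets that the search would try in turn.
def year_sets(media_years, production_years, released_years)
  present = ->(years) { !(years.nil? || years.empty?) }
  sets = []
  sets << media_years if present.(media_years)
  if present.(production_years)
    sets << fuzzy_years(production_years, 0)
    sets << fuzzy_years(production_years, -1..1)
  end
  if present.(released_years)
    sets << fuzzy_years(released_years, 0)
    sets << fuzzy_years(released_years, -1..1)
  end
  sets << [] unless present.(media_years) # "no years" fallback
  sets
end

p fuzzy_years(['2000'], -1..1) # => [1999, 2000, 2001]
p year_sets(nil, [1999], [2002])
```

Trying the exact years before the widened range means a movie whose databases agree on the year is matched first, and the plus/minus one year sets only come into play when the exact sets find nothing.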