bootleg 0.0.6 → 0.0.7
- checksums.yaml +7 -0
- data/.gitignore +1 -0
- data/Gemfile +0 -6
- data/README.md +43 -53
- data/bootleg.gemspec +6 -7
- data/lib/bootleg.rb +5 -7
- data/lib/bootleg/agent.rb +31 -0
- data/lib/bootleg/movie.rb +30 -0
- data/lib/bootleg/page.rb +26 -0
- data/lib/bootleg/theater.rb +54 -0
- data/lib/bootleg/version.rb +1 -1
- data/spec/lib/agent_spec.rb +11 -0
- data/spec/lib/movie_spec.rb +37 -0
- data/spec/lib/page_spec.rb +21 -0
- data/spec/lib/theater_spec.rb +52 -0
- metadata +31 -83
- data/README.rdoc +0 -3
- data/lib/extractor.rb +0 -32
- data/lib/finder.rb +0 -14
- data/lib/generators/bootleg/USAGE +0 -0
- data/lib/generators/bootleg/install_generator.rb +0 -15
- data/lib/generators/bootleg/movie_generator.rb +0 -24
- data/lib/generators/bootleg/showtime_generator.rb +0 -24
- data/lib/generators/bootleg/templates/movie_migration.rb +0 -10
- data/lib/generators/bootleg/templates/movie_model.rb +0 -10
- data/lib/generators/bootleg/templates/showtime_migration.rb +0 -12
- data/lib/generators/bootleg/templates/showtime_model.rb +0 -14
- data/lib/generators/bootleg/templates/theater_migration.rb +0 -10
- data/lib/generators/bootleg/templates/theater_model.rb +0 -10
- data/lib/generators/bootleg/theater_generator.rb +0 -24
- data/lib/manager.rb +0 -27
- data/lib/modules/href.rb +0 -18
- data/lib/modules/movie.rb +0 -24
- data/lib/modules/theater.rb +0 -45
- data/lib/modules/zipcode.rb +0 -12
- data/lib/presenter.rb +0 -8
- data/spec/extractor_spec.rb +0 -59
- data/spec/finder_spec.rb +0 -26
- data/spec/presenter_spec.rb +0 -4
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: dfffbc8d963307c1e4023281762f735b18b0b117
+  data.tar.gz: 0c0af543cf5c8b8c3da8fa0b2c00d5babcab9762
+SHA512:
+  metadata.gz: 2f20a7f0caf9fdaa96fccc43dd38f85a5e8b9a29740c8aef03bfacc5e5da2b7a9ab264ca252272e567e47d4b4395a9076acb17974d9feec5b4848324e0413d3d
+  data.tar.gz: 74025fc09a0c95f8e7bc9900a4bdb4f8c20c5890c78cc2d6767dfa82e34ca2b64d483abf4a08fa67f08792666df628a1722c294ca254eec402524319a86abc74
data/.gitignore
CHANGED
data/Gemfile
CHANGED
data/README.md
CHANGED
@@ -1,71 +1,61 @@
 # Bootleg
 
-Bootleg
-
-radius.
+Bootleg represents my playground for scraping movies from
+[moviefone.com](http://moviefone.com)
 
-##
+## If you want you can play with it as well
 
-
+To gain access to the gem simply:
 
-
+```
+require 'bootleg'
+```
 
-
+Currently Bootleg supports search for movies and theaters within a certain
+zipcode.
 
-
+```
+agent = Bootleg::Agent.new(zipcode: 20851)
+```
 
-
+The bootleg agent is responsible for initializing the enviornment and providing
+a simple method ``page`` that gives us access to the first search results.
 
-
+```
+page = agent.page
+```
 
-
+A page has two handy methods. The ``theaters`` method gives you an array with
+all the available theaters. You have access to a theater's title, link, price
+for an adult or the price for a child as well as the address of the theater and
+the movies that are currently palying at that theater. The page also has a
+``next`` method that give you access to the next page. You can keep navigating
+through the search results until ``page.next`` returns _Last Page_.
 
-
+```
+page = page.next
+```
 
-
+Within theaters you also have access to movies. A movie has attributes for title
+link and showtimes.
 
-
+```
+theaters = page.theaters
+```
 
-
+This gives you a list of theaters on the current page.
 
-
+For example if you would like access to all the movies that run in a certain
+theater you could do the following.
 
-
-
+```
+theaters.first.movies
+```
+If you need to scrape additional information from either theaters or movies it is
+fairly easy to extend either the theater class or the movies class.
 
-
+## TODO
 
-
-
-
-
-$ theaters = movie.theaters
-
-Get all the showtimes:
-
-$ showtimes = movie.showtimes
-
-Get the theater of the showtimes:
-
-$ theater = showtimes.first.theater
-
-## Other Details
-
-The content is stored in 3 Active Record models BootlegMovie,
-BootlegTheater and BootlegShowtime. After you load a zipcode just start
-a rails console and take a look at the models to see what information is
-stored inside.
-
-The zipcode is stored under BootlegShowtime.:w
-
-## Contributing
-
-1. Fork it
-2. Create your feature branch (`git checkout -b my-new-feature`)
-3. Commit your changes (`git commit -am 'Add some feature'`)
-4. Push to the branch (`git push origin my-new-feature`)
-5. Create new Pull Request
-
-It is relatively easy to pull other information from moviefone.com. If you
-need an extra feature and you would like to contribute feel free to shoot
-me an email at marius@mlpinit.com beforehand.
+Some of the main improvements this gem could use is better error support for
+different edge cases. The gem could also benefit from more tests to ilustrate
+some of the patterns identified while scrapping movies and theaters.
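The new README says you can keep calling `page.next` until it returns _Last Page_. A minimal sketch of that loop follows. `each_page` is a hypothetical helper, not part of the gem, and it assumes only what the README and `page.rb` show: `next` returns either another page object or the string `'Last Page'`.

```ruby
# Hypothetical helper (not part of bootleg): yields each search result page
# in turn, stopping when `next` returns the string 'Last Page'.
def each_page(first_page)
  # Without a block, return an Enumerator so callers can chain map, take, etc.
  return enum_for(:each_page, first_page) unless block_given?

  page = first_page
  until page == 'Last Page'
    yield page
    page = page.next
  end
end

# Assumed usage against the real gem (needs network access):
#   agent = Bootleg::Agent.new(zipcode: 20851)
#   each_page(agent.page) { |page| puts page.theaters.map(&:title) }
```

Comparing against the sentinel string keeps the sketch faithful to the current API; returning `nil` from `next` would arguably be more conventional, but that is not what 0.0.7 does.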
data/bootleg.gemspec
CHANGED
@@ -8,17 +8,16 @@ Gem::Specification.new do |gem|
   gem.version = Bootleg::VERSION
   gem.authors = ["Marius L. Pop"]
   gem.email = ["marius@mlpinit.com"]
-  gem.description = %q{
-  gem.summary = %q{
+  gem.description = %q{ This gems allows you to navigate through the results }
+  gem.summary = %q{ Zipcode based scrapping for moviefone.com }
   gem.homepage = ""
 
   gem.files = `git ls-files`.split($/)
   gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
   gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
   gem.require_paths = ["lib"]
-
-  gem.add_dependency
-
-  gem.
-  gem.add_development_dependency "rspec"
+
+  gem.add_dependency 'mechanize', '~> 2.7.3'
+
+  gem.add_development_dependency 'rspec', '3.0.0.beta2'
 end
data/lib/bootleg.rb
CHANGED
data/lib/bootleg/agent.rb
ADDED
@@ -0,0 +1,31 @@
+module Bootleg
+  class Agent
+
+    attr_reader :zipcode
+
+    def initialize(args)
+      @zipcode = args.fetch(:zipcode)
+    end
+
+    def page
+      @page ||= Bootleg::Page.new page: mechanize.submit(search_form)
+    end
+
+    private
+
+    def mechanize
+      Mechanize.new
+    end
+
+    def home_page
+      mechanize.get('http://moviefone.com')
+    end
+
+    def search_form
+      home_page.form_with(id: 'frm-search').tap do |form|
+        form.fields.last.value = zipcode
+      end
+    end
+
+  end
+end
data/lib/bootleg/movie.rb
ADDED
@@ -0,0 +1,30 @@
+module Bootleg
+  class Movie
+
+    attr_reader :movie
+    private :movie
+
+    def initialize(args)
+      @movie ||= args.fetch(:movie)
+    end
+
+    def title
+      movie_info.text
+    end
+
+    def link
+      movie_info.attributes['href'].value
+    end
+
+    def showtimes
+      movie.search('span.stDisplay').map { |times| times.text }
+    end
+
+    private
+
+    def movie_info
+      @movie_info ||= movie.search('div.movietitle a').last
+    end
+
+  end
+end
data/lib/bootleg/page.rb
ADDED
@@ -0,0 +1,26 @@
+module Bootleg
+  class Page
+
+    attr_reader :page
+
+    def initialize(args)
+      @page ||= args.fetch(:page)
+    end
+
+    def next
+      link ? self.class.new(page: link.click) : 'Last Page'
+    end
+
+    def theaters
+      page.search('div.theater').
+        map { |theater| Bootleg::Theater.new(theater: theater) }
+    end
+
+    private
+
+    def link
+      @link ||= page.link_with(class: 'next-showtime')
+    end
+
+  end
+end
data/lib/bootleg/theater.rb
ADDED
@@ -0,0 +1,54 @@
+module Bootleg
+  class Theater
+
+    attr_reader :theater
+    private :theater
+
+    def initialize(args)
+      @theater ||= args.fetch(:theater)
+    end
+
+    def title
+      title_info.text
+    end
+
+    def link
+      title_info.attributes['href'].value
+    end
+
+    def address
+      # removes the phone number at the end of the address and the white space
+      theater.search('p.address').text.sub(/\|.*/,'').strip
+    end
+
+    def adult_price
+      prices.first
+    end
+
+    def child_price
+      prices.last
+    end
+
+    def movies
+      theater.search('div.movie-data-wrap').
+        map { |movie| Bootleg::Movie.new(movie: movie) }
+    end
+
+    private
+
+    def title_info
+      @title_link ||= theater.search('div.title a').last
+    end
+
+    def prices
+      @prices ||= theater.
+        search("div.prices").
+        first.
+        text.
+        gsub(/\s/,'').
+        split('|').
+        map { |price| $& if price.match /\$.*\d{2}/ }
+    end
+
+  end
+end
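The string handling in `Theater#prices` and `Theater#address` can be tried without Nokogiri or a network connection. The sketch below is an illustration, not the gem's code: `extract_prices` and `strip_phone` are hypothetical names, the sample strings are the ones used in `theater_spec.rb`, and `String#[]` with a regex is swapped in for the original `$& if price.match ...` (it likewise yields the matched text or `nil`).

```ruby
# Hypothetical standalone versions of the parsing steps in Bootleg::Theater.

# Splits a "Label:$x.xx | Label:$y.yy" string into its price substrings.
def extract_prices(text)
  text.gsub(/\s/, '')                       # drop all whitespace
      .split('|')                           # one chunk per price category
      .map { |chunk| chunk[/\$.*\d{2}/] }   # "$" through the last two digits, or nil
end

# Removes everything from the first "|" onward (the phone number) and trims.
def strip_phone(text)
  text.sub(/\|.*/, '').strip
end

p extract_prices('RegularPrice:$11.50 | ChildPrice:$6.50')
p strip_phone("\n\n\t Rockville, MD | 234-222")
```

One consequence of this design is that `adult_price` and `child_price` depend purely on position (`first` and `last`), so a theater listing a single price would return the same value for both.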
data/lib/bootleg/version.rb
CHANGED
data/spec/lib/agent_spec.rb
ADDED
@@ -0,0 +1,11 @@
+require 'spec_helper'
+require 'bootleg/agent'
+
+describe Bootleg::Agent do
+
+  it 'raises ArgumentError if not initialized with a zipcode' do
+    expect{described_class.new(not_zipcode: 'hey')}.
+      to raise_error(KeyError, 'key not found: :zipcode')
+  end
+
+end
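The example description above mentions `ArgumentError`, but the expectation checks for `KeyError`, which is what `Hash#fetch` raises for a missing key. A minimal sketch of that behaviour, using a hypothetical helper name (`fetch_error_message`) that is not part of the gem:

```ruby
# Hypothetical helper: returns the error message raised by Hash#fetch for a
# missing key, or nil when the key is present.
def fetch_error_message(args, key)
  args.fetch(key)
  nil
rescue KeyError => e
  e.message
end

puts fetch_error_message({ not_zipcode: 'hey' }, :zipcode)
```

`KeyError` descends from `IndexError`, not `ArgumentError`, so a caller rescuing `ArgumentError` would not catch a missing `:zipcode`.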
data/spec/lib/movie_spec.rb
ADDED
@@ -0,0 +1,37 @@
+require 'spec_helper'
+require 'bootleg/movie'
+
+describe Bootleg::Movie do
+
+  it 'raises KeyError if not initialized with a nokogiri-movie' do
+    expect{described_class.new(not_movie: 'not movie')}.
+      to raise_error(KeyError, 'key not found: :movie')
+  end
+
+  let(:attributes) {}
+  let(:nokogiri_movie) { double 'Nokogiri Movie' }
+  let(:href) { double( 'Href', value: 'http://moviefone.com') }
+  let(:movie) { double('Movie', text: 'Matrix', attributes: { 'href' => href }) }
+  let(:times) { [ double(text: '1am'), double(text: '2pm')] }
+
+  subject { described_class.new(movie: nokogiri_movie) }
+
+  it 'has a title' do
+    allow(nokogiri_movie).to receive_message_chain(:search, :last).
+      and_return(movie)
+    expect(subject.title).to eq('Matrix')
+  end
+
+  it 'has a link' do
+    allow(nokogiri_movie).to receive_message_chain(:search, :last).
+      and_return(movie)
+    expect(subject.link).to eq('http://moviefone.com')
+  end
+
+  it 'has showtimes' do
+    allow(nokogiri_movie).to receive(:search).with('span.stDisplay').
+      and_return(times)
+    expect(subject.showtimes).to include('1am', '2pm')
+  end
+
+end
data/spec/lib/page_spec.rb
ADDED
@@ -0,0 +1,21 @@
+require 'spec_helper'
+require 'bootleg/page'
+
+describe Bootleg::Page do
+
+  it 'raises KeyError if not initialized with a nokogiri-page' do
+    expect{described_class.new(not_page: 'not page')}.
+      to raise_error(KeyError, 'key not found: :page')
+  end
+
+  let(:link) { double 'Link' }
+  let(:nokogiri_page) { double 'Nokogiri Page' }
+  subject { described_class.new(page: nokogiri_page) }
+
+  it 'returns "Last Page" if no more pages' do
+    allow(nokogiri_page).to receive(:link_with).
+      with(class: 'next-showtime').and_return(nil)
+    expect(subject.next).to eq('Last Page')
+  end
+
+end
data/spec/lib/theater_spec.rb
ADDED
@@ -0,0 +1,52 @@
+require 'spec_helper'
+require 'bootleg/theater'
+
+describe Bootleg::Theater do
+
+  it 'raises KeyError if not initialized with a nokogiri-theater' do
+    expect{described_class.new(not_theater: 'not theater')}.
+      to raise_error(KeyError, 'key not found: :theater')
+  end
+
+  let(:nokogiri_theater) { double 'Nokogiri Theater' }
+  let(:theater) { double('Theater', text: 'Royal', attributes: { 'href' => href }) }
+  let(:href) { double( 'Href', value: 'http://moviefone.com') }
+  let(:address) { double('Address', text: "\n\n\t Rockville, MD | 234-222") }
+  let(:prices) { double('Prices', text: 'RegularPrice:$11.50 | ChildPrice:$6.50') }
+
+  subject { described_class.new(theater: nokogiri_theater) }
+
+  it 'has a title' do
+    allow(nokogiri_theater).to receive_message_chain(:search, :last).
+      and_return(theater)
+    expect(subject.title).to eq('Royal')
+  end
+
+  it 'has a link' do
+    allow(nokogiri_theater).to receive_message_chain(:search, :last).
+      and_return(theater)
+    expect(subject.link).to eq('http://moviefone.com')
+  end
+
+  it 'has an address' do
+    allow(nokogiri_theater).to receive(:search).with('p.address').
+      and_return(address)
+    expect(subject.address).to eq("Rockville, MD")
+  end
+
+  context 'has price' do
+    before do
+      allow(nokogiri_theater).to receive_message_chain(:search, :first).
+        and_return(prices)
+    end
+
+    it 'for adult' do
+      expect(subject.adult_price).to eq('$11.50')
+    end
+
+    it 'for child' do
+      expect(subject.child_price).to eq('$6.50')
+    end
+  end
+
+end
metadata
CHANGED
@@ -1,148 +1,96 @@
 --- !ruby/object:Gem::Specification
 name: bootleg
 version: !ruby/object:Gem::Version
-  version: 0.0.
-  prerelease:
+  version: 0.0.7
 platform: ruby
 authors:
 - Marius L. Pop
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2016-05-25 00:00:00.000000000 Z
 dependencies:
-- !ruby/object:Gem::Dependency
-  name: nokogiri
-  requirement: !ruby/object:Gem::Requirement
-    none: false
-    requirements:
-    - - ! '>='
-      - !ruby/object:Gem::Version
-        version: '0'
-  type: :runtime
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    none: false
-    requirements:
-    - - ! '>='
-      - !ruby/object:Gem::Version
-        version: '0'
 - !ruby/object:Gem::Dependency
   name: mechanize
   requirement: !ruby/object:Gem::Requirement
-    none: false
-    requirements:
-    - - ! '>='
-      - !ruby/object:Gem::Version
-        version: '0'
-  type: :runtime
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    none: false
-    requirements:
-    - - ! '>='
-      - !ruby/object:Gem::Version
-        version: '0'
-- !ruby/object:Gem::Dependency
-  name: activerecord
-  requirement: !ruby/object:Gem::Requirement
-    none: false
     requirements:
-    - -
+    - - "~>"
       - !ruby/object:Gem::Version
-        version:
+        version: 2.7.3
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
-    none: false
     requirements:
-    - -
+    - - "~>"
       - !ruby/object:Gem::Version
-        version:
+        version: 2.7.3
 - !ruby/object:Gem::Dependency
   name: rspec
   requirement: !ruby/object:Gem::Requirement
-    none: false
     requirements:
-    - -
+    - - '='
       - !ruby/object:Gem::Version
-        version:
+        version: 3.0.0.beta2
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
-    none: false
     requirements:
-    - -
+    - - '='
       - !ruby/object:Gem::Version
-        version:
-description:
-  movifone.com'
+        version: 3.0.0.beta2
+description: " This gems allows you to navigate through the results "
 email:
 - marius@mlpinit.com
 executables: []
 extensions: []
 extra_rdoc_files: []
 files:
-- .gitignore
+- ".gitignore"
 - Gemfile
 - LICENSE.txt
 - README.md
-- README.rdoc
 - Rakefile
 - bootleg.gemspec
 - lib/bootleg.rb
+- lib/bootleg/agent.rb
+- lib/bootleg/movie.rb
+- lib/bootleg/page.rb
+- lib/bootleg/theater.rb
 - lib/bootleg/version.rb
-- lib/extractor.rb
-- lib/finder.rb
-- lib/generators/bootleg/USAGE
-- lib/generators/bootleg/install_generator.rb
-- lib/generators/bootleg/movie_generator.rb
-- lib/generators/bootleg/showtime_generator.rb
-- lib/generators/bootleg/templates/movie_migration.rb
-- lib/generators/bootleg/templates/movie_model.rb
-- lib/generators/bootleg/templates/showtime_migration.rb
-- lib/generators/bootleg/templates/showtime_model.rb
-- lib/generators/bootleg/templates/theater_migration.rb
-- lib/generators/bootleg/templates/theater_model.rb
-- lib/generators/bootleg/theater_generator.rb
-- lib/manager.rb
-- lib/modules/href.rb
-- lib/modules/movie.rb
-- lib/modules/theater.rb
-- lib/modules/zipcode.rb
-- lib/presenter.rb
 - spec/.rspec
-- spec/
-- spec/
-- spec/
+- spec/lib/agent_spec.rb
+- spec/lib/movie_spec.rb
+- spec/lib/page_spec.rb
+- spec/lib/theater_spec.rb
 - spec/spec_helper.rb
 homepage: ''
 licenses: []
+metadata: {}
 post_install_message:
 rdoc_options: []
 require_paths:
 - lib
 required_ruby_version: !ruby/object:Gem::Requirement
-  none: false
   requirements:
-  - -
+  - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 required_rubygems_version: !ruby/object:Gem::Requirement
-  none: false
   requirements:
-  - -
+  - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version:
+rubygems_version: 2.5.1
 signing_key:
-specification_version:
-summary:
+specification_version: 4
+summary: Zipcode based scrapping for moviefone.com
 test_files:
 - spec/.rspec
-- spec/
-- spec/
-- spec/
+- spec/lib/agent_spec.rb
+- spec/lib/movie_spec.rb
+- spec/lib/page_spec.rb
+- spec/lib/theater_spec.rb
 - spec/spec_helper.rb
+has_rdoc:
data/README.rdoc
DELETED
data/lib/extractor.rb
DELETED
@@ -1,32 +0,0 @@
-require 'mechanize'
-require 'nokogiri'
-require 'open-uri'
-require_relative 'finder'
-require_relative 'modules/theater'
-
-class Extractor
-
-  attr_reader :page_theaters
-
-  def initialize(page, zipcode)
-    @page = (Nokogiri::HTML(open(page)))
-    @page_theaters = []
-    extract_movies
-    @zipcode ||= zipcode
-  end
-
-  def extract_movies
-    theaters.each do |theater|
-      theater.extend Theater
-      BootlegTheater.create!(name: theater.name, href: theater.link)
-      theater_info = { name: theater.name, href: theater.link, movies: theater.movies}
-      @page_theaters << theater_info
-    end
-  end
-
-  private
-
-  def theaters
-    @page.css('div.theater')
-  end
-end
data/lib/finder.rb
DELETED
File without changes
data/lib/generators/bootleg/install_generator.rb
DELETED
@@ -1,15 +0,0 @@
-require 'rails/generators/migration'
-
-module Bootleg
-  module Generators
-    class InstallGenerator < Rails::Generators::Base
-      source_root File.expand_path('../templates', __FILE__)
-
-      def run_generators
-        generate "bootleg:movie"
-        generate "bootleg:theater"
-        generate "bootleg:showtime"
-      end
-    end
-  end
-end
data/lib/generators/bootleg/movie_generator.rb
DELETED
@@ -1,24 +0,0 @@
-require 'rails/generators/migration'
-
-module Bootleg
-  module Generators
-    class MovieGenerator < Rails::Generators::Base
-      include Rails::Generators::Migration
-
-      source_root File.expand_path('../templates', __FILE__)
-
-      def generate_movie_migration
-        migration_template "movie_migration.rb", "db/migrate/create_bootleg_movies.rb"
-      end
-
-      def generate_movie_model
-        copy_file "movie_model.rb", "app/models/bootleg_movie.rb"
-      end
-
-      def self.next_migration_number(path)
-        @migration_number = Time.now.utc.strftime("%Y%m%d%H%M%S").to_i.to_s
-      end
-    end
-  end
-end
-
data/lib/generators/bootleg/showtime_generator.rb
DELETED
@@ -1,24 +0,0 @@
-require 'rails/generators/migration'
-
-module Bootleg
-  module Generators
-    class ShowtimeGenerator < Rails::Generators::Base
-      include Rails::Generators::Migration
-
-      source_root File.expand_path('../templates', __FILE__)
-
-      def generate_theater_movie_migration
-        migration_template "showtime_migration.rb", "db/migrate/create_bootleg_showtimes.rb"
-      end
-
-      def generate_theater_movie_model
-        copy_file "showtime_model.rb", "app/models/bootleg_showtime.rb"
-      end
-
-      def self.next_migration_number(path)
-        @migration_number = Time.now.utc.strftime("%Y%m%d%H%M%S").to_i.to_s
-      end
-    end
-  end
-end
-
data/lib/generators/bootleg/templates/showtime_model.rb
DELETED
@@ -1,14 +0,0 @@
-class BootlegShowtime < ActiveRecord::Base
-  attr_accessible :bootleg_movie_id, :bootleg_theater_id, :zipcode, :showtimes
-
-  belongs_to :bootleg_movie
-  belongs_to :bootleg_theater
-
-  def theater
-    bootleg_theater
-  end
-
-  def movie
-    bootleg_movie
-  end
-end
data/lib/generators/bootleg/theater_generator.rb
DELETED
@@ -1,24 +0,0 @@
-require 'rails/generators/migration'
-
-module Bootleg
-  module Generators
-    class TheaterGenerator < Rails::Generators::Base
-      include Rails::Generators::Migration
-
-      source_root File.expand_path('../templates', __FILE__)
-
-      def generate_theater_migration
-        migration_template "theater_migration.rb", "db/migrate/create_bootleg_theaters.rb"
-      end
-
-      def generate_theater_model
-        copy_file "theater_model.rb", "app/models/bootleg_theater.rb"
-      end
-
-      def self.next_migration_number(path)
-        @migration_number = Time.now.utc.strftime("%Y%m%d%H%M%S").to_i.to_s
-      end
-    end
-  end
-end
-
data/lib/manager.rb
DELETED
@@ -1,27 +0,0 @@
-require_relative 'finder'
-require_relative 'extractor'
-
-class Manager
-
-  class << self
-    attr_accessor :zipcode
-  end
-
-  def initialize(zipcode)
-    @zipcode = zipcode
-    @pages ||= find_pages
-    @all_theaters = []
-    Manager.zipcode = zipcode
-  end
-
-  def find_pages
-    Finder.new(@zipcode).hrefs
-  end
-
-  def extract_theaters
-    @pages.each do |page|
-      @all_theaters << Extractor.new(page, @zipcode).page_theaters
-    end
-    @all_theaters.flatten
-  end
-end
data/lib/modules/href.rb
DELETED
@@ -1,18 +0,0 @@
-module Href
-  def all
-    pages = []
-    count.times { |nr| pages << url + nr.to_s }
-    pages
-  end
-
-  private
-
-  def count
-    self.links.select { |link| link.text.size < 3 and link.text =~ /\d/ }.last.text.to_i
-  end
-
-  def url
-    self.uri.to_s + '?page='
-  end
-end
-
data/lib/modules/movie.rb
DELETED
@@ -1,24 +0,0 @@
-module Movie
-  def name
-    details.css('a').text.strip
-  end
-
-  def link
-    "http://www.moviefone.com" + details.css('a').attribute('href').value
-  end
-
-  def showtimes
-    values = []
-    showtimes = self.css('a.gt').empty? ? self.css('span.stDisplay') : self.css('a.gt')
-    showtimes.each do |time|
-      values << time.text
-    end
-    values
-  end
-
-  private
-
-  def details
-    self.css('div.movietitle')
-  end
-end
data/lib/modules/theater.rb
DELETED
@@ -1,45 +0,0 @@
-require_relative 'movie'
-
-module Theater
-  def name
-    details.text.strip
-  end
-
-  def link
-    details.attribute('href').value
-  end
-
-  def movies
-    movies = self.css('div.movie-listing.first')
-    values = []
-    theater = BootlegTheater.last
-    movies.each do |movie|
-      movie.extend Movie
-      movie_info = { name: movie.name, href: movie.link, showtimes: movie.showtimes }
-      values << movie_info
-      insert_movies(theater,movie)
-    end
-    values
-  end
-
-  private
-  def details
-    self.css('h3.title').css('a')
-  end
-
-  def insert_movies(theater, movie)
-    existing_movie = BootlegMovie.where(name: movie.name).first
-    if existing_movie
-      showtime = theater.bootleg_showtimes.new
-      showtime.bootleg_movie_id = existing_movie.id
-      showtime.save
-    else
-      theater.movies.create!(name: movie.name, href: movie.link)
-    end
-    showtime = BootlegShowtime.last
-    showtime.showtimes = movie.showtimes.to_s.gsub(/-/, '').gsub(/\n/,'').strip
-    showtime.zipcode = Manager.zipcode
-    showtime.date = Time.zone.now
-    showtime.save
-  end
-end
data/lib/modules/zipcode.rb
DELETED
data/lib/presenter.rb
DELETED
data/spec/extractor_spec.rb
DELETED
@@ -1,59 +0,0 @@
-require 'spec_helper'
-
-describe Extractor do
-  it "should raise an error withouth arguments" do
-    expect{ Extractor.new }.to raise_error(ArgumentError)
-  end
-
-  it "should not raise error with one argument" do
-    expect { Extractor.new("http://www.moviefone.com")}.to_not raise_error(ArgumentError)
-  end
-
-  it "should raise an error with more then one argument" do
-    expect { Extractor.new("arg1", "arg2") }.to raise_error(ArgumentError)
-  end
-
-  before :all do
-    @theaters = Extractor.new("http://www.moviefone.com/showtimes/manchester-md/21102/theaters").page_theaters
-  end
-
-  it "should pull out no more then 5 theaters" do
-    @theaters.size.should eq(5)
-  end
-
-  describe Theater do
-    before :all do
-      @theater = @theaters[1]
-    end
-
-    it "name should match expression" do
-      expect(@theater[:name]).to match(/(\w|\s)/)
-    end
-
-    it "href should be a link" do
-      expect(@theater[:href]).to match(/http:\/\/www\.moviefone\.com/)
-    end
-
-    describe Movie do
-
-      before :all do
-        @movie = @theater[:movies].first
-      end
-      it "should have a name, href and showtimes" do
-        expect(@movie.size).to eq(3)
-      end
-
-      it "name should match expression" do
-        expect(@movie[:name]).to match(/(\w|\s)/)
-      end
-
-      it "href shoud mathc expression" do
-        expect(@movie[:href]).to match(/http:\/\/www\.moviefone\.com/)
-      end
-
-      it "shotimes returns an array" do
-        expect(@movie[:showtimes].class).to be(Array)
-      end
-    end
-  end
-end
data/spec/finder_spec.rb
DELETED
@@ -1,26 +0,0 @@
-require 'spec_helper'
-
-describe Finder do
-  it "should raise an error with no arguments" do
-    expect { Finder.new }.to raise_error(ArgumentError)
-  end
-
-  it "should not raise error with one argument" do
-    expect { Finder.new("smth") }.to_not raise_error(ArgumentError)
-  end
-
-  it "should raise an error with more then one arguments" do
-    expect { Finder.new("smth", "smthelse") }.to raise_error(ArgumentError)
-  end
-
-
-  describe Href do
-    before :all do
-      @hrefs = Finder.new('21102').hrefs
-    end
-
-    it "should have a size of 3" do
-      expect(@hrefs.size).to eq(3)
-    end
-  end
-end
data/spec/presenter_spec.rb
DELETED