greenmonster 0.4.0.dev → 0.5.0
- data/.rspec +2 -0
- data/README.markdown +19 -52
- data/Rakefile +3 -0
- data/greenmonster.gemspec +12 -13
- data/lib/greenmonster.rb +47 -43
- data/lib/greenmonster/spider.rb +107 -142
- data/lib/greenmonster/version.rb +2 -2
- data/{test/games/mlb → spec/games/tst}/year_2012/month_03/day_27/gid_2012_03_27_aaamlb_aabmlb_1/blank.txt +0 -0
- data/{test/games/tst/year_2012/month_03/day_27/gid_2012_03_27_aaamlb_aabmlb_1 → spec/games/tst/year_2012/month_03/day_27/gid_2012_03_27_aaamlb_aabmlb_1_bak}/blank.txt +0 -0
- data/spec/games/tst/year_2012/month_03/day_27/not_2012_03_27_aaamlb_aabmlb_1/blank.txt +0 -0
- data/spec/greenmonster/spider_spec.rb +85 -0
- data/spec/greenmonster_spec.rb +45 -0
- data/spec/spec_helper.rb +25 -0
- metadata +52 -71
- data/lib/generators/greenmonster/install_mlb_games_generator.rb +0 -20
- data/lib/generators/greenmonster/install_players_generator.rb +0 -20
- data/lib/generators/templates/install_greenmonster_mlb_games_migration.rb +0 -54
- data/lib/generators/templates/install_greenmonster_players_migration.rb +0 -37
- data/lib/greenmonster/model_extensions/mlb_game.rb +0 -106
- data/lib/greenmonster/model_extensions/mlb_probable_pitcher.rb +0 -11
- data/lib/greenmonster/model_extensions/player.rb +0 -16
- data/lib/greenmonster/parser.rb +0 -27
- data/test/games/mlb/year_2011/month_07/day_04/gid_2011_07_04_tormlb_bosmlb_1/boxscore.xml +0 -5
- data/test/games/mlb/year_2011/month_07/day_04/gid_2011_07_04_tormlb_bosmlb_1/eventLog.xml +0 -1
- data/test/games/mlb/year_2011/month_07/day_04/gid_2011_07_04_tormlb_bosmlb_1/inning/inning_all.xml +0 -1
- data/test/games/mlb/year_2011/month_07/day_04/gid_2011_07_04_tormlb_bosmlb_1/inning/inning_hit.xml +0 -1
- data/test/games/mlb/year_2011/month_07/day_04/gid_2011_07_04_tormlb_bosmlb_1/linescore.xml +0 -117
- data/test/games/mlb/year_2011/month_07/day_04/gid_2011_07_04_tormlb_bosmlb_1/players.xml +0 -56
- data/test/test_create_mlb_game_from_gameday_xml_game.rb +0 -12
- data/test/test_create_players_from_gameday_xml_game.rb +0 -8
- data/test/test_greenmonster.rb +0 -12
- data/test/test_greenmonster_player.rb +0 -7
- data/test/test_greenmonster_spider.rb +0 -107
- data/test/test_greenmonster_traversal.rb +0 -31
- data/test/test_helper.rb +0 -78
- data/test/test_line_score.rb +0 -12
- data/test/test_parse_mlb_probable_pitchers_from_linescore_data.rb +0 -14
- data/test/test_parse_players_from_gameday_xml_files.rb +0 -21
- data/test/test_update_mlb_game_with_linescore_data.rb +0 -23
data/.rspec
ADDED
data/README.markdown
CHANGED
@@ -1,14 +1,9 @@
|
|
1
1
|
Greenmonster
|
2
2
|
============
|
3
3
|
|
4
|
-
Greenmonster is a toolkit for baseball stat enthusiasts or sabermetricians to build a database of play-by-play stats from MLB's [Gameday XML data](http://gd.mlb.com/components/game/).
|
4
|
+
Greenmonster is a toolkit for baseball stat enthusiasts or sabermetricians to build a database of play-by-play stats from MLB's [Gameday XML data](http://gd.mlb.com/components/game/). The current tool provides the ability to spider Gameday XML data from MLB's servers for personal research. Future iterations of the tool will provide the ability to parse the data and store it in a SQL database.
|
5
5
|
|
6
|
-
|
7
|
-
* Spidering of MLB/MiLB games
|
8
|
-
* Parsing of Gameday XML
|
9
|
-
* Mixin methods you can use to extend your own classes
|
10
|
-
|
11
|
-
Usage
|
6
|
+
Usage
|
12
7
|
=====
|
13
8
|
|
14
9
|
Spider
|
@@ -21,66 +16,39 @@ If you don't want to specify a download location every time you run the spider,
|
|
21
16
|
Greenmonster.set_games_folder('/Users/geoff/games/')
|
22
17
|
```
|
23
18
|
|
24
|
-
The spider utility has three public class methods: Spider.pull_game, Spider.pull_day, and Spider.pull_days.
|
19
|
+
The spider utility has three public class methods: Spider.pull_game, Spider.pull_day, and Spider.pull_days.
|
25
20
|
|
26
|
-
Spider.pull_game takes a game_id (the folder name of the game on the Gameday server) and
|
21
|
+
Spider.pull_game takes a game_id (the folder name of the game on the Gameday server) and the date. The date is necessary because if a game is postponed or (yes, it's happened this decade) preponed, the game ID might have a date different than the actual date on which the game was played.
|
27
22
|
|
28
23
|
```ruby
|
29
|
-
#
|
30
|
-
Greenmonster::Spider.pull_game('gid_2011_07_04_tormlb_bosmlb_1',
|
24
|
+
# Pull MLB's 7/4/2011 Toronto @ Boston game
|
25
|
+
Greenmonster::Spider.pull_game('gid_2011_07_04_tormlb_bosmlb_1', Date.new(2011,7,4))
|
31
26
|
```
|
32
27
|
|
33
28
|
Spider.pull_day takes a hash of options as an argument. Greenmonster will create subfolders by MLB "sport_code" (MLB games fall under 'mlb', various minor league games and non-MLB/MiLB games fall under other sport code designations), and then child folders for years, months, days, and specific games. Sport code can be a string or an array of sport code strings.
|
34
29
|
|
35
30
|
```ruby
|
36
|
-
#
|
37
|
-
Greenmonster::Spider.pull_day(
|
38
|
-
|
39
|
-
# Pulls all rookie league games for today
|
40
|
-
Greenmonster::Spider.pull_day({:sport_code => 'rok', :date => Date.today, :games_folder => './home/geoff/games'})
|
31
|
+
# Pull all MLB games for today
|
32
|
+
Greenmonster::Spider.pull_day(Date.today, 'mlb')
|
41
33
|
|
42
|
-
#
|
43
|
-
Greenmonster::Spider.pull_day(
|
34
|
+
# Pull all rookie league games for today
|
35
|
+
Greenmonster::Spider.pull_day(Date.today, 'rok')
|
44
36
|
|
45
|
-
#
|
46
|
-
Greenmonster::
|
37
|
+
# Pull all games in all sport codes for today
|
38
|
+
Greenmonster::SPORT_CODES.each do |sport_code|
|
39
|
+
Greenmonster::Spider.pull_day(Date.today, sport_code)
|
40
|
+
end
|
47
41
|
```
|
48
42
|
|
49
43
|
|
50
44
|
|
51
|
-
Spider.pull_days takes a range of dates to process as an argument, plus
|
45
|
+
Spider.pull_days takes a range of dates to process as an argument, plus the sport code for the games (MLB.
|
52
46
|
|
53
47
|
```ruby
|
54
|
-
#
|
55
|
-
Greenmonster::Spider.pull_days((Date.new(2012,4,1)..Date.new(2012,4,30)),
|
48
|
+
# Pull all MLB games in April 2012
|
49
|
+
Greenmonster::Spider.pull_days((Date.new(2012,4,1)..Date.new(2012,4,30)), 'mlb')
|
56
50
|
```
|
57
51
|
|
58
|
-
Mixins (ALPHA)
|
59
|
-
--------------
|
60
|
-
(Under development.)
|
61
|
-
|
62
|
-
As of version 0.4.0, Greenmonster provides the Greenmonster::Player module which can be used to extend any Ruby class you use that represents players. Include the module in your class to get Greenmonster-specific functionality like parsing players out of games.
|
63
|
-
|
64
|
-
```ruby
|
65
|
-
class MlbPlayer < ActiveRecord::Base
|
66
|
-
include Greenmonster::Player
|
67
|
-
end
|
68
|
-
|
69
|
-
>> MlbPlayer.create_from_gameday_xml_game('gid_2011_07_04_tormlb_bosmlb_1')
|
70
|
-
```
|
71
|
-
|
72
|
-
Migrations (ALPHA)
|
73
|
-
------------------
|
74
|
-
|
75
|
-
WARNING: THIS FEATURE IS UNDER DEVELOPMENT. USE AT YOUR OWN RISK.
|
76
|
-
ANOTHER WARNING: Using migrations on alpha/beta/pre versions of Greenmonster may not be compatible with formal releases that come later.
|
77
|
-
|
78
|
-
If you use ActiveRecord, Greenmonster provides a generator that can generate tables for Greenmonster data. Add Greenmonster to your Gemfile:
|
79
|
-
```ruby
|
80
|
-
gem 'greenmonster', '~> 0.4.0'
|
81
|
-
```
|
82
|
-
|
83
|
-
After you pull the gem in with Bundler, you will have access to Greenmonster generators. The Install generator attempts to install a set of standard name tables that correspond to Greenmonster data.
|
84
52
|
|
85
53
|
|
86
54
|
Requirements
|
@@ -89,19 +57,18 @@ Requirements
|
|
89
57
|
- Bundler
|
90
58
|
- Nokogiri
|
91
59
|
- HTTParty
|
92
|
-
- ActiveRecord (if you want to use migration generators or any mixins that involve AR saves)
|
93
60
|
|
94
61
|
Testing
|
95
62
|
-------
|
96
63
|
|
97
|
-
The test suite
|
64
|
+
The test suite is being migrated to RSpec, and uses bourne.
|
98
65
|
|
99
66
|
|
100
67
|
License
|
101
68
|
-------
|
102
69
|
(The MIT License)
|
103
70
|
|
104
|
-
Copyright © [Geoff Harcourt](http://github.com/geoffharcourt) 2012
|
71
|
+
Copyright © [Geoff Harcourt](http://github.com/geoffharcourt) 2012-2013
|
105
72
|
|
106
73
|
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the ‘Software’), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
|
107
74
|
|
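The folder layout the README describes (sport code, then year, month, day, and game folders) can be sketched as below. This is a minimal illustration rather than the gem's own code: `local_game_folder` is a hypothetical helper name and `/tmp/games` is a made-up games folder. Only the `strftime` pattern is taken from the gem's `format_date_as_folder`.

```ruby
require 'date'
require 'pathname'

# Hypothetical helper (not part of the gem): builds the local folder for one
# game, following <games_folder>/<sport_code>/year_YYYY/month_MM/day_DD/<game_id>.
def local_game_folder(games_folder, sport_code, date, game_id)
  Pathname.new(games_folder) +
    sport_code +
    date.strftime('year_%Y/month_%m/day_%d') +
    game_id
end

# Example values; the games folder is an assumption for illustration only.
folder = local_game_folder('/tmp/games', 'mlb', Date.new(2011, 7, 4),
                           'gid_2011_07_04_tormlb_bosmlb_1')
puts folder
```

Using `Pathname#+` rather than string concatenation keeps the separators consistent whether or not the games folder is given with a trailing slash.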
data/Rakefile
CHANGED
data/greenmonster.gemspec
CHANGED
@@ -1,13 +1,13 @@
|
|
1
1
|
# -*- encoding: utf-8 -*-
|
2
|
-
$:.push File.expand_path(
|
3
|
-
require
|
2
|
+
$:.push File.expand_path('../lib', __FILE__)
|
3
|
+
require 'greenmonster/version'
|
4
4
|
|
5
5
|
Gem::Specification.new do |s|
|
6
|
-
s.name =
|
6
|
+
s.name = 'greenmonster'
|
7
7
|
s.version = Greenmonster::VERSION
|
8
|
-
s.authors = [
|
9
|
-
s.email = [
|
10
|
-
s.homepage =
|
8
|
+
s.authors = ['Geoff Harcourt']
|
9
|
+
s.email = ['geoff.harcourt@gmail.com']
|
10
|
+
s.homepage = 'http://github.com/geoffharcourt/greenmonster'
|
11
11
|
s.summary = %q{A utility for working with MLB Gameday XML data.}
|
12
12
|
s.description = %q{A utility for working with MLB Gameday XML data.}
|
13
13
|
|
@@ -16,10 +16,9 @@ Gem::Specification.new do |s|
|
|
16
16
|
s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
|
17
17
|
s.require_paths = ["lib"]
|
18
18
|
|
19
|
-
s.add_dependency
|
20
|
-
s.add_dependency
|
21
|
-
s.
|
22
|
-
s.add_development_dependency
|
23
|
-
s.add_development_dependency
|
24
|
-
|
25
|
-
end
|
19
|
+
s.add_dependency 'nokogiri'
|
20
|
+
s.add_dependency 'httparty'
|
21
|
+
s.add_development_dependency 'rspec'
|
22
|
+
s.add_development_dependency 'bourne'
|
23
|
+
s.add_development_dependency 'rake'
|
24
|
+
end
|
data/lib/greenmonster.rb
CHANGED
@@ -3,26 +3,26 @@ require "bundler/setup"
|
|
3
3
|
|
4
4
|
require 'httparty'
|
5
5
|
require 'nokogiri'
|
6
|
-
require 'pathname'
|
7
6
|
require 'fileutils'
|
8
|
-
require 'active_record'
|
9
7
|
require 'date'
|
10
8
|
|
11
9
|
module Greenmonster
|
10
|
+
SPORT_CODES = %w(aaa aax afa afx asx bbc fps hsb ind int jml nae naf nas nat naw oly rok win)
|
11
|
+
|
12
12
|
@@games_folder = nil
|
13
|
-
|
13
|
+
|
14
14
|
##
|
15
15
|
# Set the default folder to which games
|
16
16
|
# are saved after being downloaded from
|
17
17
|
# the server.
|
18
18
|
#
|
19
|
-
# Example:
|
20
|
-
#
|
19
|
+
# Example:
|
20
|
+
# => Greenmonster.set_games_folder("/Users/geoff/game_data")
|
21
21
|
#
|
22
22
|
# Arguments:
|
23
|
-
#
|
23
|
+
# location: (String)
|
24
24
|
#
|
25
|
-
|
25
|
+
|
26
26
|
def self.set_games_folder(location)
|
27
27
|
@@games_folder = Pathname.new(location)
|
28
28
|
end
|
@@ -31,57 +31,61 @@ module Greenmonster
|
|
31
31
|
# Return the default games folder location
|
32
32
|
#
|
33
33
|
# Example:
|
34
|
-
#
|
35
|
-
#
|
36
|
-
#
|
34
|
+
# >> Greenmonster.set_games_folder("/Users/geoff/game_data")
|
35
|
+
# >> Greenmonster.games_folder
|
36
|
+
# => #<Pathname:/Users/geoff/game_data>
|
37
37
|
|
38
38
|
def self.games_folder
|
39
39
|
@@games_folder
|
40
40
|
end
|
41
|
-
|
42
|
-
def self.format_date_as_folder(date)
|
43
|
-
date.strftime("year_%Y/month_%m/day_%d")
|
44
|
-
end
|
45
|
-
|
41
|
+
|
46
42
|
##
|
47
43
|
# Walk the dates in a range of dates, and execute whatever
|
48
|
-
# methods on the date and argument set specified. Used when
|
44
|
+
# methods on the date and argument set specified. Used when
|
49
45
|
# processing games and players.
|
50
46
|
#
|
51
47
|
#
|
52
|
-
|
53
|
-
def self.traverse_dates(range
|
54
|
-
|
55
|
-
yield day,args
|
56
|
-
end
|
48
|
+
|
49
|
+
def self.traverse_dates(range, args)
|
50
|
+
range.each { |day| yield day,args }
|
57
51
|
end
|
58
|
-
|
52
|
+
|
59
53
|
##
|
60
54
|
# Walk the game folders in a range of dates, and execute whatever
|
61
|
-
# methods on the game folder and argument set specified. Used when
|
55
|
+
# methods on the game folder and argument set specified. Used when
|
62
56
|
# processing games and players.
|
63
57
|
#
|
64
58
|
#
|
65
|
-
|
66
|
-
def self.traverse_folders_for_date(
|
67
|
-
|
68
|
-
|
69
|
-
|
70
|
-
puts "No files for #{day.to_s}."
|
71
|
-
end
|
72
|
-
|
73
|
-
unless folders.nil?
|
74
|
-
folders.sort.each do |gdir|
|
75
|
-
if gdir[0,3] == 'gid' and gdir[-4,4] != "_bak"
|
76
|
-
yield gdir, args
|
77
|
-
end
|
78
|
-
end
|
79
|
-
end
|
59
|
+
|
60
|
+
def self.traverse_folders_for_date(date, args)
|
61
|
+
game_folders_for_date_and_sport_code(date, args[:sport_code]).each do |gdir|
|
62
|
+
yield gdir, args
|
63
|
+
end
|
80
64
|
end
|
65
|
+
|
66
|
+
|
67
|
+
private
|
68
|
+
|
69
|
+
def self.game_folders_for_date_and_sport_code(date, sport_code)
|
70
|
+
folders_for_date_and_sport_code(date, sport_code).select do |folder|
|
71
|
+
folder[0,3] == 'gid' && folder[-4,4] != '_bak'
|
72
|
+
end
|
73
|
+
end
|
74
|
+
|
75
|
+
def self.folders_for_date_and_sport_code(date, sport_code)
|
76
|
+
begin
|
77
|
+
Dir.entries(
|
78
|
+
Pathname.new(games_folder) + sport_code + format_date_as_folder(date)
|
79
|
+
).sort
|
80
|
+
rescue Errno::ENOENT
|
81
|
+
[]
|
82
|
+
end
|
83
|
+
end
|
84
|
+
|
85
|
+
def self.format_date_as_folder(date)
|
86
|
+
date.strftime("year_%Y/month_%m/day_%d")
|
87
|
+
end
|
88
|
+
|
81
89
|
end
|
82
90
|
|
83
|
-
require 'greenmonster/spider'
|
84
|
-
require 'greenmonster/model_extensions/player'
|
85
|
-
require 'greenmonster/model_extensions/mlb_game'
|
86
|
-
require 'greenmonster/model_extensions/mlb_probable_pitcher'
|
87
|
-
require 'greenmonster/parser'
|
91
|
+
require 'greenmonster/spider'
|
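The folder filter used by `game_folders_for_date_and_sport_code` keeps entries that start with `gid` and do not end in `_bak`. A stand-alone sketch of that rule follows, run against folder names like the ones under `spec/games/tst`. The helper name `game_folder?` is hypothetical, and `start_with?`/`end_with?` stand in for the slice comparisons shown in the diff.

```ruby
# Hypothetical stand-alone version of the filter: a game folder starts with
# "gid" and is not a "_bak" backup copy.
def game_folder?(name)
  name.start_with?('gid') && !name.end_with?('_bak')
end

# Entries as Dir.entries might return them for one day's folder.
entries = %w[
  .
  ..
  gid_2012_03_27_aaamlb_aabmlb_1
  gid_2012_03_27_aaamlb_aabmlb_1_bak
  not_2012_03_27_aaamlb_aabmlb_1
]

game_folders = entries.select { |name| game_folder?(name) }
p game_folders
```

Only the plain `gid_` folder should survive the filter; the backup copy and the `not_` folder are the cases the new spec fixtures appear to exercise.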
data/lib/greenmonster/spider.rb
CHANGED
@@ -1,107 +1,43 @@
|
|
1
1
|
# The Gameday XML Spider utility
|
2
|
-
|
2
|
+
class Greenmonster::Spider
|
3
3
|
include HTTParty
|
4
|
-
|
4
|
+
|
5
5
|
##
|
6
|
-
# Pull Gameday XML files for a given game, specified by the game
|
6
|
+
# Pull Gameday XML files for a given game, specified by the game
|
7
7
|
# ID. If date and sport code are not specified as options, these
|
8
8
|
# values are guessed from the game ID string using the home team's
|
9
9
|
# sport code and the date from the scheduled date values in the game
|
10
10
|
# ID.
|
11
11
|
#
|
12
12
|
# Example:
|
13
|
-
#
|
14
|
-
|
15
|
-
def self.pull_game(game_id,args = {})
|
16
|
-
args = {
|
17
|
-
:date => args[:date] || Date.new(game_id[4,4].to_i, game_id[9,2].to_i, game_id[12,2].to_i),
|
18
|
-
:sport_code => args[:sport_code] || game_id[25,3],
|
19
|
-
:print_games => true,
|
20
|
-
:games_folder => Greenmonster.games_folder
|
21
|
-
}.merge(args)
|
22
|
-
raise "Games folder location required." if args[:games_folder].nil?
|
23
|
-
|
24
|
-
args[:games_folder] = Pathname.new(args[:games_folder])
|
25
|
-
|
26
|
-
puts game_id if args[:print_games]
|
27
|
-
|
28
|
-
paths = {
|
29
|
-
:localGameFolder => args[:games_folder] + args[:sport_code] + format_date_as_folder(args[:date]) + game_id,
|
30
|
-
:mlbGameFolder => "#{gameday_league_and_date_url(args)}/#{game_id}/"
|
31
|
-
}
|
32
|
-
|
33
|
-
FileUtils.mkdir_p paths[:localGameFolder] + 'inning'
|
34
|
-
|
35
|
-
begin
|
36
|
-
# Always copy linescore first. If we can't get this
|
37
|
-
# data, all other game data is useless.
|
38
|
-
copy_gameday_xml('linescore.xml',paths)
|
39
|
-
|
40
|
-
if args[:date].year > 2007
|
41
|
-
copy_gameday_xml('inning_all.xml',paths)
|
42
|
-
copy_gameday_xml('inning_hit.xml',paths)
|
43
|
-
else
|
44
|
-
# Iterate through the inning files, but skip inning
|
45
|
-
# files numbered 0 (some bad spring training data).
|
46
|
-
# Necessary for games prior to 2008 because there is
|
47
|
-
# no inning_all.xml file in older games.
|
48
|
-
(Nokogiri::XML(self.get("#{paths[:mlbGameFolder]}/inning/").body).search('a')).each do |ic|
|
49
|
-
copy_gameday_xml(ic.attribute('href').value,paths) if ic.attribute('href').value[-3,3] == "xml" unless ic.attribute('href').value[-6,6] == "_0.xml" or ic.attribute('href').value.include?('Score')
|
50
|
-
end
|
51
|
-
end
|
13
|
+
# >> Gameday::Spider.pull_game('',{:games_folder => })
|
52
14
|
|
53
|
-
|
54
|
-
|
55
|
-
|
56
|
-
|
57
|
-
|
58
|
-
rescue StandardError => bang
|
59
|
-
puts "Unable to download some data for #{game_id}"
|
15
|
+
def pull_game(game_id, date)
|
16
|
+
make_folders_for_game(game_id, date)
|
17
|
+
|
18
|
+
%w(boxscore.xml game_events.xml inning_all.xml linescore.xml players.xml).each do |file_name|
|
19
|
+
copy_gameday_xml(game_id, date, file_name)
|
60
20
|
end
|
61
|
-
|
62
|
-
return game_id
|
63
21
|
end
|
64
|
-
|
22
|
+
|
65
23
|
##
|
66
|
-
# Pull Gameday XML files for a given date. Default options for
|
67
|
-
# the spider are to pull games with sport_code of 'mlb' (games
|
68
|
-
# played by MLB games rather than MiLB teams or foreign teams)
|
69
|
-
# and to pull games on the current date.
|
24
|
+
# Pull Gameday XML files for a given date. Default options for
|
25
|
+
# the spider are to pull games with sport_code of 'mlb' (games
|
26
|
+
# played by MLB teams rather than MiLB teams or foreign teams)
|
27
|
+
# and to pull games on the current date.
|
70
28
|
#
|
71
29
|
# Example:
|
72
|
-
#
|
73
|
-
#
|
30
|
+
# # Pull games from July 4, 2011
|
31
|
+
# >> Gameday::Spider.pull_day({:date => Date.new(2011,7,1), :games_folder => '/Users/geoff/games'})
|
74
32
|
#
|
75
33
|
# Arguments:
|
76
|
-
#
|
34
|
+
# args: (Hash)
|
77
35
|
#
|
78
|
-
|
79
|
-
def self.pull_day(args = {})
|
80
|
-
args = {
|
81
|
-
:date => Date.today,
|
82
|
-
:sport_code => 'mlb',
|
83
|
-
}.merge(args)
|
84
|
-
|
85
|
-
# If we want all sport codes, set up the array.
|
86
|
-
if args[:all_sport_codes]
|
87
|
-
args[:sport_codes] = %w(aaa aax afa afx asx bbc fps hsb ind int jml nae naf nas nat naw oly rok win)
|
88
|
-
else
|
89
|
-
args[:sport_codes] = [args[:sport_code] || 'mlb'].flatten
|
90
|
-
end
|
91
|
-
|
92
|
-
# Iterate through every hyperlink on the page.
|
93
|
-
# These links represent the individual game folders
|
94
|
-
# for each date. Reject any links that aren't to game
|
95
|
-
# folders or that are to what look like backup game
|
96
|
-
# folders.
|
97
|
-
args[:sport_codes].each do |sport_code|
|
98
|
-
args[:sport_code] = sport_code
|
99
|
-
(Nokogiri::XML(self.get(gameday_league_and_date_url(args)))/"a").reject{|l| l.attribute('href').value[0,4] != "gid_" or l.attribute('href').value[-5,4] == "_bak"}.each do |e|
|
100
|
-
self.pull_game(e.attribute('href').value.gsub('/',''),args)
|
101
|
-
end
|
102
|
-
end
|
103
36
|
|
104
|
-
|
37
|
+
def pull_day(date, sport_code)
|
38
|
+
game_links_on_gameday_date_page(date, sport_code).each do |game_id|
|
39
|
+
pull_game(game_id, date)
|
40
|
+
end
|
105
41
|
end
|
106
42
|
|
107
43
|
##
|
@@ -109,66 +45,95 @@ module Greenmonster::Spider
|
|
109
45
|
# passes arguments like games_folder location on to Spider.pull.
|
110
46
|
#
|
111
47
|
# Example:
|
112
|
-
#
|
113
|
-
#
|
48
|
+
# # Pull all games in MLB in July 2011
|
49
|
+
# >> Gameday::Spider.pull_days(Date.new(2011,7,1)..Date.new(2011,7,31), {:games_folder => '/Users/geoff/games'})
|
114
50
|
#
|
115
51
|
# Arguments:
|
116
|
-
#
|
117
|
-
#
|
118
|
-
|
119
|
-
def
|
120
|
-
range.each {|
|
121
|
-
end
|
122
|
-
|
123
|
-
|
124
|
-
|
125
|
-
|
126
|
-
|
127
|
-
|
128
|
-
|
129
|
-
|
130
|
-
|
131
|
-
|
132
|
-
#
|
133
|
-
def self.gameday_league_and_date_url(args = {})
|
134
|
-
raise "Date required." unless args[:date]
|
135
|
-
args[:sport_code] ||= 'mlb'
|
136
|
-
|
137
|
-
"http://gd2.mlb.com/components/game/#{args[:sport_code]}/#{format_date_as_folder(args[:date])}"
|
52
|
+
# range: (Range)
|
53
|
+
# args: (Hash)
|
54
|
+
|
55
|
+
def pull_days(range, sport_code)
|
56
|
+
range.each { |date| self.pull_day(date, sport_code) }
|
57
|
+
end
|
58
|
+
|
59
|
+
private
|
60
|
+
|
61
|
+
def get_gameday_date_page(date, sport_code)
|
62
|
+
self.class.get(gameday_date_and_sport_code_url(date, sport_code))
|
63
|
+
end
|
64
|
+
|
65
|
+
def links_on_gameday_date_page(date, sport_code)
|
66
|
+
Nokogiri::XML(get_gameday_date_page(date, sport_code)).search('a').map do |a|
|
67
|
+
a.attribute('href').value
|
138
68
|
end
|
139
|
-
|
140
|
-
|
141
|
-
|
142
|
-
|
143
|
-
|
144
|
-
|
145
|
-
|
146
|
-
|
147
|
-
|
148
|
-
|
149
|
-
|
150
|
-
|
151
|
-
|
152
|
-
|
153
|
-
|
154
|
-
|
155
|
-
|
156
|
-
|
157
|
-
|
158
|
-
|
69
|
+
end
|
70
|
+
|
71
|
+
def game_links_on_gameday_date_page(date, sport_code)
|
72
|
+
links_on_gameday_date_page(date, sport_code).select do |link|
|
73
|
+
link[0,4] == "gid_" && link[-5,4] != "_bak"
|
74
|
+
end
|
75
|
+
end
|
76
|
+
|
77
|
+
def gameday_url_root
|
78
|
+
"http://gd2.mlb.com/components/game/"
|
79
|
+
end
|
80
|
+
|
81
|
+
def gameday_date_and_sport_code_url(date, sport_code)
|
82
|
+
"#{gameday_url_root}#{sport_code}/#{format_date_as_folder(date)}"
|
83
|
+
end
|
84
|
+
|
85
|
+
def gameday_game_url(game_id, date)
|
86
|
+
gameday_url_root + remote_game_path(game_id, date)
|
87
|
+
end
|
88
|
+
|
89
|
+
def remote_game_path(game_id, date)
|
90
|
+
"#{home_sport_code_from_game_id(game_id)}/#{format_date_as_folder(date)}/#{game_id}"
|
91
|
+
end
|
92
|
+
|
93
|
+
def home_sport_code_from_game_id(game_id)
|
94
|
+
game_id[-5,3]
|
95
|
+
end
|
96
|
+
|
97
|
+
def inning_prefix(file_name)
|
98
|
+
if file_name =~ /inning/
|
99
|
+
'inning/'
|
100
|
+
else
|
101
|
+
''
|
159
102
|
end
|
160
|
-
|
161
|
-
|
162
|
-
|
163
|
-
|
164
|
-
|
165
|
-
|
166
|
-
|
167
|
-
|
168
|
-
|
169
|
-
|
170
|
-
|
171
|
-
|
172
|
-
Greenmonster.
|
103
|
+
end
|
104
|
+
|
105
|
+
def remote_file_url(game_id, date, file_name)
|
106
|
+
gameday_game_url(game_id, date) + '/' + inning_prefix(file_name) + '/' + file_name
|
107
|
+
end
|
108
|
+
|
109
|
+
def download_gameday_xml(game_id, date, file_name)
|
110
|
+
self.class.get(remote_file_url(game_id, date, file_name)).body.force_encoding("ISO-8859-1").encode("UTF-8")
|
111
|
+
end
|
112
|
+
|
113
|
+
def local_game_path(game_id, date)
|
114
|
+
Pathname.new(
|
115
|
+
Greenmonster.games_folder +
|
116
|
+
home_sport_code_from_game_id(game_id) +
|
117
|
+
format_date_as_folder(date) +
|
118
|
+
game_id
|
119
|
+
)
|
120
|
+
end
|
121
|
+
|
122
|
+
def copy_gameday_xml(game_id, date, file_name)
|
123
|
+
download = download_gameday_xml(game_id, date, file_name)
|
124
|
+
|
125
|
+
unless download.include?('404 Not Found')
|
126
|
+
open(local_game_path(game_id, date) + inning_prefix(file_name) + file_name, 'w') do |file|
|
127
|
+
file.write(download)
|
128
|
+
end
|
173
129
|
end
|
130
|
+
end
|
131
|
+
|
132
|
+
def format_date_as_folder(date)
|
133
|
+
Greenmonster.format_date_as_folder(date)
|
134
|
+
end
|
135
|
+
|
136
|
+
def make_folders_for_game(game_id, date)
|
137
|
+
FileUtils.mkdir_p(local_game_path(game_id, date) + 'inning')
|
138
|
+
end
|
174
139
|
end
|
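How the spider turns a game ID, a date, and a file name into a remote URL can be sketched as below. This is an illustration under stated assumptions, not the gem's code: the constant `GAMEDAY_URL_ROOT` and the helper `home_sport_code` are hypothetical names, the `game_id[-5, 3]` slice assumes a single-digit game number at the end of the ID, and the sketch appends the file name directly to the prefix because `'inning/'` already ends in a slash.

```ruby
require 'date'

# Root URL as it appears in the diff; the constant name is an assumption.
GAMEDAY_URL_ROOT = 'http://gd2.mlb.com/components/game/'

# The home team's sport code is the three characters before the trailing
# "_<game number>" in the game ID (e.g. "mlb" in "..._bosmlb_1").
def home_sport_code(game_id)
  game_id[-5, 3]
end

# Inning files live in an "inning/" subfolder; other files sit at the top
# level of the game folder.
def inning_prefix(file_name)
  file_name.include?('inning') ? 'inning/' : ''
end

# Build the full remote URL for one file of one game.
def remote_file_url(game_id, date, file_name)
  "#{GAMEDAY_URL_ROOT}#{home_sport_code(game_id)}/" \
    "#{date.strftime('year_%Y/month_%m/day_%d')}/#{game_id}/" \
    "#{inning_prefix(file_name)}#{file_name}"
end

game_id = 'gid_2011_07_04_tormlb_bosmlb_1'
date = Date.new(2011, 7, 4)
puts remote_file_url(game_id, date, 'linescore.xml')
puts remote_file_url(game_id, date, 'inning_all.xml')
```

Passing the date separately from the game ID matters for the reason the README gives: a postponed game keeps its original ID but is filed under the date it was actually played.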