greenmonster 0.4.0.dev → 0.5.0

Files changed (39)
  1. data/.rspec +2 -0
  2. data/README.markdown +19 -52
  3. data/Rakefile +3 -0
  4. data/greenmonster.gemspec +12 -13
  5. data/lib/greenmonster.rb +47 -43
  6. data/lib/greenmonster/spider.rb +107 -142
  7. data/lib/greenmonster/version.rb +2 -2
  8. data/{test/games/mlb → spec/games/tst}/year_2012/month_03/day_27/gid_2012_03_27_aaamlb_aabmlb_1/blank.txt +0 -0
  9. data/{test/games/tst/year_2012/month_03/day_27/gid_2012_03_27_aaamlb_aabmlb_1 → spec/games/tst/year_2012/month_03/day_27/gid_2012_03_27_aaamlb_aabmlb_1_bak}/blank.txt +0 -0
  10. data/spec/games/tst/year_2012/month_03/day_27/not_2012_03_27_aaamlb_aabmlb_1/blank.txt +0 -0
  11. data/spec/greenmonster/spider_spec.rb +85 -0
  12. data/spec/greenmonster_spec.rb +45 -0
  13. data/spec/spec_helper.rb +25 -0
  14. metadata +52 -71
  15. data/lib/generators/greenmonster/install_mlb_games_generator.rb +0 -20
  16. data/lib/generators/greenmonster/install_players_generator.rb +0 -20
  17. data/lib/generators/templates/install_greenmonster_mlb_games_migration.rb +0 -54
  18. data/lib/generators/templates/install_greenmonster_players_migration.rb +0 -37
  19. data/lib/greenmonster/model_extensions/mlb_game.rb +0 -106
  20. data/lib/greenmonster/model_extensions/mlb_probable_pitcher.rb +0 -11
  21. data/lib/greenmonster/model_extensions/player.rb +0 -16
  22. data/lib/greenmonster/parser.rb +0 -27
  23. data/test/games/mlb/year_2011/month_07/day_04/gid_2011_07_04_tormlb_bosmlb_1/boxscore.xml +0 -5
  24. data/test/games/mlb/year_2011/month_07/day_04/gid_2011_07_04_tormlb_bosmlb_1/eventLog.xml +0 -1
  25. data/test/games/mlb/year_2011/month_07/day_04/gid_2011_07_04_tormlb_bosmlb_1/inning/inning_all.xml +0 -1
  26. data/test/games/mlb/year_2011/month_07/day_04/gid_2011_07_04_tormlb_bosmlb_1/inning/inning_hit.xml +0 -1
  27. data/test/games/mlb/year_2011/month_07/day_04/gid_2011_07_04_tormlb_bosmlb_1/linescore.xml +0 -117
  28. data/test/games/mlb/year_2011/month_07/day_04/gid_2011_07_04_tormlb_bosmlb_1/players.xml +0 -56
  29. data/test/test_create_mlb_game_from_gameday_xml_game.rb +0 -12
  30. data/test/test_create_players_from_gameday_xml_game.rb +0 -8
  31. data/test/test_greenmonster.rb +0 -12
  32. data/test/test_greenmonster_player.rb +0 -7
  33. data/test/test_greenmonster_spider.rb +0 -107
  34. data/test/test_greenmonster_traversal.rb +0 -31
  35. data/test/test_helper.rb +0 -78
  36. data/test/test_line_score.rb +0 -12
  37. data/test/test_parse_mlb_probable_pitchers_from_linescore_data.rb +0 -14
  38. data/test/test_parse_players_from_gameday_xml_files.rb +0 -21
  39. data/test/test_update_mlb_game_with_linescore_data.rb +0 -23
data/.rspec ADDED
@@ -0,0 +1,2 @@
+ --color
+ --format progress
data/README.markdown CHANGED
@@ -1,14 +1,9 @@
  Greenmonster
  ============
 
- Greenmonster is a toolkit for baseball stat enthusiasts or sabermetricians to build a database of play-by-play stats from MLB's [Gameday XML data](http://gd.mlb.com/components/game/).
+ Greenmonster is a toolkit for baseball stat enthusiasts or sabermetricians to build a database of play-by-play stats from MLB's [Gameday XML data](http://gd.mlb.com/components/game/). The current tool provides the ability to spider Gameday XML data from MLB's servers for personal research. Future iterations of the tool will provide the ability to parse the data and store it in a SQL database.
 
- The provides three tools:
- * Spidering of MLB/MiLB games
- * Parsing of Gameday XML
- * Mixin methods you can use to extend your own classes
-
- Usage
+ Usage
  =====
 
  Spider
@@ -21,66 +16,39 @@ If you don't want to specify a download location every time you run the spider,
  Greenmonster.set_games_folder('/Users/geoff/games/')
  ```
 
- The spider utility has three public class methods: Spider.pull_game, Spider.pull_day, and Spider.pull_days.
+ The spider utility has three public class methods: Spider.pull_game, Spider.pull_day, and Spider.pull_days.
 
- Spider.pull_game takes a game_id (the folder name of the game on the Gameday server) and a hash of options as arguments. If for some reason the game does not fall in the expected folder for the game's date or sport code, you can add those options to the arguments hash. Other options include :games_folder and :print_games (if false, game IDs are not printed to screen).
+ Spider.pull_game takes a game_id (the folder name of the game on the Gameday server) and the date. The date is necessary because if a game is postponed or (yes, it's happened this decade) preponed, the game ID might have a date different than the actual date on which the game was played.
 
  ```ruby
- # Pulls MLB's 7/4/2011 Toronto @ Boston game
- Greenmonster::Spider.pull_game('gid_2011_07_04_tormlb_bosmlb_1', {:print_games => false})
+ # Pull MLB's 7/4/2011 Toronto @ Boston game
+ Greenmonster::Spider.pull_game('gid_2011_07_04_tormlb_bosmlb_1', Date.new(2011,7,4))
  ```
 
  Spider.pull_day takes an hash of options as an argument. Greenmonster will create subfolders by MLB "sport_code" (MLB games fall under 'mlb', various minor league games and non-MLB/MiLB games fall under other sport code designations), and then children folders for years, months, days, and specific games. Sport code can be a string or an array of sport code strings.
 
  ```ruby
- # Pulls all MLB games for today
- Greenmonster::Spider.pull_day({:date => Date.today, :games_folder => './home/geoff/games'})
-
- # Pulls all rookie league games for today
- Greenmonster::Spider.pull_day({:sport_code => 'rok', :date => Date.today, :games_folder => './home/geoff/games'})
+ # Pull all MLB games for today
+ Greenmonster::Spider.pull_day(Date.today, 'mlb')
 
- # Pulls all games in all sport codes for today
- Greenmonster::Spider.pull_day({:all_sport_codes => true, :date => Date.today, :games_folder => './home/geoff/games'})
+ # Pull all rookie league games for today
+ Greenmonster::Spider.pull_day(Date.today, 'rok')
 
- # Pulls all games in rookie and winter league games for January 2nd, 2010
- Greenmonster::Spider.pull_day({:sport_code => ['rok','win'], :date => Date.new(2012,1,2), :games_folder => './home/geoff/games'})
+ # Pull all games in all sport codes for today
+ Greenmonster::SPORT_CODES.each do |sport_code|
+   Greenmonster::Spider.pull_day(Date.today, sport_code)
+ end
  ```
 
 
 
- Spider.pull_days takes a range of dates to process as an argument, plus a hash of arguments to pass to Spider.pull.
+ Spider.pull_days takes a range of dates to process as an argument, plus the sport code for the games (MLB.
 
  ```ruby
- # Pulls all MLB games for in April, 2012
- Greenmonster::Spider.pull_days((Date.new(2012,4,1)..Date.new(2012,4,30)), {:games_folder => './home/geoff/games'})
+ # Pull all MLB games for in April, 2012
+ Greenmonster::Spider.pull_days((Date.new(2012,4,1)..Date.new(2012,4,30)), 'mlb')
  ```
 
- Mixins (ALPHA)
- --------------
- (Under development.)
-
- As of version 0.4.0, Greenmonster provides the Greenmonster::Player module which can be used to extend any Ruby class you use that represents players. Include the module in your class to get Greenmonster-specific functionality like parsing players out of games.
-
- ```ruby
- class MlbPlayer < ActiveRecord::Base
- include Greenmonster::Player
- end
-
- >> MlbPlayer.create_from_gameday_xml_game('gid_2011_07_04_tormlb_bosmlb_1')
- ```
-
- Migrations (ALPHA)
- ------------------
-
- WARNING: THIS FEATURE IS UNDER DEVELOPMENT. USE AT YOUR OWN RISK.
- ANOTHER WARNING: Using migrations on alpha/beta/pre versions of Greenmonster may not be compatible with formal releases that come later.
-
- If you use ActiveRecord, Greenmonster provides a generator that can generate tables for Greenmonster data. Add Greenmonster to your Gemfile:
- ```ruby
- gem 'greenmonster', '~> 0.4.0'
- ```
-
- After you pull the gem in with Bundler, you will have access to Greenmonster generators. The Install generator attempts to install a set of standard name tables that correspond to Greenmonster data.
 
 
  Requirements
@@ -89,19 +57,18 @@ Requirements
  - Bundler
  - Nokogiri
  - HTTParty
- - ActiveRecord (if you want to use migration generators or any mixins that involve AR saves)
 
  Testing
  -------
 
- The test suite downloads a few days of data, so it is not fast to execute.
+ The test suite is being migrated to RSpec, and uses bourne.
 
 
  License
  -------
  (The MIT License)
 
- Copyright &copy; [Geoff Harcourt](http://github.com/geoffharcourt) 2012
+ Copyright &copy; [Geoff Harcourt](http://github.com/geoffharcourt) 2012-2013
 
  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the ‘Software’), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
 
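The README text above describes `pull_days` as walking a range of dates and pulling each day for a single sport code. A minimal sketch of that delegation follows; `RecordingSpider` is a hypothetical stand-in (not part of the gem) that records calls instead of contacting MLB's servers, so it only illustrates the control flow.

```ruby
require 'date'

# Hypothetical stand-in for the spider: records each requested day
# instead of downloading anything.
class RecordingSpider
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Same (date, sport_code) argument order as the 0.5.0 README examples.
  def pull_day(date, sport_code)
    @calls << [date, sport_code]
  end

  # Walk every date in the range and delegate to pull_day.
  def pull_days(range, sport_code)
    range.each { |date| pull_day(date, sport_code) }
  end
end

spider = RecordingSpider.new
spider.pull_days(Date.new(2012, 4, 1)..Date.new(2012, 4, 30), 'mlb')
puts spider.calls.size
```

One observation, inferred from the spider.rb diff rather than stated by the README: 0.5.0 defines `pull_game`, `pull_day`, and `pull_days` with plain `def` inside a class, so callers may need an instance (for example `Greenmonster::Spider.new.pull_days(...)`) rather than the class-level calls the README shows.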
data/Rakefile CHANGED
@@ -1,9 +1,12 @@
  require "bundler/gem_tasks"
  require 'rake/testtask'
+ require 'rspec/core/rake_task'
 
  Rake::TestTask.new do |t|
    t.libs << 'test'
  end
 
+ RSpec::Core::RakeTask.new('spec')
+
  desc "Run Tests"
  task :default => :test
data/greenmonster.gemspec CHANGED
@@ -1,13 +1,13 @@
  # -*- encoding: utf-8 -*-
- $:.push File.expand_path("../lib", __FILE__)
- require "greenmonster/version"
+ $:.push File.expand_path('../lib', __FILE__)
+ require 'greenmonster/version'
 
  Gem::Specification.new do |s|
-   s.name = "greenmonster"
+   s.name = 'greenmonster'
    s.version = Greenmonster::VERSION
-   s.authors = ["Geoff Harcourt"]
-   s.email = ["geoff.harcourt@gmail.com"]
-   s.homepage = "http://github.com/geoffharcourt/greenmonster"
+   s.authors = ['Geoff Harcourt']
+   s.email = ['geoff.harcourt@gmail.com']
+   s.homepage = 'http://github.com/geoffharcourt/greenmonster'
    s.summary = %q{A utility for working with MLB Gameday XML data.}
    s.description = %q{A utility for working with MLB Gameday XML data.}
 
@@ -16,10 +16,9 @@ Gem::Specification.new do |s|
    s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
    s.require_paths = ["lib"]
 
-   s.add_dependency "nokogiri"
-   s.add_dependency "httparty"
-   s.add_dependency "activerecord"
-   s.add_development_dependency "minitest"
-   s.add_development_dependency "supermodel"
-   s.add_development_dependency "sqlite3-ruby"
- end
+   s.add_dependency 'nokogiri'
+   s.add_dependency 'httparty'
+   s.add_development_dependency 'rspec'
+   s.add_development_dependency 'bourne'
+   s.add_development_dependency 'rake'
+ end
data/lib/greenmonster.rb CHANGED
@@ -3,26 +3,26 @@ require "bundler/setup"
 
  require 'httparty'
  require 'nokogiri'
- require 'pathname'
  require 'fileutils'
- require 'active_record'
  require 'date'
 
  module Greenmonster
+   SPORT_CODES = %w(aaa aax afa afx asx bbc fps hsb ind int jml nae naf nas nat naw oly rok win)
+
    @@games_folder = nil
-
+
    ##
    # Set the default folder to which games
    # are saved after being downloaded from
    # the server.
    #
-   # Example:
-   # => Greenmonster.set_games_folder("/Users/geoff/game_data")
+   # Example:
+   # => Greenmonster.set_games_folder("/Users/geoff/game_data")
    #
    # Arguments:
-   # location: (String)
+   # location: (String)
    #
-
+
    def self.set_games_folder(location)
      @@games_folder = Pathname.new(location)
    end
@@ -31,57 +31,61 @@ module Greenmonster
    # Return the default games folder location
    #
    # Example:
-   # >> Greenmonster.set_games_folder("/Users/geoff/game_data")
-   # >> Greenmonster.games_folder
-   # => #<Pathname:/Users/geoff/game_data>
+   # >> Greenmonster.set_games_folder("/Users/geoff/game_data")
+   # >> Greenmonster.games_folder
+   # => #<Pathname:/Users/geoff/game_data>
 
    def self.games_folder
      @@games_folder
    end
-
-   def self.format_date_as_folder(date)
-     date.strftime("year_%Y/month_%m/day_%d")
-   end
-
+
    ##
    # Walk the dates in a range of dates, and execute whatever
-   # methods on the date and argument set specified. Used when
+   # methods on the date and argument set specified. Used when
    # processing games and players.
    #
    #
-
-   def self.traverse_dates(range = (Date.today..Date.today), args = {})
-     range.each do |day|
-       yield day,args
-     end
+
+   def self.traverse_dates(range, args)
+     range.each { |day| yield day,args }
    end
-
+
    ##
    # Walk the game folders in a range of dates, and execute whatever
-   # methods on the game folder and argument set specified. Used when
+   # methods on the game folder and argument set specified. Used when
    # processing games and players.
    #
    #
-
-   def self.traverse_folders_for_date(day,args = {})
-     begin
-       folders = Dir.entries(Pathname.new(args[:games_folder] || @@games_folder) + (args[:sport_code] || 'mlb') + self.format_date_as_folder(day))
-     rescue StandardError => boom
-       puts "No files for #{day.to_s}."
-     end
-
-     unless folders.nil?
-       folders.sort.each do |gdir|
-         if gdir[0,3] == 'gid' and gdir[-4,4] != "_bak"
-           yield gdir, args
-         end
-       end
-     end
+
+   def self.traverse_folders_for_date(date, args)
+     game_folders_for_date_and_sport_code(date, args[:sport_code]).each do |gdir|
+       yield gdir, args
+     end
    end
+
+
+   private
+
+   def self.game_folders_for_date_and_sport_code(date, sport_code)
+     folders_for_date_and_sport_code(date, sport_code).select do |folder|
+       folder[0,3] == 'gid' && folder[-4,4] != '_bak'
+     end
+   end
+
+   def self.folders_for_date_and_sport_code(date, sport_code)
+     begin
+       Dir.entries(
+         Pathname.new(games_folder) + sport_code + format_date_as_folder(date)
+       ).sort
+     rescue Errno::ENOENT
+       []
+     end
+   end
+
+   def self.format_date_as_folder(date)
+     date.strftime("year_%Y/month_%m/day_%d")
+   end
+
  end
 
- require 'greenmonster/spider'
- require 'greenmonster/model_extensions/player'
- require 'greenmonster/model_extensions/mlb_game'
- require 'greenmonster/model_extensions/mlb_probable_pitcher'
- require 'greenmonster/parser'
+ require 'greenmonster/spider'
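The new folder-traversal helpers in this file list a day's directory, keep only `gid`-prefixed entries that do not end in `_bak`, and fall back to an empty list when the directory is missing. A self-contained sketch of that behaviour is below. The `game_folders` helper and the temporary directory layout are assumptions made for illustration (the layout mirrors the `spec/games/tst/...` fixtures in the file list); the gem's own methods read the root from `Greenmonster.games_folder` instead.

```ruby
require 'date'
require 'fileutils'
require 'pathname'
require 'tmpdir'

# Same strftime pattern as the diff's format_date_as_folder.
def format_date_as_folder(date)
  date.strftime("year_%Y/month_%m/day_%d")
end

# Hypothetical helper combining the diff's two private methods:
# list the day's folder, then keep real game folders only.
def game_folders(root, sport_code, date)
  day_path = Pathname.new(root) + sport_code + format_date_as_folder(date)
  entries = begin
    Dir.entries(day_path.to_s).sort
  rescue Errno::ENOENT
    [] # no data downloaded for that day
  end
  entries.select { |f| f[0, 3] == 'gid' && f[-4, 4] != '_bak' }
end

found, missing = Dir.mktmpdir do |root|
  day = Pathname.new(root) + 'tst' + 'year_2012/month_03/day_27'
  %w(gid_2012_03_27_aaamlb_aabmlb_1
     gid_2012_03_27_aaamlb_aabmlb_1_bak
     not_2012_03_27_aaamlb_aabmlb_1).each { |name| FileUtils.mkdir_p(day + name) }

  [game_folders(root, 'tst', Date.new(2012, 3, 27)),
   game_folders(root, 'tst', Date.new(2012, 3, 28))]
end

p found
p missing
```

Rescuing only `Errno::ENOENT`, as the 0.5.0 code does, lets other failures (permissions, a `nil` games folder) surface instead of being reported as "no files".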
data/lib/greenmonster/spider.rb CHANGED
@@ -1,107 +1,43 @@
  # The Gameday XML Spider utility
- module Greenmonster::Spider
+ class Greenmonster::Spider
    include HTTParty
-
+
    ##
-   # Pull Gameday XML files for a given game, specified by the game
+   # Pull Gameday XML files for a given game, specified by the game
    # ID. If date and sport code are not specified as options, these
    # values are guessed from the game ID string using the home team's
    # sport code and the date from the scheduled date values in the game
    # ID.
    #
    # Example:
-   # >> Gameday::Spider.pull_game('',{:games_folder => })
-
-   def self.pull_game(game_id,args = {})
-     args = {
-       :date => args[:date] || Date.new(game_id[4,4].to_i, game_id[9,2].to_i, game_id[12,2].to_i),
-       :sport_code => args[:sport_code] || game_id[25,3],
-       :print_games => true,
-       :games_folder => Greenmonster.games_folder
-     }.merge(args)
-     raise "Games folder location required." if args[:games_folder].nil?
-
-     args[:games_folder] = Pathname.new(args[:games_folder])
-
-     puts game_id if args[:print_games]
-
-     paths = {
-       :localGameFolder => args[:games_folder] + args[:sport_code] + format_date_as_folder(args[:date]) + game_id,
-       :mlbGameFolder => "#{gameday_league_and_date_url(args)}/#{game_id}/"
-     }
-
-     FileUtils.mkdir_p paths[:localGameFolder] + 'inning'
-
-     begin
-       # Always copy linescore first. If we can't get this
-       # data, all other game data is useless.
-       copy_gameday_xml('linescore.xml',paths)
-
-       if args[:date].year > 2007
-         copy_gameday_xml('inning_all.xml',paths)
-         copy_gameday_xml('inning_hit.xml',paths)
-       else
-         # Iterate through the inning files, but skip inning
-         # files numbered 0 (some bad spring training data).
-         # Necessary for games prior to 2008 because there is
-         # no inning_all.xml file in older games.
-         (Nokogiri::XML(self.get("#{paths[:mlbGameFolder]}/inning/").body).search('a')).each do |ic|
-           copy_gameday_xml(ic.attribute('href').value,paths) if ic.attribute('href').value[-3,3] == "xml" unless ic.attribute('href').value[-6,6] == "_0.xml" or ic.attribute('href').value.include?('Score')
-         end
-       end
+   # >> Gameday::Spider.pull_game('',{:games_folder => })
 
-       # Copy base data files
-       # (if inning data wasn't there, this gets skipped)
-       ['boxscore.xml','eventLog.xml','players.xml'].each do |file|
-         copy_gameday_xml(file,paths)
-       end
-     rescue StandardError => bang
-       puts "Unable to download some data for #{game_id}"
+   def pull_game(game_id, date)
+     make_folders_for_game(game_id, date)
+
+     %w(boxscore.xml game_events.xml inning_all.xml linescore.xml players.xml).each do |file_name|
+       copy_gameday_xml(game_id, date, file_name)
      end
-
-     return game_id
    end
-
+
    ##
-   # Pull Gameday XML files for a given date. Default options for
-   # the spider are to pull games with sport_code of 'mlb' (games
-   # played by MLB games rather than MiLB teams or foreign teams)
-   # and to pull games on the current date.
+   # Pull Gameday XML files for a given date. Default options for
+   # the spider are to pull games with sport_code of 'mlb' (games
+   # played by MLB games rather than MiLB teams or foreign teams)
+   # and to pull games on the current date.
    #
    # Example:
-   # # Pull games from July 4, 2011
-   # >> Gameday::Spider.pull_day({:date => Date.new(2011,7,1), :games_folder => '/Users/geoff/games'})
+   # # Pull games from July 4, 2011
+   # >> Gameday::Spider.pull_day({:date => Date.new(2011,7,1), :games_folder => '/Users/geoff/games'})
    #
    # Arguments:
-   # args: (Hash)
+   # args: (Hash)
    #
-
-   def self.pull_day(args = {})
-     args = {
-       :date => Date.today,
-       :sport_code => 'mlb',
-     }.merge(args)
-
-     # If we want all sport codes, set up the array.
-     if args[:all_sport_codes]
-       args[:sport_codes] = %w(aaa aax afa afx asx bbc fps hsb ind int jml nae naf nas nat naw oly rok win)
-     else
-       args[:sport_codes] = [args[:sport_code] || 'mlb'].flatten
-     end
-
-     # Iterate through every hyperlink on the page.
-     # These links represent the individual game folders
-     # for each date. Reject any links that aren't to game
-     # folders or that are to what look like backup game
-     # folders.
-     args[:sport_codes].each do |sport_code|
-       args[:sport_code] = sport_code
-       (Nokogiri::XML(self.get(gameday_league_and_date_url(args)))/"a").reject{|l| l.attribute('href').value[0,4] != "gid_" or l.attribute('href').value[-5,4] == "_bak"}.each do |e|
-         self.pull_game(e.attribute('href').value.gsub('/',''),args)
-       end
-     end
 
-     return args[:sport_code]
+   def pull_day(date, sport_code)
+     game_links_on_gameday_date_page(date, sport_code).each do |game_id|
+       pull_game(game_id, date)
+     end
    end
 
    ##
@@ -109,66 +45,95 @@ module Greenmonster::Spider
    # passes arguments like games_folder location on to Spider.pull.
    #
    # Example:
-   # # Pull all games in MLB in July 2011
-   # >> Gameday::Spider.pull_days(Date.new(2011,7,1)..Date.new(2011,7,31), {:games_folder => '/Users/geoff/games'})
+   # # Pull all games in MLB in July 2011
+   # >> Gameday::Spider.pull_days(Date.new(2011,7,1)..Date.new(2011,7,31), {:games_folder => '/Users/geoff/games'})
    #
    # Arguments:
-   # range: (Range)
-   # args: (Hash)
-
-   def self.pull_days(range,args = {})
-     range.each {|day| self.pull_day(args.merge({:date => day}))}
-   end
-
-   private
-     ##
-     # Return the Gameday URL for a given sport_code
-     # and date. Argument must include :date, but
-     # :sport_code will be assumed to be 'mlb' if not
-     # specified.
-     #
-     # Arguments:
-     # args: (Hash)
-     #
-     def self.gameday_league_and_date_url(args = {})
-       raise "Date required." unless args[:date]
-       args[:sport_code] ||= 'mlb'
-
-       "http://gd2.mlb.com/components/game/#{args[:sport_code]}/#{format_date_as_folder(args[:date])}"
+   # range: (Range)
+   # args: (Hash)
+
+   def pull_days(range, sport_code)
+     range.each { |date| self.pull_day(date, sport_code) }
+   end
+
+   private
+
+   def get_gameday_date_page(date, sport_code)
+     self.class.get(gameday_date_and_sport_code_url(date, sport_code))
+   end
+
+   def links_on_gameday_date_page(date, sport_code)
+     Nokogiri::XML(get_gameday_date_page(date, sport_code)).search('a').map do |a|
+       a.attribute('href').value
      end
-
-     ##
-     # Copy XML files from the Gameday severs to your
-     # local machine for parsing and analysis. This method
-     # is used by Spider pull class methods to put files on
-     # your local machine in a similar layout to the one used by
-     # Gameday. The paths argument gets built by Spider.pull
-     # during the pull process.
-     #
-     # Arguments:
-     # file_name: (String)
-     # paths: (Hash)
-
-     def self.copy_gameday_xml (file_name,paths)
-       download = self.get(paths[:mlbGameFolder] + "#{file_name =~ /inning/ ? 'inning/' : ''}" + file_name).body
-       unless download.include?('404 Not Found')
-         open(paths[:localGameFolder] + (file_name =~ /inning/ ? 'inning/' : '') + file_name, 'w') do |file|
-           file.write(download)
-         end
-       end
+   end
+
+   def game_links_on_gameday_date_page(date, sport_code)
+     links_on_gameday_date_page(date, sport_code).select do |link|
+       link[0,4] == "gid_" && link[-5,4] != "_bak"
+     end
+   end
+
+   def gameday_url_root
+     "http://gd2.mlb.com/components/game/"
+   end
+
+   def gameday_date_and_sport_code_url(date, sport_code)
+     "#{gameday_url_root}#{sport_code}/#{format_date_as_folder(date)}"
+   end
+
+   def gameday_game_url(game_id, date)
+     gameday_url_root + remote_game_path(game_id, date)
+   end
+
+   def remote_game_path(game_id, date)
+     "#{home_sport_code_from_game_id(game_id)}/#{format_date_as_folder(date)}/#{game_id}"
+   end
+
+   def home_sport_code_from_game_id(game_id)
+     game_id[-5,3]
+   end
+
+   def inning_prefix(file_name)
+     if file_name =~ /inning/
+       'inning/'
+     else
+       ''
      end
-
-     ##
-     # Output a folder format similar to the one used by Gameday.
-     #
-     # Example:
-     # >> Spider.format_date_as_folder(Date.new(2011,7,4))
-     # => "year_2011/month_07/day_04"
-     #
-     # Arguments:
-     # date: (Date)
-
-     def self.format_date_as_folder(date)
-       Greenmonster.format_date_as_folder(date)
+   end
+
+   def remote_file_url(game_id, date, file_name)
+     gameday_game_url(game_id, date) + '/' + inning_prefix(file_name) + '/' + file_name
+   end
+
+   def download_gameday_xml(game_id, date, file_name)
+     self.class.get(remote_file_url(game_id, date, file_name)).body.force_encoding("ISO-8859-1").encode("UTF-8")
+   end
+
+   def local_game_path(game_id, date)
+     Pathname.new(
+       Greenmonster.games_folder +
+       home_sport_code_from_game_id(game_id) +
+       format_date_as_folder(date) +
+       game_id
+     )
+   end
+
+   def copy_gameday_xml(game_id, date, file_name)
+     download = download_gameday_xml(game_id, date, file_name)
+
+     unless download.include?('404 Not Found')
+       open(local_game_path(game_id, date) + inning_prefix(file_name) + file_name, 'w') do |file|
+         file.write(download)
+       end
      end
+   end
+
+   def format_date_as_folder(date)
+     Greenmonster.format_date_as_folder(date)
+   end
+
+   def make_folders_for_game(game_id, date)
+     FileUtils.mkdir_p(local_game_path(game_id, date) + 'inning')
+   end
  end
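The new `remote_file_url` joins the game URL, the inning prefix, and the file name with a literal `'/'` on both sides of the prefix. Since `inning_prefix` already returns `'inning/'` or `''`, this appears to yield a doubled slash in every URL. The sketch below reproduces the helpers as stand-alone functions to show the effect, alongside a hypothetical variant (`remote_file_url_single_slash`, not in the gem) that emits one separator. Whether MLB's server tolerates the doubled slash is not something the diff shows.

```ruby
require 'date'

GAMEDAY_URL_ROOT = "http://gd2.mlb.com/components/game/"

def format_date_as_folder(date)
  date.strftime("year_%Y/month_%m/day_%d")
end

# The home team's sport code sits just before the trailing "_<game number>".
def home_sport_code_from_game_id(game_id)
  game_id[-5, 3]
end

def inning_prefix(file_name)
  file_name =~ /inning/ ? 'inning/' : ''
end

def gameday_game_url(game_id, date)
  GAMEDAY_URL_ROOT +
    "#{home_sport_code_from_game_id(game_id)}/#{format_date_as_folder(date)}/#{game_id}"
end

# Concatenation exactly as written in the 0.5.0 diff.
def remote_file_url_as_diffed(game_id, date, file_name)
  gameday_game_url(game_id, date) + '/' + inning_prefix(file_name) + '/' + file_name
end

# Hypothetical variant: the prefix already carries its own trailing slash.
def remote_file_url_single_slash(game_id, date, file_name)
  "#{gameday_game_url(game_id, date)}/#{inning_prefix(file_name)}#{file_name}"
end

game_id = 'gid_2011_07_04_tormlb_bosmlb_1'
date = Date.new(2011, 7, 4)

puts remote_file_url_as_diffed(game_id, date, 'boxscore.xml')
puts remote_file_url_single_slash(game_id, date, 'inning_all.xml')
```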