footballdata-12xpert 0.0.1

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 53d9f93e00ee1b85e5848b4dee48988b16e659c3
4
+ data.tar.gz: 60068169b3aaab76e84d4875a00db617e4e6211d
5
+ SHA512:
6
+ metadata.gz: fedfd89358237ef25ba0083c8d1f21c7ec13df976e70ea8460f9682ba5966b718dcfbbf6d4b3b92d1fb9310dbb3739b9f62972f8fce5d43d3c2afcfdaf98e2b0
7
+ data.tar.gz: 36691c7a8a0e4ebfd70aaeecba2d8e363977a9ced0ebb05d1550b5cf3b33bb3e93dc31e76ffb5d4f512165a75149c70edb9f0c76e96277c31ee034199bfd0c73
@@ -0,0 +1,3 @@
1
+ ### 0.0.1 / 2019-08-19
2
+
3
+ * Everything is new. First release.
@@ -0,0 +1,13 @@
1
+ CHANGELOG.md
2
+ Manifest.txt
3
+ README.md
4
+ Rakefile
5
+ lib/footballdata-12xpert.rb
6
+ lib/footballdata-12xpert/config.rb
7
+ lib/footballdata-12xpert/convert.rb
8
+ lib/footballdata-12xpert/download.rb
9
+ lib/footballdata-12xpert/version.rb
10
+ lib/footballdata/12xpert.rb
11
+ test/helper.rb
12
+ test/test_download.rb
13
+ test/test_import.rb
@@ -0,0 +1,306 @@
1
+ # footballdata-12xpert - download, convert & import 22+ top football leagues from 25 seasons back to 1993/94 from Joseph Buchdahl (12Xpert)'s Football Data website (football-data.co.uk) up and running since 2001 (and updated twice a week)
2
+
3
+
4
+ * home :: [github.com/sportdb/sport.db.sources](https://github.com/sportdb/sport.db.sources)
5
+ * bugs :: [github.com/sportdb/sport.db.sources/issues](https://github.com/sportdb/sport.db.sources/issues)
6
+ * gem :: [rubygems.org/gems/footballdata-12xpert](https://rubygems.org/gems/footballdata-12xpert)
7
+ * rdoc :: [rubydoc.info/gems/footballdata-12xpert](http://rubydoc.info/gems/footballdata-12xpert)
8
+ * forum :: [opensport](http://groups.google.com/group/opensport)
9
+
10
+
11
+
12
+ ## What's Joseph Buchdahl's Football Data?
13
+
14
+ [Joseph Buchdahl (12Xpert)](https://twitter.com/12Xpert) has been publishing football data
15
+ at the [`football-data.co.uk`](https://www.football-data.co.uk/data.php) website
16
+ in the world's most popular tabular data interchange format in text, that is,
17
+ comma-separated value (.csv) records for (bulk) download (and offline usage) since 2001 (!).
18
+
19
+ The main top football leagues include:
20
+
21
+ - England (`E0`, `E1`, `E2`, `E3` & `EC`) - Premiership & Divs 1, 2, 3 & Conference
22
+ - Scotland (`SC0`, `SC1`, `SC2` & `SC3`) - Premiership & Divs 1, 2 & 3
23
+ - Germany (`D1` & `D2`) - Bundesligas 1 & 2
24
+ - Italy (`I1` & `I2`) - Serie A & B
25
+ - Spain (`SP1` & `SP2`) - La Liga (Primera & Segunda)
26
+ - France (`F1` & `F2`) - Le Championnat & Division 2
27
+ - Netherlands (`N1`) - Eredivisie
28
+ - Belgium (`B1`) - Pro League
29
+ - Portugal (`P1`) - Liga I
30
+ - Turkey (`T1`) - Ligi 1
31
+ - Greece (`G1`) - Ethniki Katigoria
32
+
33
+ And the extra leagues include:
34
+
35
+ - Argentina (`ARG`) - Primera Division
36
+ - Austria (`AUT`) - Bundesliga
37
+ - Brazil (`BRA`) - Serie A
38
+ - China (`CHN`) - Super League
39
+ - Denmark (`DNK`) - Superliga
40
+ - Finland (`FIN`) - Veikkausliiga
41
+ - Ireland (`IRL`) - Premier Division
42
+ - Japan (`JPN`) - J-League
43
+ - Mexico (`MEX`) - Liga MX
44
+ - Norway (`NOR`) - Eliteserien
45
+ - Poland (`POL`) - Ekstraklasa
46
+ - Romania (`ROU`) - Liga 1
47
+ - Russia (`RUS`) - Premier League
48
+ - Sweden (`SWE`) - Allsvenskan
49
+ - Switzerland (`SWZ`) - Super League
50
+ - USA (`USA`) - Major League Soccer (MLS)
51
+
52
+
53
+ The top football leagues include 25 seasons back to 1993/94
54
+ and get updated at least twice weekly
55
+ (Sunday nights and Wednesday nights).
56
+
57
+
58
+ ## Usage
59
+
60
+ Let's download all datasets (570+) for offline usage into the
61
+ default web cache directory (that is, `~/.cache/www.football-data.co.uk`):
62
+
63
+
64
+ ``` ruby
65
+ require 'footballdata/12xpert'
66
+
67
+ Footballdata12xpert.download
68
+ ```
69
+
70
+ Note: You can use `Footballdata12Xpert`,
71
+ `Footballdata_12xpert`, or `Footballdata_12Xpert`
72
+ as alternate alias names for `Footballdata12xpert`.
73
+
74
+
75
+
76
+ Stand back ten feet. Resulting in:
77
+
78
+ ```
79
+ ~/.cache/www.football-data.co.uk
80
+ │ ARG.csv
81
+ │ AUT.csv
82
+ │ BRA.csv
83
+ │ CHN.csv
84
+ │ DNK.csv
85
+ │ FIN.csv
86
+ │ IRL.csv
87
+ │ JPN.csv
88
+ │ MEX.csv
89
+ │ NOR.csv
90
+ │ POL.csv
91
+ │ ROU.csv
92
+ │ RUS.csv
93
+ │ SWE.csv
94
+ │ SWZ.csv
95
+ │ USA.csv
96
+
97
+ ├───9394
98
+ │ D1.csv
99
+ │ D2.csv
100
+ │ E0.csv
101
+ │ E1.csv
102
+ │ E2.csv
103
+ │ E3.csv
104
+ │ F1.csv
105
+ │ I1.csv
106
+ │ N1.csv
107
+ │ SP1.csv
108
+
109
+ ├───9495
110
+ │ D1.csv
111
+ │ D2.csv
112
+ │ E0.csv
113
+ │ E1.csv
114
+ │ E2.csv
115
+ │ E3.csv
116
+ │ F1.csv
117
+ │ G1.csv
118
+ │ I1.csv
119
+ │ N1.csv
120
+ │ P1.csv
121
+ │ SC0.csv
122
+ │ SC1.csv
123
+ │ SP1.csv
124
+ │ T1.csv
125
+ ...
126
+ ├───1819
127
+ │ B1.csv
128
+ │ D1.csv
129
+ │ D2.csv
130
+ │ E0.csv
131
+ │ E1.csv
132
+ │ E2.csv
133
+ │ E3.csv
134
+ │ EC.csv
135
+ │ F1.csv
136
+ │ F2.csv
137
+ │ G1.csv
138
+ │ I1.csv
139
+ │ I2.csv
140
+ │ N1.csv
141
+ │ P1.csv
142
+ │ SC0.csv
143
+ │ SC1.csv
144
+ │ SC2.csv
145
+ │ SC3.csv
146
+ │ SP1.csv
147
+ │ SP2.csv
148
+ │ T1.csv
149
+
150
+ └───1920
151
+ B1.csv
152
+ D1.csv
153
+ D2.csv
154
+ E0.csv
155
+ E1.csv
156
+ E2.csv
157
+ E3.csv
158
+ EC.csv
159
+ F1.csv
160
+ F2.csv
161
+ N1.csv
162
+ P1.csv
163
+ SC0.csv
164
+ SC1.csv
165
+ SC2.csv
166
+ SC3.csv
167
+ SP1.csv
168
+ SP2.csv
169
+ T1.csv
170
+ ```
171
+
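The season directory names in the listing above appear to be the last two digits of the starting year followed by the last two digits of the ending year. A minimal sketch of that naming, assuming the pattern holds for every season; `season_dirname` and `cache_path` are hypothetical helpers for illustration and are not part of the gem:

```ruby
# Hypothetical helper (not part of footballdata-12xpert):
# build the short season directory name from the starting year.
def season_dirname(start_year)
  format('%02d%02d', start_year % 100, (start_year + 1) % 100)
end

# Hypothetical helper: build the path of a season-by-season datafile
# below the default web cache directory named in this README.
def cache_path(start_year, basename, root: '~/.cache/www.football-data.co.uk')
  File.join(root, season_dirname(start_year), "#{basename}.csv")
end

puts season_dirname(1993)    #=> 9394
puts season_dirname(1999)    #=> 9900
puts cache_path(2019, 'E0')  #=> ~/.cache/www.football-data.co.uk/1920/E0.csv
```

Note that the `~` is kept as-is here; use `File.expand_path` if you need an absolute path.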
172
+ The football datasets come in two flavors / formats.
173
+ The main leagues use season-by-season datafiles.
174
+ For example, `E0.csv`, `E1.csv`, `E2.csv`, `E3.csv` & `EC.csv` in the `1920`
175
+ season directory hold the matches for the English Premiership & Divs 1, 2, 3 & Conference;
176
+ `D1.csv` & `D2.csv` for the Bundesligas 1 & 2 and so on
177
+ for the 2019/20 season.
178
+
179
+ The extra leagues use an all-seasons-in-one datafile.
180
+ For example, `ARG.csv`
181
+ holds all seasons of the Argentinian Primera Division;
182
+ `AUT.csv` for the Austrian Bundesliga and so on.
183
+
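To make the all-seasons-in-one flavor more concrete, here is a small sketch using Ruby's standard `csv` library that splits such a datafile by its `Season` column. The sample rows are made up for illustration, and the `Home`, `Away`, `HG` and `AG` column names are assumptions; only `Season`, `Date` and `Time` are taken from the gem's code.

```ruby
require 'csv'

# Made-up sample rows (not real results); column names other than
# Season, Date and Time are assumptions for this sketch.
SAMPLE = <<~CSV
  Season,Date,Time,Home,Away,HG,AG
  2017/2018,25/08/2017,23:00,Team A,Team B,1,0
  2017/2018,26/08/2017,21:15,Team C,Team D,2,2
  2018/2019,10/08/2018,23:00,Team B,Team A,0,3
CSV

rows = CSV.parse(SAMPLE, headers: true)

# Group the rows by the 'Season' column: one list of matches per season.
by_season = rows.group_by { |row| row['Season'] }

by_season.each do |season, matches|
  puts "#{season}: #{matches.size} match(es)"
end
```

A season-by-season datafile needs no such split because the season is already given by its directory name.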
184
+
185
+ Note: The datasets' character encoding gets converted from
186
+ [Windows-1252 (8-bit)](https://en.wikipedia.org/wiki/Windows-1252) to UTF-8 (Unicode multi-byte).
187
+
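For the curious, a conversion of this kind can be sketched in plain Ruby as follows. This only illustrates the idea with core `String` methods and is not necessarily how the gem does it; the sample bytes are made up.

```ruby
# Bytes as they might arrive from a Windows-1252 encoded file:
# "Málaga" with the á stored as the single byte 0xE1.
raw = "M\xE1laga".b

# Tag the bytes with their real encoding, then transcode to UTF-8.
utf8 = raw.force_encoding('Windows-1252').encode('UTF-8')

puts utf8            # prints the name with a proper á
puts utf8.encoding   #=> UTF-8
puts utf8.bytesize   #=> 7 (the á now takes two bytes)
```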
188
+
189
+ Less is More?
190
+
191
+ You can download datasets for selected countries only. Pass in
192
+ the country keys. Let's download only England (`eng`)'s leagues:
193
+
194
+ ``` ruby
195
+ Footballdata12xpert.download( 'eng' )
196
+ ```
197
+
198
+ Or let's download only the top five leagues, that is,
199
+ England (`eng`), Spain (`es`), Germany (`de`), France (`fr`)
200
+ and Italy (`it`):
201
+
202
+ ``` ruby
203
+ Footballdata12xpert.download( 'eng', 'es', 'de', 'fr', 'it' )
204
+ ```
205
+
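The country key selection can be pictured roughly like this. The sketch mirrors the "no keys means everything" check used in the gem's import code, but the `SOURCES` table and the `selected_files` helper are hypothetical miniatures made up for illustration:

```ruby
# Hypothetical miniature of a sources table:
# country key => list of [season directory, basenames] records.
SOURCES = {
  'eng' => [['1819', %w[E0 E1]], ['1920', %w[E0 E1]]],
  'de'  => [['1920', %w[D1 D2]]],
  'es'  => [['1920', %w[SP1 SP2]]]
}.freeze

# Hypothetical helper: list the datafiles for the given country keys;
# passing no keys selects all countries.
def selected_files(*country_keys)
  SOURCES.flat_map do |country_key, records|
    next [] unless country_keys.empty? || country_keys.include?(country_key)

    records.flat_map do |season, basenames|
      basenames.map { |basename| "#{season}/#{basename}.csv" }
    end
  end
end

pp selected_files('de')         #=> ["1920/D1.csv", "1920/D2.csv"]
pp selected_files('de', 'es')   #=> D1, D2, SP1 & SP2 for 1920
pp selected_files.size          #=> 8
```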
206
+
207
+ Now what? Let's import all football datasets from the web cache
208
+ into an SQL database.
209
+
210
+
211
+ ``` ruby
212
+ SportDb.connect( adapter: 'sqlite3',
213
+ database: './football.db' )
214
+
215
+ SportDb.create_all ## build database schema / tables
216
+
217
+
218
+ Footballdata12xpert.import
219
+ ```
220
+
221
+ Note: Depending on your computer's processing power the import might take
222
+ 10+ minutes.
223
+
224
+
225
+ Done. Let's try some database (SQL) queries (using the sport.db ActiveRecord models):
226
+
227
+ ``` ruby
228
+ ## ActiveRecord model (convenience) shortcuts
229
+ Team = SportDb::Model::Team
230
+ Game = SportDb::Model::Game
231
+ League = SportDb::Model::League
232
+ Event = SportDb::Model::Event
233
+
234
+
235
+ ## Let's query for some stats - How many teams? How many games / matches? etc.
236
+
237
+ puts Team.count #=> 1143
238
+ # SELECT COUNT(*) FROM teams
239
+
240
+ puts Game.count #=> 227_142
241
+ # SELECT COUNT(*) FROM games
242
+
243
+ puts League.count #=> 38
244
+ # SELECT COUNT(*) FROM leagues
245
+ ```
246
+
247
+ Note: See the [SUMMARY.md](SUMMARY.md) page for a list of all 1000+ (canonical)
248
+ club names by country.
249
+
250
+ ``` ruby
251
+ ## Let's query for the Real Madrid football club from Spain
252
+
253
+ madrid = Team.find_by( title: 'Real Madrid' )
254
+ # SELECT * FROM teams WHERE title = 'Real Madrid' LIMIT 1
255
+
256
+ puts madrid.games.count #=> 1023
257
+ # SELECT COUNT(*) FROM games WHERE (team1_id = 380 or team2_id = 380)
258
+ g = madrid.games.first
259
+ # SELECT * FROM "games" WHERE (team1_id = 380 or team2_id = 380) LIMIT 1
260
+
261
+ puts g.team1.title #=> CA Osasuna
262
+ puts g.team2.title #=> Real Madrid
263
+ puts g.score_str #=> 1 - 4
264
+
265
+
266
+ ## Or let's query for the Liverpool football club from England
267
+
268
+ liverpool = Team.find_by( title: 'Liverpool FC' )
269
+
270
+ puts liverpool.games.count #=> 1025
271
+
272
+ g = liverpool.games.first
273
+ puts g.team1.title #=> Liverpool FC
274
+ puts g.team2.title #=> Sheffield Wednesday FC
275
+ puts g.score_str #=> 2 - 0
276
+
277
+
278
+ ## Let's try the English Premier League 2019/20
279
+
280
+ pl = Event.find_by( key: 'eng.1.2019/20' )
281
+
282
+ puts pl.games.count #=> 288
283
+
284
+ g = pl.games.first
285
+ puts g.team1.title #=> Liverpool FC
286
+ puts g.team2.title #=> Norwich City FC
287
+ puts g.score_str #=> 4 - 1
288
+
289
+ # and so on
290
+ ```
291
+
292
+ That's it. Enjoy the beautiful game.
293
+
294
+
295
+
296
+ ## License
297
+
298
+ The `footballdata-12xpert` scripts are dedicated to the public domain.
299
+ Use it as you please with no restrictions whatsoever.
300
+
301
+
302
+ ## Questions? Comments?
303
+
304
+ Send them along to the
305
+ [Open Sports & Friends Forum/Mailing List](http://groups.google.com/group/opensport).
306
+ Thanks!
@@ -0,0 +1,31 @@
1
+ require 'hoe'
2
+ require './lib/footballdata-12xpert/version.rb'
3
+
4
+ Hoe.spec 'footballdata-12xpert' do
5
+
6
+ self.version = Footballdata12xpert::VERSION
7
+
8
+ self.summary = "footballdata-12xpert - download, convert & import 22+ top football leagues from 25 seasons back to 1993/94 from Joseph Buchdahl (12Xpert)'s Football Data website (football-data.co.uk) up and running since 2001 (and updated twice a week)"
9
+ self.description = summary
10
+
11
+ self.urls = { home: 'https://github.com/sportdb/sport.db.sources'}
12
+
13
+ self.author = 'Gerald Bauer'
14
+ self.email = 'opensport@googlegroups.com'
15
+
16
+ # switch extension to .markdown for github formatting
17
+ self.readme_file = 'README.md'
18
+ self.history_file = 'CHANGELOG.md'
19
+
20
+ self.licenses = ['Public Domain']
21
+
22
+ self.extra_deps = [
23
+ ['webget', '>= 0.2.1'],
24
+ ['footballdb-clubs', '>= 2020.9.15'],
25
+ ['sportdb-importers', '>= 1.1.1'],
26
+ ]
27
+
28
+ self.spec_extras = {
29
+ required_ruby_version: '>= 2.2.2'
30
+ }
31
+ end
@@ -0,0 +1,168 @@
1
+
2
+ ## 3rd party libs / gems
3
+ require 'webget'
4
+
5
+ ## sportdb libs / gems
6
+ require 'sportdb/importers'
7
+
8
+
9
+
10
+ ###
11
+ # our own code
12
+ require 'footballdata-12xpert/version' # let version always go first
13
+
14
+ require 'footballdata-12xpert/config'
15
+ require 'footballdata-12xpert/download'
16
+ require 'footballdata-12xpert/convert'
17
+
18
+ ###
19
+ ## add alternate aliases - why? why not?
20
+ Footballdata12Xpert = Footballdata12xpert
21
+ Footballdata_12xpert = Footballdata12xpert
22
+ Footballdata_12Xpert = Footballdata12xpert
23
+
24
+
25
+
26
+
27
+ module Footballdata12xpert
28
+
29
+
30
+ def self.import( *args, dir: './dl' )
31
+
32
+ country_keys = args ## countries to include / fetch - optional
33
+
34
+ FOOTBALLDATA_SOURCES.each do |country_key, country_sources|
35
+ if country_keys.empty? || country_keys.include?( country_key )
36
+ import_season_by_season( country_key, country_sources, dir: dir )
37
+ else
38
+ ## skipping country
39
+ end
40
+ end
41
+
42
+ FOOTBALLDATA_SOURCES_II.each do |country_key, country_basename|
43
+ if country_keys.empty? || country_keys.include?( country_key )
44
+ import_all_seasons( country_key, country_basename, dir: dir )
45
+ else
46
+ ## skipping country
47
+ end
48
+ end
49
+ end # method import
50
+
51
+ class << self
52
+ alias_method :load, :import
53
+ end
54
+
55
+
56
+
57
+ def self.import_season_by_season( country_key, sources, dir: )
58
+
59
+ ## todo/check: make sure timezones entry for country_key exists!!! a missing entry (nil/24.0) raises NoMethodError when the converter runs
60
+ fix_date_converter = ->(row) { fix_date( row, FOOTBALLDATA_TIMEZONES[country_key]/24.0 ) }
61
+
62
+ sources.each do |rec|
63
+ season_key = rec[0] ## note: dirname is season_key e.g. 2011-12 etc.
64
+ basenames = rec[1] ## e.g. E1,E2,etc.
65
+
66
+ basenames.each do |basename|
67
+
68
+ path = "#{dir}/#{season_key}/#{basename}.csv"
69
+
70
+ league_key = FOOTBALLDATA_LEAGUES[basename] ## e.g.: eng.1, fr.1, fr.2 etc.
71
+ if league_key.nil?
72
+ puts "** !!! ERROR !!! league key missing for >#{basename}<; sorry - please add"
73
+ exit 1
74
+ end
75
+
76
+ country, league = find_or_create_country_and_league( league_key )
77
+
78
+ season = SportDb::Importer::Season.find_or_create_builtin( season_key )
79
+
80
+ puts "path: #{path}"
81
+
82
+ matches = CsvMatchReader.read( path, converters: fix_date_converter )
83
+
84
+ update_matches_txt( matches,
85
+ league: league,
86
+ season: season )
87
+ end
88
+ end
89
+ end # method import_season_by_season
90
+
91
+
92
+
93
+ def self.import_all_seasons( country_key, basename, dir: )
94
+
95
+ col = 'Season'
96
+ path = "#{dir}/#{basename}.csv"
97
+
98
+ season_keys = CsvMatchSplitter.find_seasons( path, col: col )
99
+ pp season_keys
100
+
101
+ ## note: assume always first level/tier league for now
102
+ league_key = "#{country_key}.1"
103
+ country, league = find_or_create_country_and_league( league_key )
104
+
105
+ ## todo/check: make sure timezones entry for country_key exists!!! a missing entry (nil/24.0) raises NoMethodError when the converter runs
106
+ fix_date_converter = ->(row) { fix_date( row, FOOTBALLDATA_TIMEZONES[country_key]/24.0 ) }
107
+
108
+ season_keys.each do |season_key|
109
+ season = SportDb::Importer::Season.find_or_create_builtin( season_key )
110
+
111
+ matches = CsvMatchReader.read( path, filters: { col => season_key },
112
+ converters: fix_date_converter )
113
+
114
+ pp matches[0..2]
115
+ pp matches.size
116
+
117
+ update_matches_txt( matches,
118
+ league: league,
119
+ season: season )
120
+ end
121
+ end # method import_all_seasons
122
+
123
+
124
+ ###
125
+ ## helper for country and league db record
126
+ def self.find_or_create_country_and_league( league_key )
127
+ country_key, level = league_key.split( '.' )
128
+ country = SportDb::Importer::Country.find_or_create_builtin!( country_key )
129
+
130
+ league_auto_name = "#{country.name} #{level}" ## "fallback" auto-generated league name
131
+ pp league_auto_name
132
+ league = SportDb::Importer::League.find_or_create( league_key,
133
+ name: league_auto_name,
134
+ country_id: country.id )
135
+
136
+ [country, league]
137
+ end
138
+
139
+ ## helper to fix dates to use local timezone (and not utc/london time)
140
+ def self.fix_date( row, offset )
141
+ return row if row['Time'].nil? ## note: time (column) required for fix
142
+
143
+ col = row['Date']
144
+ if col =~ /^\d{2}\/\d{2}\/\d{4}$/
145
+ date_fmt = '%d/%m/%Y' # e.g. 17/08/2002
146
+ elsif col =~ /^\d{2}\/\d{2}\/\d{2}$/
147
+ date_fmt = '%d/%m/%y' # e.g. 17/08/02
148
+ else
149
+ puts "*** !!! wrong (unknown) date format >>#{col}<<; cannot continue; fix it; sorry"
150
+ ## todo/fix: add to errors/warns list - why? why not?
151
+ exit 1
152
+ end
153
+
154
+ date = DateTime.strptime( "#{row['Date']} #{row['Time']}", "#{date_fmt} %H:%M" )
155
+ date = date + offset
156
+
157
+ row['Date'] = date.strftime( date_fmt ) ## overwrite "old"
158
+ row['Time'] = date.strftime( '%H:%M' )
159
+ row ## return row for possible pipelining - why? why not?
160
+ end
161
+
162
+
163
+
164
+ end ## module Footballdata12xpert
165
+
166
+
167
+
168
+ puts Footballdata12xpert.banner # say hello