remote_table 1.3.0 → 1.4.0

data/.gitignore CHANGED
@@ -5,3 +5,5 @@ rdoc
  pkg
  Gemfile.lock
  *.gem
+ .yardoc
+ doc/
data/CHANGELOG ADDED
@@ -0,0 +1,19 @@
+ 1.4.0 / 2012-04-12
+
+ * Enhancements
+
+   * DRY up spawning code with UnixUtils
+   * Switch to minitest
+   * Stop defining MyCSV globally
+   * Test on MRI 1.8.7, MRI 1.9.3, and JRuby 1.6.7
+   * Warn users about ODS not working on JRuby
+   * Move all warnings to Kernel.warn
+   * Start keeping a CHANGELOG!
+   * Ensure we clean up temporary files
+
+ * Bug fixes
+
+   * Make sure headers (keys) on rows created with Roo are ordered in Ruby 1.8.7
+   * Make tests green (for now) by fixing URLs and sometimes :row_xpaths (hello FAA aircraft lookup guide)
+   * Use Hash#fetch for default options
+   * Don't try to set default_sheet if user doesn't specify a sheet name
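
The "Use Hash#fetch for default options" entry is easiest to appreciate with a small sketch. The changelog does not say which options were affected, so the option names below are hypothetical and this is not the gem's actual option-handling code. It only shows why `Hash#fetch` is safer than `||` when `false` is a legitimate value, as it is for `:headers => false`.

```ruby
# Hypothetical option names; not remote_table's real option-handling code.
user_options = { :headers => false }

# `||` treats an explicit false as "missing" and silently restores the default.
with_or = user_options[:headers] || :first_row

# Hash#fetch only falls back when the key is absent, so false survives.
with_fetch = user_options.fetch(:headers, :first_row)

# A key the caller never set still gets its default.
skip = user_options.fetch(:skip, 0)

p with_or     # => :first_row
p with_fetch  # => false
p skip        # => 0
```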
data/Gemfile CHANGED
@@ -1,4 +1,10 @@
- source "http://rubygems.org"
+ source :rubygems
 
  # Specify your gem's dependencies in remote_table.gemspec
  gemspec
+
+ gem 'errata', '>=0.2.0'
+ gem 'minitest'
+ gem 'minitest-reporters'
+ gem 'rake'
+ gem 'yard'
data/README.markdown ADDED
@@ -0,0 +1,440 @@
+ # remote_table
+
+ Open local or remote XLSX, XLS, ODS, CSV and fixed-width files.
+
+ ## Production usage
+
+ Used by [the Brighter Planet Reference Data web service](http://data.brighterplanet.com), the [`data_miner` gem](https://github.com/seamusabshere/data_miner), and the [`earth` gem](https://github.com/brighterplanet/earth).
+
+ ## Example
+
+     $ irb
+     1.9.3-p0 :001 > require 'remote_table'
+      => true
+     1.9.3-p0 :002 > t = RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/98guide6.zip', :filename => '98guide6.csv'
+      => #<RemoteTable:0x00000100851d98 @options={:filename=>"98guide6.csv"}, @url="http://www.fueleconomy.gov/FEG/epadata/98guide6.zip">
+     1.9.3-p0 :003 > t.rows.length
+      => 806
+     1.9.3-p0 :004 > t.rows.first.length
+      => 26
+     1.9.3-p0 :005 > require 'pp'
+      => true
+     1.9.3-p0 :006 > pp t[23]
+     {"Class"=>"TWO SEATERS",
+      "Manufacturer"=>"PORSCHE",
+      "carline name"=>"BOXSTER",
+      "displ"=>"2.5",
+      "cyl"=>"6",
+      "trans"=>"Manual(M5)",
+      "drv"=>"R",
+      "cty"=>"19",
+      "hwy"=>"26",
+      "cmb"=>"22",
+      "ucty"=>"21.2",
+      "uhwy"=>"33.9499",
+      "ucmb"=>"25.5114",
+      "fl"=>"P",
+      "G"=>"",
+      "T"=>"",
+      "S"=>"",
+      "2pv"=>"",
+      "2lv"=>"",
+      "4pv"=>"",
+      "4lv"=>"",
+      "hpv"=>"",
+      "hlv"=>"",
+      "fcost"=>"956",
+      "eng dscr"=>"",
+      "trans dscr"=>""}
+
+ You get an <code>Array</code> of <code>Hash</code>es with **string keys**. If you set <code>:headers => false</code>, then you get an <code>Array</code> of <code>Array</code>s.
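
The difference between the two row shapes can be seen without any network access. The following is a minimal sketch that uses Ruby's own `csv` library directly on an inline string; it illustrates the shapes described above, not how `RemoteTable` produces them internally, and the sample data is made up.

```ruby
require 'csv'

# Made-up sample data standing in for a downloaded file.
raw = "Manufacturer,carline name,cyl\nPORSCHE,BOXSTER,6\n"

# With headers: each row becomes a Hash keyed by the header strings.
hash_rows = CSV.parse(raw, :headers => true).map { |row| row.to_h }

# Without headers: each row stays a plain Array (CSV.parse keeps the header line as row 0).
array_rows = CSV.parse(raw)

puts hash_rows.first['carline name']  # prints BOXSTER
puts array_rows[1][1]                 # prints BOXSTER
```

Note that every value is a `String`, including `"6"`, which matches the `pp` output in the example.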
+
+ ## Supported formats
+
+ <table>
+   <tr>
+     <th>Format</th>
+     <th>Notes</th>
+     <th>Library</th>
+   </tr>
+   <tr>
+     <td>Delimited (CSV, TSV, etc.)</td>
+     <td>All <code>RemoteTable::Format::Delimited::FASTERCSV_OPTIONS</code>, for example <code>:col_sep</code>, are passed directly to fastercsv.</td>
+     <td>
+       <a href="http://fastercsv.rubyforge.org/">fastercsv</a> (1.8);
+       <a href="http://www.ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/index.html">stdlib</a> (1.9)
+     </td>
+   </tr>
+   <tr>
+     <td>Fixed width</td>
+     <td>You have to set up a <code>:schema</code>.</td>
+     <td><a href="https://github.com/seamusabshere/fixed_width">fixed_width-multibyte</a></td>
+   </tr>
+   <tr>
+     <td>HTML</td>
+     <td>See XML.</td>
+     <td><a href="http://nokogiri.org/">nokogiri</a></td>
+   </tr>
+   <tr>
+     <td>ODS</td>
+     <td></td>
+     <td><a href="http://roo.rubyforge.org/">roo</a></td>
+   </tr>
+   <tr>
+     <td>XLS</td>
+     <td></td>
+     <td><a href="http://roo.rubyforge.org/">roo</a></td>
+   </tr>
+   <tr>
+     <td>XLSX</td>
+     <td></td>
+     <td><a href="http://roo.rubyforge.org/">roo</a></td>
+   </tr>
+   <tr>
+     <td>XML</td>
+     <td>The idea is to set up a <code>:row_[xpath|css]</code> and (optionally) a <code>:column_[xpath|css]</code>.</td>
+     <td><a href="http://nokogiri.org/">nokogiri</a></td>
+   </tr>
+ </table>
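
For the fixed-width row, it may help to see what a `:schema` of `[name, width, options]` triples means in practice. The sketch below is plain Ruby and is not the `fixed_width-multibyte` implementation: `parse_fixed_width` is a hypothetical helper, the sample line is made up, and dropping columns named `spacer` is an assumption made for illustration.

```ruby
# Schema in the same [name, width, options] shape as the README's fixed-width examples.
schema = [[ 'code',   2, { :type => :string } ],
          [ 'spacer', 2 ],
          [ 'name',  10, { :type => :string } ]]

# Hypothetical helper: walk the schema left to right, slicing each column by its width.
def parse_fixed_width(line, schema)
  offset = 0
  schema.each_with_object({}) do |(name, width, _options), row|
    value = line[offset, width].to_s.strip
    offset += width
    # Assumption: spacer columns only consume width and are not kept.
    row[name] = value unless name == 'spacer'
  end
end

row = parse_fixed_width('AB  Alpha Beta', schema)
puts row['code']  # prints AB
puts row['name']  # prints Alpha Beta
```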
+
+ ## Compression and packing
+
+ You can directly pick a file out of a remote archive using <code>:filename</code> or use a <code>:glob</code>.
+
+ * zip
+ * tar
+ * bz2
+ * gz
+ * exe (treated as zip)
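
The difference between `:filename` and `:glob` can be sketched with a temporary directory standing in for an unpacked archive. This is only an illustration under assumptions: no real archive is handled, the file names are made up, and appending the glob to the unpack directory and taking the first match is a guess at the behaviour, not a description of the gem's code.

```ruby
require 'tmpdir'
require 'fileutils'

by_name, by_glob = Dir.mktmpdir do |dir|
  # Stand-in for an unpacked archive (no real zip handling here).
  FileUtils.touch File.join(dir, 'readme.txt')
  FileUtils.touch File.join(dir, 'data.csv')

  # :filename style: address one entry by its exact name.
  exact = File.join(dir, 'data.csv')

  # :glob style (assumed): append the pattern to the unpack directory, take the first match.
  matched = Dir[dir + '/*.csv'].first

  [exact, matched]
end

puts File.basename(by_glob)  # prints data.csv
puts by_name == by_glob      # prints true
```

A glob is handy when the entry name inside the archive changes between releases but its extension does not.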
+
+ ## Encoding
+
+ Everything is forced into UTF-8. You can improve the quality of the conversion by specifying the original encoding with `:encoding`.
+
+ * ASCII-8BIT and BINARY are equal
+ * ISO-8859-1 and Latin1 are equal
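
To see why naming the source encoding matters, here is a minimal sketch using Ruby's built-in `String#encode`. The README's requirements mention `iconv`, so the gem may well convert differently; this only illustrates the idea that bytes must be labelled with their real encoding before they can be transcoded to UTF-8 correctly.

```ruby
# Bytes for "café" as they would appear in an ISO-8859-1 (Latin1) file.
latin1_bytes = "caf\xE9".b

# Label the bytes with their real encoding, then transcode to UTF-8.
utf8 = latin1_bytes.force_encoding('ISO-8859-1').encode('UTF-8')

puts utf8                  # prints café
puts utf8.valid_encoding?  # prints true

# Ruby itself treats BINARY as another name for ASCII-8BIT.
puts Encoding.find('BINARY') == Encoding::ASCII_8BIT  # prints true
```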
+
+ ## More examples
+
+     RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdHRNaVpSUWw2Z2VhN3RUV25yYWdQX2c&output=csv')
+
+     # aircraft fuel use equations derived from EMEP/EEA and ICAO
+     RemoteTable.new('https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdEhYenF3dGt1T0Y1cTdneUNsNjV0dEE&output=csv')
+
+     # distance classes from the WRI business travel tool and UK DEFRA/DECC GHG Conversion Factors for Company Reporting
+     RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdFBKM0xWaUhKVkxDRmdBVkE3VklxY2c&hl=en&gid=0&output=csv')
+
+     # seat classes used in the WRI GHG Protocol calculation tools
+     RemoteTable.new('https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdG9EdmxybG1wdC1iU3JRYXNkMGhvSnc&output=csv')
+
+     # pure automobile fuels
+     RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdE9xTEdueFM2R0diNTgxUlk1QXFSb2c&gid=0&output=csv')
+
+     # blended automobile fuels
+     RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdEswNGIxM0U4U0N1UUppdWw2ejJEX0E&gid=0&output=csv')
+
+     # A list of hybrid make model years derived from the EPA fuel economy guide
+     RemoteTable.new('https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AoQJbWqPrREqdGtzekE4cGNoRGVmdmZMaTNvOWluSnc&output=csv')
+
+     # BTS aircraft type lookup table
+     RemoteTable.new("http://www.transtats.bts.gov/Download_Lookup.asp?Lookup=L_AIRCRAFT_TYPE",
+       :errata => { :url => 'https://spreadsheets.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdEZ2d3JQMzV5T1o1T3JmVlFyNUZxdEE&output=csv' })
+
+     # aircraft made by whitelisted manufacturers whose ICAO code starts with 'B' from the FAA
+     # for definition of `Aircraft::Guru` and `manufacturer_whitelist?` see https://github.com/brighterplanet/earth/blob/master/lib/earth/air/aircraft/data_miner.rb
+     RemoteTable.new("http://www.faa.gov/air_traffic/publications/atpubs/CNT/5-2-B.htm",
+       :encoding => 'windows-1252',
+       :row_xpath => '//table/tr[2]/td/table/tr',
+       :column_xpath => 'td',
+       :errata => { :url => 'https://spreadsheets.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdGVBRnhkRGhSaVptSDJ5bXJGbkpUSWc&output=csv', :responder => Aircraft::Guru.new },
+       :select => lambda { |record| manufacturer_whitelist? record['Manufacturer'] })
+
+     # OpenFlights.org airports database
+     RemoteTable.new('https://openflights.svn.sourceforge.net/svnroot/openflights/openflights/data/airports.dat',
+       :headers => %w{ id name city country_name iata_code icao_code latitude longitude altitude timezone daylight_savings },
+       :select => lambda { |record| record['iata_code'].present? },
+       :errata => { :url => 'https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdFc2UzhQYU5PWEQ0N21yWFZGNmc2a3c&gid=0&output=csv', :responder => Airport::Guru.new }) # see https://github.com/brighterplanet/earth/blob/master/lib/earth/air/aircraft/data_miner.rb
+
+     # T100 flight segment data for #{month.strftime('%B %Y')}
+     # for definition of `form_data` and `FlightSegment::Guru` see https://github.com/brighterplanet/earth/blob/master/lib/earth/air/flight_segment/data_miner.rb
+     RemoteTable.new('http://www.transtats.bts.gov/DownLoad_Table.asp',
+       :form_data => form_data,
+       :compression => :zip,
+       :glob => '/*.csv',
+       :errata => { :url => 'https://spreadsheets.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdGxpYU1qWFR3d0syTVMyQVVOaDd0V3c&output=csv', :responder => FlightSegment::Guru.new },
+       :select => lambda { |record| record['DEPARTURES_PERFORMED'].to_i > 0 })
+
+     # 1995 Fuel Economy Guide
+     # for definition of `:fuel_economy_guide_b` and `AutomobileMakeModelYearVariant::ParserB` see https://github.com/brighterplanet/earth/blob/master/lib/earth/automobile/automobile_make_model_year_variant/data_miner.rb
+     RemoteTable.new("http://www.fueleconomy.gov/FEG/epadata/95mfgui.zip",
+       :filename => '95MFGUI.DAT',
+       :format => :fixed_width,
+       :cut => '13-',
+       :schema_name => :fuel_economy_guide_b,
+       :select => lambda { |row| row['model'].present? and (row['suppress_code'].blank? or row['suppress_code'].to_f == 0) and row['state_code'] == 'F' },
+       :transform => { :class => AutomobileMakeModelYearVariant::ParserB, :year => 1995 },
+       :errata => { :url => "https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdDkxTElWRVlvUXB3Uy04SDhSYWkzakE&output=csv", :responder => AutomobileMakeModelYearVariant::Guru.new })
+
+     # 1998 Fuel Economy Guide
+     # for definition of `AutomobileMakeModelYearVariant::ParserC` see https://github.com/brighterplanet/earth/blob/master/lib/earth/automobile/automobile_make_model_year_variant/data_miner.rb
+     RemoteTable.new('http://www.fueleconomy.gov/FEG/epadata/98guide6.zip',
+       :filename => '98guide6.csv',
+       :transform => { :class => AutomobileMakeModelYearVariant::ParserC, :year => 1998 },
+       :errata => { :url => "https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdDkxTElWRVlvUXB3Uy04SDhSYWkzakE&output=csv", :responder => AutomobileMakeModelYearVariant::Guru.new },
+       :select => lambda { |row| row['model'].present? })
+
+     # annual corporate average fuel economy data for domestic and imported vehicle fleets from the NHTSA
+     RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdEdXWXB6dkVLWkowLXhYSFVUT01sS2c&hl=en&gid=0&output=csv',
+       :errata => { 'url' => 'http://static.brighterplanet.com/science/data/transport/automobiles/make_fleet_years/errata.csv' },
+       :select => lambda { |row| row['volume'].to_i > 0 })
+
+     # total vehicle miles travelled by gasoline passenger cars from the 2010 EPA GHG Inventory
+     RemoteTable.new('http://www.epa.gov/climatechange/emissions/downloads10/2010-Inventory-Annex-Tables.zip',
+       :filename => 'Annex Tables/Annex 3/Table A-87.csv',
+       :skip => 1,
+       :select => lambda { |row| row['Year'].to_i.to_s == row['Year'] })
+
+     # total vehicle miles travelled from the 2010 EPA GHG Inventory
+     RemoteTable.new('http://www.epa.gov/climatechange/emissions/downloads10/2010-Inventory-Annex-Tables.zip',
+       :filename => 'Annex Tables/Annex 3/Table A-87.csv',
+       :skip => 1,
+       :select => lambda { |row| row['Year'].to_i.to_s == row['Year'] })
+
+     # total travel distribution from the 2010 EPA GHG Inventory
+     RemoteTable.new('http://www.epa.gov/climatechange/emissions/downloads10/2010-Inventory-Annex-Tables.zip',
+       :filename => 'Annex Tables/Annex 3/Table A-93.csv',
+       :skip => 1,
+       :select => lambda { |row| row['Vehicle Age'].to_i.to_s == row['Vehicle Age'] })
+
+     # building characteristics from the 2003 EIA Commercial Building Energy Consumption Survey
+     RemoteTable.new('http://www.eia.gov/emeu/cbecs/cbecs2003/public_use_2003/data/FILE02.csv',
+       :skip => 1,
+       :headers => ["PUBID8","REGION8","CENDIV8","SQFT8","SQFTC8","YRCONC8","PBA8","ELUSED8","NGUSED8","FKUSED8","PRUSED8","STUSED8","HWUSED8","ONEACT8","ACT18","ACT28","ACT38","ACT1PCT8","ACT2PCT8","ACT3PCT8","PBAPLUS8","VACANT8","RWSEAT8","PBSEAT8","EDSEAT8","FDSEAT8","HCBED8","NRSBED8","LODGRM8","FACIL8","FEDFAC8","FACACT8","MANIND8","PLANT8","FACDST8","FACDHW8","FACDCW8","FACELC8","BLDPLT8","ADJWT8","STRATUM8","PAIR8"])
+
+     # 2003 CBECS C17 - Electricity Consumption and Intensity - New England Division
+     # for definition of `CbecsEnergyIntensity::NAICS_CODE_SYNTHESIZER` see https://github.com/brighterplanet/earth/blob/master/lib/earth/industry/cbecs_energy_intensity/data_miner.rb
+     RemoteTable.new("http://www.eia.gov/emeu/cbecs/cbecs2003/detailed_tables_2003/2003set10/2003excel/C17.xls",
+       :headers => false,
+       :select => ::Proc.new { |row| CbecsEnergyIntensity::NAICS_CODE_SYNTHESIZER.call(row) },
+       :crop => (21..37))
+
+     # U.S. Census 2002 NAICS code list
+     RemoteTable.new('http://www.census.gov/epcd/naics02/naicod02.txt',
+       :skip => 4,
+       :headers => false,
+       :delimiter => ' ')
+
+     # MECS table 3.2 Total US
+     RemoteTable.new("http://205.254.135.24/emeu/mecs/mecs2006/excel/Table3_2.xls",
+       :crop => (15..94),
+       :headers => ["NAICS Code", "Subsector and Industry", "Total", "BLANK", "Net Electricity", "BLANK", "Residual Fuel Oil", "Distillate Fuel Oil", "Natural Gas", "BLANK", "LPG and NGL", "BLANK", "Coal", "Coke and Breeze", "Other"])
+
+     # MECS table 6.1 Midwest
+     RemoteTable.new("http://205.254.135.24/emeu/mecs/mecs2006/excel/Table6_1.xls",
+       :crop => (184..263),
+       :headers => ["NAICS Code", "Subsector and Industry", "Consumption per Employee", "Consumption per Dollar of Value Added", "Consumption per Dollar of Value of Shipments"])
+
+     # U.S. Census Geographic Terms and Definitions
+     RemoteTable.new('http://www.census.gov/popest/about/geo/state_geocodes_v2009.txt',
+       :skip => 6,
+       :headers => %w{ Region Division FIPS Name },
+       :select => ::Proc.new { |row| row['Division'].to_i > 0 and row['FIPS'].to_i == 0 })
+
+     # state census divisions from the U.S. Census
+     RemoteTable.new('http://www.census.gov/popest/about/geo/state_geocodes_v2009.txt',
+       :skip => 8,
+       :headers => ['Region', 'Division', 'State FIPS', 'Name'],
+       :select => ::Proc.new { |row| row['State FIPS'].to_i > 0 })
+
+     # OpenGeoCode.org's Country Codes to Country Names list
+     RemoteTable.new('http://opengeocode.org/download/countrynames.txt',
+       :format => :delimited,
+       :delimiter => ';',
+       :headers => false,
+       :skip => 22)
+
+     # heating degree day data from WRI CAIT
+     RemoteTable.new('https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdDN4MkRTSWtWRjdfazhRdWllTkVSMkE&output=csv',
+       :select => Proc.new { |record| record['country'] != 'European Union (27)' },
+       :errata => { :url => 'https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdDNSMUtCV0h4cUF4UnBKZlNkczlNbFE&output=csv' })
+
+     # US average grid loss factor derived from eGRID 2007 data
+     RemoteTable.new('http://www.epa.gov/cleanenergy/documents/egridzips/eGRID2010V1_1_STIE_USGC.xls',
+       :sheet => 'USGC',
+       :skip => 5)
+
+     # eGRID 2010 regions and loss factors
+     RemoteTable.new('http://www.epa.gov/cleanenergy/documents/egridzips/eGRID2010V1_1_STIE_USGC.xls',
+       :sheet => 'STIE07',
+       :skip => 4,
+       :select => lambda { |row| row['eGRID2010 year 2007 file state sequence number'].to_i.between?(1, 51) })
+
+     # eGRID 2010 subregions and electricity emission factors
+     RemoteTable.new('http://www.epa.gov/cleanenergy/documents/egridzips/eGRID2010_Version1-1_xls_only.zip',
+       :filename => 'eGRID2010V1_1_year07_AGGREGATION.xls',
+       :sheet => 'SRL07',
+       :skip => 4,
+       :select => lambda { |row| row['SEQSRL07'].to_i.between?(1, 26) })
+
+     # U.S. Census State ANSI Code file
+     RemoteTable.new('http://www.census.gov/geo/www/ansi/state.txt',
+       :delimiter => '|',
+       :select => lambda { |record| record['STATE'].to_i < 60 })
+
+     # Mapping Hacks zipcode database
+     RemoteTable.new('http://mappinghacks.com/data/zipcode.zip',
+       :filename => 'zipcode.csv')
+
+     # zipcode states and eGRID Subregions from the US EPA
+     RemoteTable.new('http://www.epa.gov/cleanenergy/documents/egridzips/Power_Profiler_Zipcode_Tool_v3-2.xlsx',
+       :sheet => 'Zip-subregion')
+
+     # horse breeds
+     RemoteTable.new('http://www.freebase.com/type/exporttypeinstances/base/horses/horse_breed?page=0&filter_mode=type&filter_view=table&show%01p%3D%2Ftype%2Fobject%2Fname%01index=0&show%01p%3D%2Fcommon%2Ftopic%2Fimage%01index=1&show%01p%3D%2Fcommon%2Ftopic%2Farticle%01index=2&sort%01p%3D%2Ftype%2Fobject%2Ftype%01p%3Dlink%01p%3D%2Ftype%2Flink%2Ftimestamp%01index=false&=&exporttype=csv-8')
+
+     # Brighter Planet's list of cat and dog breeds, genders, and weights
+     RemoteTable.new('http://static.brighterplanet.com/science/data/consumables/pets/breed_genders.csv',
+       :encoding => 'ISO-8859-1',
+       :select => lambda { |row| row['gender'].present? })
+
+     # residential electricity prices from the EIA
+     RemoteTable.new('http://www.eia.doe.gov/cneaf/electricity/page/sales_revenue.xls',
+       :select => lambda { |row| row['Year'].to_s.first(4).to_i > 1989 })
+
+     # residential natural gas prices from the EIA
+     # for definition of `NaturalGasParser` see https://github.com/brighterplanet/earth/blob/master/lib/earth/residence/residence_fuel_price/data_miner.rb
+     RemoteTable.new('http://tonto.eia.doe.gov/dnav/ng/xls/ng_pri_sum_a_EPG0_FWA_DMcf_a.xls',
+       :sheet => 'Data 1',
+       :skip => 2,
+       :select => lambda { |row| row['year'].to_i > 1989 },
+       :transform => { :class => NaturalGasParser })
+
+     # 2005 EIA Residential Energy Consumption Survey microdata
+     RemoteTable.new('http://www.eia.doe.gov/emeu/recs/recspubuse05/datafiles/RECS05alldata.csv',
+       :headers => :upcase)
+
+     # ...and more from the tests...
+
+     RemoteTable.new 'http://spreadsheets.google.com/pub?key=t5HM1KbaRngmTUbntg8JwPA&single=true&gid=0'
+
+     RemoteTable.new 'http://spreadsheets.google.com/pub?key=t5HM1KbaRngmTUbntg8JwPA'
+
+     RemoteTable.new 'http://spreadsheets.google.com/pub?key=t5HM1KbaRngmTUbntg8JwPA', :skip => 1, :headers => false
+
+     RemoteTable.new 'http://spreadsheets.google.com/pub?key=tObVAGyqOkCBtGid0tJUZrw'
+
+     RemoteTable.new 'http://spreadsheets.google.com/pub?key=tObVAGyqOkCBtGid0tJUZrw', :headers => %w{ col1 col2 col3 }
+
+     RemoteTable.new 'http://spreadsheets.google.com/pub?key=tujrgUOwDSLWb-P4KCt1qBg'
+
+     RemoteTable.new 'http://tonto.eia.doe.gov/dnav/pet/xls/PET_PRI_RESID_A_EPPR_PTA_CPGAL_M.xls', :transform => { :class => FuelOilParser }
+
+     RemoteTable.new 'http://www.freebase.com/type/exporttypeinstances/base/horses/horse_breed?page=0&filter_mode=type&filter_view=table&show%01p%3D%2Ftype%2Fobject%2Fname%01index=0&show%01p%3D%2Fcommon%2Ftopic%2Fimage%01index=1&show%01p%3D%2Fcommon%2Ftopic%2Farticle%01index=2&sort%01p%3D%2Ftype%2Fobject%2Ftype%01p%3Dlink%01p%3D%2Ftype%2Flink%2Ftimestamp%01index=false&=&exporttype=csv-8'
+
+     RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/02data.zip', :filename => 'guide_jan28.xls'
+
+     RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/08data.zip', :filename => '2008_FE_guide_ALL_rel_dates_-no sales-for DOE-5-1-08.csv'
+
+     RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/08data.zip', :glob => '/*.csv'
+
+     RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/98guide6.zip', :filename => '98guide6.csv'
+
+     RemoteTable.new 'http://www.worldmapper.org/data/opendoc/2_worldmapper_data.ods', :sheet => 'Data', :keep_blank_rows => true
+
+     RemoteTable.new 'https://spreadsheets.google.com/pub?key=t5HM1KbaRngmTUbntg8JwPA'
+
+     RemoteTable.new 'www.customerreferenceprogram.org/uploads/CRP_RFP_template.xlsx'
+
+     RemoteTable.new 'www.customerreferenceprogram.org/uploads/CRP_RFP_template.xlsx', :headers => %w{foo bar baz}
+
+     RemoteTable.new 'www.customerreferenceprogram.org/uploads/CRP_RFP_template.xlsx', :headers => false
+
+     RemoteTable.new 'http://www.transtats.bts.gov/DownLoad_Table.asp?Table_ID=293&Has_Group=3&Is_Zipped=0', :form_data => 'UserTableName=T_100_Segment__All_Carriers&[...]', :compression => :zip, :glob => '/*.csv'
+
+     RemoteTable.new "http://www.faa.gov/air_traffic/publications/atpubs/CNT/5-2-E.htm",
+       :encoding => 'US-ASCII',
+       :row_xpath => '//table/tr[2]/td/table/tr',
+       :column_xpath => 'td'
+
+     RemoteTable.new "http://www.faa.gov/air_traffic/publications/atpubs/CNT/5-2-G.htm",
+       :encoding => 'windows-1252',
+       :row_xpath => '//table/tr[2]/td/table/tr',
+       :column_xpath => 'td',
+       :errata => Errata.new(:url => 'http://spreadsheets.google.com/pub?key=tObVAGyqOkCBtGid0tJUZrw',
+                             :responder => AircraftGuru.new)
+
+     RemoteTable.new "http://www.faa.gov/air_traffic/publications/atpubs/CNT/5-2-G.htm",
+       :encoding => 'windows-1252',
+       :row_xpath => '//table/tr[2]/td/table/tr',
+       :column_xpath => 'td',
+       :errata => { :url => 'http://spreadsheets.google.com/pub?key=tObVAGyqOkCBtGid0tJUZrw',
+                    :responder => AircraftGuru.new }
+
+     RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/00data.zip',
+       :filename => 'Gd6-dsc.txt',
+       :format => :fixed_width,
+       :crop => 21..26, # inclusive
+       :cut => '2-',
+       :select => lambda { |row| /\A[A-Z]/.match row['code'] },
+       :schema => [[ 'code', 2, { :type => :string } ],
+                   [ 'spacer', 2 ],
+                   [ 'name', 52, { :type => :string } ]]
+
+     RemoteTable.new 'http://cloud.github.com/downloads/seamusabshere/remote_table/test2.fixed_width.txt',
+       :format => :fixed_width,
+       :skip => 1,
+       :schema => [[ 'header4', 10, { :type => :string } ],
+                   [ 'spacer', 1 ],
+                   [ 'header5', 10, { :type => :string } ],
+                   [ 'spacer', 12 ],
+                   [ 'header6', 10, { :type => :string } ]]
+
+     RemoteTable.new 'http://cloud.github.com/downloads/seamusabshere/remote_table/test2.fixed_width.txt',
+       :format => :fixed_width,
+       :keep_blank_rows => true,
+       :skip => 1,
+       :schema => [[ 'header4', 10, { :type => :string } ],
+                   [ 'spacer', 1 ],
+                   [ 'header5', 10, { :type => :string } ],
+                   [ 'spacer', 12 ],
+                   [ 'header6', 10, { :type => :string } ]]
+
+     RemoteTable.new 'http://cloud.github.com/downloads/seamusabshere/remote_table/remote_table_row_hash_test.fixed_width.txt',
+       :format => :fixed_width,
+       :skip => 1,
+       :schema => [[ 'header1', 10, { :type => :string } ],
+                   [ 'spacer', 1 ],
+                   [ 'header2', 10, { :type => :string } ],
+                   [ 'spacer', 12 ],
+                   [ 'header3', 10, { :type => :string } ]]
+
+     RemoteTable.new 'http://cloud.github.com/downloads/seamusabshere/remote_table/remote_table_row_hash_test.alternate_order.fixed_width.txt',
+       :format => :fixed_width,
+       :skip => 1,
+       :schema => [[ 'spacer', 11 ],
+                   [ 'header2', 10, { :type => :string } ],
+                   [ 'spacer', 1 ],
+                   [ 'header3', 10, { :type => :string } ],
+                   [ 'spacer', 1 ],
+                   [ 'header1', 10, { :type => :string } ]]
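
Many of the examples above pass a `:select` lambda such as `row['Year'].to_i.to_s == row['Year']`. The sketch below applies the same predicate to made-up rows with plain `Array#select`, to show what it keeps and what it drops. Whether `RemoteTable` applies `:select` in exactly this way is an assumption; the `VMT` column and all values are hypothetical.

```ruby
# Hypothetical rows, shaped like parsed CSV rows (string keys, string values).
rows = [
  { 'Year' => '1990',    'VMT' => '1.4' },
  { 'Year' => 'Source:', 'VMT' => ''    },
  { 'Year' => '1991',    'VMT' => '1.5' }
]

# Same predicate as the EPA GHG Inventory examples: keep a row only if "Year"
# survives a round trip through Integer conversion, i.e. it is purely numeric.
numeric_year = lambda { |row| row['Year'].to_i.to_s == row['Year'] }

kept = rows.select(&numeric_year)
puts kept.map { |row| row['Year'] }.join(',')  # prints 1990,1991
```

This kind of predicate is a cheap way to discard footnotes and source lines that spreadsheets often carry below the data.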
+
+ ## Requirements
+
+ * MRI (not JRuby)
+ * Unix tools like curl, iconv, perl, cat, cut, tail, etc. accessible from `ENV['PATH']`
+
+ As this library matures, that requirement should go away.
+
+ ## Wishlist
+
+ * JRuby and Win32 compat
+ * The new "custom parser" syntax (aka transformer) hasn't been defined yet... only the old-style syntax is available
+
+ ## Authors
+
+ * Seamus Abshere <seamus@abshere.net>
+ * Andy Rossmeissl <andy@rossmeissl.net>
+
+ ## Copyright
+
+ Copyright (c) 2012 Brighter Planet. See LICENSE for details.