remote_table 1.3.0 → 1.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/.gitignore CHANGED
@@ -5,3 +5,5 @@ rdoc
5
5
  pkg
6
6
  Gemfile.lock
7
7
  *.gem
8
+ .yardoc
9
+ doc/
data/CHANGELOG ADDED
@@ -0,0 +1,19 @@
1
+ 1.4.0 / 2012-04-12
2
+
3
+ * Enhancements
4
+
5
+ * DRY up spawning code with UnixUtils
6
+ * Switch to minitest
7
+ * Stop defining MyCSV globally
8
+ * Test on MRI 1.8.7, MRI 1.9.3, and JRuby 1.6.7
9
+ * Warn users about ODS not working on JRuby
10
+ * Move all warnings to Kernel.warn
11
+ * Start keeping a CHANGELOG!
12
+ * Ensure we clean up temporary files
13
+
14
+ * Bug fixes
15
+
16
+ * Make sure headers (keys) on rows created with Roo are ordered in Ruby 1.8.7
17
+ * Make tests green (for now) by fixing URLs and sometimes :row_xpaths (hello FAA aircraft lookup guide)
18
+ * Use Hash#fetch for default options
19
+ * Don't try to set default_sheet if user doesn't specify a sheet name
data/Gemfile CHANGED
@@ -1,4 +1,10 @@
1
- source "http://rubygems.org"
1
+ source :rubygems
2
2
 
3
3
  # Specify your gem's dependencies in remote_table.gemspec
4
4
  gemspec
5
+
6
+ gem 'errata', '>=0.2.0'
7
+ gem 'minitest'
8
+ gem 'minitest-reporters'
9
+ gem 'rake'
10
+ gem 'yard'
data/README.markdown ADDED
@@ -0,0 +1,440 @@
1
+ # remote_table
2
+
3
+ Open local or remote XLSX, XLS, ODS, CSV and fixed-width files.
4
+
5
+ ## Production usage
6
+
7
+ Used by [the Brighter Planet Reference Data web service](http://data.brighterplanet.com), the [`data_miner` gem](https://github.com/seamusabshere/data_miner), and the [`earth` gem](https://github.com/brighterplanet/earth).
8
+
9
+ ## Example
10
+
11
+ $ irb
12
+ 1.9.3-p0 :001 > require 'remote_table'
13
+ => true
14
+ 1.9.3-p0 :002 > t = RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/98guide6.zip', :filename => '98guide6.csv'
15
+ => #<RemoteTable:0x00000100851d98 @options={:filename=>"98guide6.csv"}, @url="http://www.fueleconomy.gov/FEG/epadata/98guide6.zip">
16
+ 1.9.3-p0 :003 > t.rows.length
17
+ => 806
18
+ 1.9.3-p0 :004 > t.rows.first.length
19
+ => 26
20
+ 1.9.3-p0 :005 > require 'pp'
21
+ => true
22
+ 1.9.3-p0 :006 > pp t[23]
23
+ {"Class"=>"TWO SEATERS",
24
+ "Manufacturer"=>"PORSCHE",
25
+ "carline name"=>"BOXSTER",
26
+ "displ"=>"2.5",
27
+ "cyl"=>"6",
28
+ "trans"=>"Manual(M5)",
29
+ "drv"=>"R",
30
+ "cty"=>"19",
31
+ "hwy"=>"26",
32
+ "cmb"=>"22",
33
+ "ucty"=>"21.2",
34
+ "uhwy"=>"33.9499",
35
+ "ucmb"=>"25.5114",
36
+ "fl"=>"P",
37
+ "G"=>"",
38
+ "T"=>"",
39
+ "S"=>"",
40
+ "2pv"=>"",
41
+ "2lv"=>"",
42
+ "4pv"=>"",
43
+ "4lv"=>"",
44
+ "hpv"=>"",
45
+ "hlv"=>"",
46
+ "fcost"=>"956",
47
+ "eng dscr"=>"",
48
+ "trans dscr"=>""}
49
+
50
+ You get an <code>Array</code> of <code>Hash</code>es with **string keys**. If you set <code>:headers => false</code>, then you get an <code>Array</code> of <code>Array</code>s.
51
+
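The two row shapes can be illustrated without touching the network. The following is a minimal sketch that uses Ruby's standard `csv` library directly rather than `RemoteTable`; the `SAMPLE` data and both helper method names are made up for illustration.

```ruby
require 'csv'

# Hypothetical two-row sample; not taken from the EPA file above.
SAMPLE = "Manufacturer,cyl\nPORSCHE,6\nBMW,4\n"

# With headers: each data row becomes a Hash keyed by the header strings.
def rows_with_headers(text)
  CSV.parse(text, headers: true).map(&:to_h)
end

# Without headers: every line, including the first, is a plain Array.
def rows_without_headers(text)
  CSV.parse(text)
end
```

Note that the keys are strings, so a row is read with `row['cyl']`, not `row[:cyl]`.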
52
+ ## Supported formats
53
+
54
+ <table>
55
+ <tr>
56
+ <th>Format</th>
57
+ <th>Notes</th>
58
+ <th>Library</th>
59
+ </tr>
60
+ <tr>
61
+ <td>Delimited (CSV, TSV, etc.)</td>
62
+ <td>All <code>RemoteTable::Format::Delimited::FASTERCSV_OPTIONS</code>, for example <code>:col_sep</code>, are passed directly to fastercsv.</td>
63
+ <td>
64
+ <a href="http://fastercsv.rubyforge.org/">fastercsv</a> (1.8);
65
+ <a href="http://www.ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/index.html">stdlib</a> (1.9)
66
+ </td>
67
+ </tr>
68
+ <tr>
69
+ <td>Fixed width</td>
70
+ <td>You have to set up a <code>:schema</code>.</td>
71
+ <td><a href="https://github.com/seamusabshere/fixed_width">fixed_width-multibyte</a></td>
72
+ </tr>
73
+ <tr>
74
+ <td>HTML</td>
75
+ <td>See XML.</td>
76
+ <td><a href="http://nokogiri.org/">nokogiri</a></td>
77
+ </tr>
78
+ <tr>
79
+ <td>ODS</td>
80
+ <td></td>
81
+ <td><a href="http://roo.rubyforge.org/">roo</a></td>
82
+ </tr>
83
+ <tr>
84
+ <td>XLS</td>
85
+ <td></td>
86
+ <td><a href="http://roo.rubyforge.org/">roo</a></td>
87
+ </tr>
88
+ <tr>
89
+ <td>XLSX</td>
90
+ <td></td>
91
+ <td><a href="http://roo.rubyforge.org/">roo</a></td>
92
+ </tr>
93
+ <tr>
94
+ <td>XML</td>
95
+ <td>The idea is to set up a <code>:row_[xpath|css]</code> and (optionally) a <code>:column_[xpath|css]</code>.</td>
96
+ <td><a href="http://nokogiri.org/">nokogiri</a></td>
97
+ </tr>
98
+ </table>
99
+
100
+ ## Compression and packing
101
+
102
+ You can directly pick a file out of a remote archive using <code>:filename</code> or use a <code>:glob</code>.
103
+
104
+ * zip
105
+ * tar
106
+ * bz2
107
+ * gz
108
+ * exe (treated as zip)
109
+
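To show what a pattern like `'/*.csv'` selects, here is a rough sketch built on Ruby's `File.fnmatch?`. It assumes the glob is matched against entry paths relative to the root of the unpacked archive; that is an assumption for illustration, not a description of the gem's internals. `ENTRIES` and `pick_by_glob` are hypothetical names.

```ruby
# Hypothetical archive listing; the names are illustrative only.
ENTRIES = ['readme.txt', 'data.csv', 'nested/other.csv']

# Pick entries matching a glob written relative to the archive root,
# e.g. '/*.csv'. FNM_PATHNAME keeps '*' from crossing directory separators,
# so only top-level files match.
def pick_by_glob(entries, glob)
  entries.select { |name| File.fnmatch?(glob, "/#{name}", File::FNM_PATHNAME) }
end
```

With this listing, `pick_by_glob(ENTRIES, '/*.csv')` keeps only the top-level CSV file. When the exact name is known, `:filename` is the more explicit choice.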
110
+ ## Encoding
111
+
112
+ Everything is forced into UTF-8. You can improve the quality of the conversion by specifying the original encoding with `:encoding`.
113
+
114
+ * ASCII-8BIT and BINARY are equal
115
+ * ISO-8859-1 and Latin1 are equal
116
+
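Why the original encoding matters can be seen with plain Ruby strings. This sketch uses Ruby 1.9+'s built-in `String#encode`, which may differ from what the gem does internally; `LATIN1_BYTES` and `to_utf8` are hypothetical names.

```ruby
# Bytes for "café" as they would be stored in an ISO-8859-1 file (0xE9 is "é").
LATIN1_BYTES = "caf\xE9".b

# Label the raw bytes with their real encoding, then transcode to UTF-8.
def to_utf8(bytes, source_encoding)
  bytes.dup.force_encoding(source_encoding).encode('UTF-8')
end
```

Read as UTF-8 without conversion, the same bytes are an invalid sequence, which is the kind of damage that naming the source encoding avoids.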
117
+ ## More examples
118
+
119
+ RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdHRNaVpSUWw2Z2VhN3RUV25yYWdQX2c&output=csv')
120
+
121
+ # aircraft fuel use equations derived from EMEP/EEA and ICAO
122
+ RemoteTable.new('https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdEhYenF3dGt1T0Y1cTdneUNsNjV0dEE&output=csv')
123
+
124
+ # distance classes from the WRI business travel tool and UK DEFRA/DECC GHG Conversion Factors for Company Reporting
125
+ RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdFBKM0xWaUhKVkxDRmdBVkE3VklxY2c&hl=en&gid=0&output=csv')
126
+
127
+ # seat classes used in the WRI GHG Protocol calculation tools
128
+ RemoteTable.new('https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdG9EdmxybG1wdC1iU3JRYXNkMGhvSnc&output=csv')
129
+
130
+ # pure automobile fuels
131
+ RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdE9xTEdueFM2R0diNTgxUlk1QXFSb2c&gid=0&output=csv')
132
+
133
+ # blended automobile fuels
134
+ RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdEswNGIxM0U4U0N1UUppdWw2ejJEX0E&gid=0&output=csv')
135
+
136
+ # A list of hybrid make model years derived from the EPA fuel economy guide
137
+ RemoteTable.new('https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AoQJbWqPrREqdGtzekE4cGNoRGVmdmZMaTNvOWluSnc&output=csv')
138
+
139
+ # BTS aircraft type lookup table
140
+ RemoteTable.new("http://www.transtats.bts.gov/Download_Lookup.asp?Lookup=L_AIRCRAFT_TYPE",
141
+ :errata => { :url => 'https://spreadsheets.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdEZ2d3JQMzV5T1o1T3JmVlFyNUZxdEE&output=csv' })
142
+
143
+ # aircraft made by whitelisted manufacturers whose ICAO code starts with 'B' from the FAA
144
+ # for definition of `Aircraft::Guru` and `manufacturer_whitelist?` see https://github.com/brighterplanet/earth/blob/master/lib/earth/air/aircraft/data_miner.rb
145
+ RemoteTable.new("http://www.faa.gov/air_traffic/publications/atpubs/CNT/5-2-B.htm",
146
+ :encoding => 'windows-1252',
147
+ :row_xpath => '//table/tr[2]/td/table/tr',
148
+ :column_xpath => 'td',
149
+ :errata => { :url => 'https://spreadsheets.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdGVBRnhkRGhSaVptSDJ5bXJGbkpUSWc&output=csv', :responder => Aircraft::Guru.new },
150
+ :select => lambda { |record| manufacturer_whitelist? record['Manufacturer'] })
151
+
152
+ # OpenFlights.org airports database
153
+ RemoteTable.new('https://openflights.svn.sourceforge.net/svnroot/openflights/openflights/data/airports.dat',
154
+ :headers => %w{ id name city country_name iata_code icao_code latitude longitude altitude timezone daylight_savings },
155
+ :select => lambda { |record| record['iata_code'].present? },
156
+ :errata => { :url => 'https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdFc2UzhQYU5PWEQ0N21yWFZGNmc2a3c&gid=0&output=csv', :responder => Airport::Guru.new }) # see https://github.com/brighterplanet/earth/blob/master/lib/earth/air/aircraft/data_miner.rb
157
+
158
+ # T100 flight segment data for #{month.strftime('%B %Y')}
159
+ # for definition of `form_data` and `FlightSegment::Guru` see https://github.com/brighterplanet/earth/blob/master/lib/earth/air/flight_segment/data_miner.rb
160
+ RemoteTable.new('http://www.transtats.bts.gov/DownLoad_Table.asp',
161
+ :form_data => form_data,
162
+ :compression => :zip,
163
+ :glob => '/*.csv',
164
+ :errata => { :url => 'https://spreadsheets.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdGxpYU1qWFR3d0syTVMyQVVOaDd0V3c&output=csv', :responder => FlightSegment::Guru.new },
165
+ :select => lambda { |record| record['DEPARTURES_PERFORMED'].to_i > 0 })
166
+
167
+ # 1995 Fuel Economy Guide
168
+ # for definition of `:fuel_economy_guide_b` and `AutomobileMakeModelYearVariant::ParserB` see https://github.com/brighterplanet/earth/blob/master/lib/earth/automobile/automobile_make_model_year_variant/data_miner.rb
169
+ RemoteTable.new("http://www.fueleconomy.gov/FEG/epadata/95mfgui.zip",
170
+ :filename => '95MFGUI.DAT',
171
+ :format => :fixed_width,
172
+ :cut => '13-',
173
+ :schema_name => :fuel_economy_guide_b,
174
+ :select => lambda { |row| row['model'].present? and (row['suppress_code'].blank? or row['suppress_code'].to_f == 0) and row['state_code'] == 'F' },
175
+ :transform => { :class => AutomobileMakeModelYearVariant::ParserB, :year => 1995 },
176
+ :errata => { :url => "https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdDkxTElWRVlvUXB3Uy04SDhSYWkzakE&output=csv", :responder => AutomobileMakeModelYearVariant::Guru.new })
177
+
178
+ # 1998 Fuel Economy Guide
179
+ # for definition of `AutomobileMakeModelYearVariant::ParserC` see https://github.com/brighterplanet/earth/blob/master/lib/earth/automobile/automobile_make_model_year_variant/data_miner.rb
180
+ RemoteTable.new('http://www.fueleconomy.gov/FEG/epadata/98guide6.zip',
181
+ :filename => '98guide6.csv',
182
+ :transform => { :class => AutomobileMakeModelYearVariant::ParserC, :year => 1998 },
183
+ :errata => { :url => "https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdDkxTElWRVlvUXB3Uy04SDhSYWkzakE&output=csv", :responder => AutomobileMakeModelYearVariant::Guru.new },
184
+ :select => lambda { |row| row['model'].present? })
185
+
186
+ # annual corporate average fuel economy data for domestic and imported vehicle fleets from the NHTSA
187
+ RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdEdXWXB6dkVLWkowLXhYSFVUT01sS2c&hl=en&gid=0&output=csv',
188
+ :errata => { 'url' => 'http://static.brighterplanet.com/science/data/transport/automobiles/make_fleet_years/errata.csv' },
189
+ :select => lambda { |row| row['volume'].to_i > 0 })
190
+
191
+ # total vehicle miles travelled by gasoline passenger cars from the 2010 EPA GHG Inventory
192
+ RemoteTable.new('http://www.epa.gov/climatechange/emissions/downloads10/2010-Inventory-Annex-Tables.zip',
193
+ :filename => 'Annex Tables/Annex 3/Table A-87.csv',
194
+ :skip => 1,
195
+ :select => lambda { |row| row['Year'].to_i.to_s == row['Year'] })
196
+
197
+ # total vehicle miles travelled from the 2010 EPA GHG Inventory
198
+ RemoteTable.new('http://www.epa.gov/climatechange/emissions/downloads10/2010-Inventory-Annex-Tables.zip',
199
+ :filename => 'Annex Tables/Annex 3/Table A-87.csv',
200
+ :skip => 1,
201
+ :select => lambda { |row| row['Year'].to_i.to_s == row['Year'] })
202
+
203
+ # total travel distribution from the 2010 EPA GHG Inventory
204
+ RemoteTable.new('http://www.epa.gov/climatechange/emissions/downloads10/2010-Inventory-Annex-Tables.zip',
205
+ :filename => 'Annex Tables/Annex 3/Table A-93.csv',
206
+ :skip => 1,
207
+ :select => lambda { |row| row['Vehicle Age'].to_i.to_s == row['Vehicle Age'] })
208
+
209
+ # building characteristics from the 2003 EIA Commercial Building Energy Consumption Survey
210
+ RemoteTable.new('http://www.eia.gov/emeu/cbecs/cbecs2003/public_use_2003/data/FILE02.csv',
211
+ :skip => 1,
212
+ :headers => ["PUBID8","REGION8","CENDIV8","SQFT8","SQFTC8","YRCONC8","PBA8","ELUSED8","NGUSED8","FKUSED8","PRUSED8","STUSED8","HWUSED8","ONEACT8","ACT18","ACT28","ACT38","ACT1PCT8","ACT2PCT8","ACT3PCT8","PBAPLUS8","VACANT8","RWSEAT8","PBSEAT8","EDSEAT8","FDSEAT8","HCBED8","NRSBED8","LODGRM8","FACIL8","FEDFAC8","FACACT8","MANIND8","PLANT8","FACDST8","FACDHW8","FACDCW8","FACELC8","BLDPLT8","ADJWT8","STRATUM8","PAIR8"])
213
+
214
+ # 2003 CBECS C17 - Electricity Consumption and Intensity - New England Division
215
+ # for definition of `CbecsEnergyIntensity::NAICS_CODE_SYNTHESIZER` see https://github.com/brighterplanet/earth/blob/master/lib/earth/industry/cbecs_energy_intensity/data_miner.rb
216
+ RemoteTable.new("http://www.eia.gov/emeu/cbecs/cbecs2003/detailed_tables_2003/2003set10/2003excel/C17.xls",
217
+ :headers => false,
218
+ :select => ::Proc.new { |row| CbecsEnergyIntensity::NAICS_CODE_SYNTHESIZER.call(row) },
219
+ :crop => (21..37))
220
+
221
+ # U.S. Census 2002 NAICS code list
222
+ RemoteTable.new('http://www.census.gov/epcd/naics02/naicod02.txt',
223
+ :skip => 4,
224
+ :headers => false,
225
+ :delimiter => ' ')
226
+
227
+ # MECS table 3.2 Total US
228
+ RemoteTable.new("http://205.254.135.24/emeu/mecs/mecs2006/excel/Table3_2.xls",
229
+ :crop => (15..94),
230
+ :headers => ["NAICS Code", "Subsector and Industry", "Total", "BLANK", "Net Electricity", "BLANK", "Residual Fuel Oil", "Distillate Fuel Oil", "Natural Gas", "BLANK", "LPG and NGL", "BLANK", "Coal", "Coke and Breeze", "Other"])
231
+
232
+ # MECS table 6.1 Midwest
233
+ RemoteTable.new("http://205.254.135.24/emeu/mecs/mecs2006/excel/Table6_1.xls",
234
+ :crop => (184..263),
235
+ :headers => ["NAICS Code", "Subsector and Industry", "Consumption per Employee", "Consumption per Dollar of Value Added", "Consumption per Dollar of Value of Shipments"])
236
+
237
+ # U.S. Census Geographic Terms and Definitions
238
+ RemoteTable.new('http://www.census.gov/popest/about/geo/state_geocodes_v2009.txt',
239
+ :skip => 6,
240
+ :headers => %w{ Region Division FIPS Name },
241
+ :select => ::Proc.new { |row| row['Division'].to_i > 0 and row['FIPS'].to_i == 0 })
242
+
243
+ # state census divisions from the U.S. Census
244
+ RemoteTable.new('http://www.census.gov/popest/about/geo/state_geocodes_v2009.txt',
245
+ :skip => 8,
246
+ :headers => ['Region', 'Division', 'State FIPS', 'Name'],
247
+ :select => ::Proc.new { |row| row['State FIPS'].to_i > 0 })
248
+
249
+ # OpenGeoCode.org's Country Codes to Country Names list
250
+ RemoteTable.new('http://opengeocode.org/download/countrynames.txt',
251
+ :format => :delimited,
252
+ :delimiter => ';',
253
+ :headers => false,
254
+ :skip => 22)
255
+
256
+ # heating degree day data from WRI CAIT
257
+ RemoteTable.new('https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdDN4MkRTSWtWRjdfazhRdWllTkVSMkE&output=csv',
258
+ :select => Proc.new { |record| record['country'] != 'European Union (27)' },
259
+ :errata => { :url => 'https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdDNSMUtCV0h4cUF4UnBKZlNkczlNbFE&output=csv' })
260
+
261
+ # US average grid loss factor derived from eGRID 2007 data
262
+ RemoteTable.new('http://www.epa.gov/cleanenergy/documents/egridzips/eGRID2010V1_1_STIE_USGC.xls',
263
+ :sheet => 'USGC',
264
+ :skip => 5)
265
+
266
+ # eGRID 2010 regions and loss factors
267
+ RemoteTable.new('http://www.epa.gov/cleanenergy/documents/egridzips/eGRID2010V1_1_STIE_USGC.xls',
268
+ :sheet => 'STIE07',
269
+ :skip => 4,
270
+ :select => lambda { |row| row['eGRID2010 year 2007 file state sequence number'].to_i.between?(1, 51) })
271
+
272
+ # eGRID 2010 subregions and electricity emission factors
273
+ RemoteTable.new('http://www.epa.gov/cleanenergy/documents/egridzips/eGRID2010_Version1-1_xls_only.zip',
274
+ :filename => 'eGRID2010V1_1_year07_AGGREGATION.xls',
275
+ :sheet => 'SRL07',
276
+ :skip => 4,
277
+ :select => lambda { |row| row['SEQSRL07'].to_i.between?(1, 26) })
278
+
279
+ # U.S. Census State ANSI Code file
280
+ RemoteTable.new('http://www.census.gov/geo/www/ansi/state.txt',
281
+ :delimiter => '|',
282
+ :select => lambda { |record| record['STATE'].to_i < 60 })
283
+
284
+ # Mapping Hacks zipcode database
285
+ RemoteTable.new('http://mappinghacks.com/data/zipcode.zip',
286
+ :filename => 'zipcode.csv')
287
+
288
+ # zipcode states and eGRID Subregions from the US EPA
289
+ RemoteTable.new('http://www.epa.gov/cleanenergy/documents/egridzips/Power_Profiler_Zipcode_Tool_v3-2.xlsx',
290
+ :sheet => 'Zip-subregion')
291
+
292
+ # horse breeds
293
+ RemoteTable.new('http://www.freebase.com/type/exporttypeinstances/base/horses/horse_breed?page=0&filter_mode=type&filter_view=table&show%01p%3D%2Ftype%2Fobject%2Fname%01index=0&show%01p%3D%2Fcommon%2Ftopic%2Fimage%01index=1&show%01p%3D%2Fcommon%2Ftopic%2Farticle%01index=2&sort%01p%3D%2Ftype%2Fobject%2Ftype%01p%3Dlink%01p%3D%2Ftype%2Flink%2Ftimestamp%01index=false&=&exporttype=csv-8')
294
+
295
+ # Brighter Planet's list of cat and dog breeds, genders, and weights
296
+ RemoteTable.new('http://static.brighterplanet.com/science/data/consumables/pets/breed_genders.csv',
297
+ :encoding => 'ISO-8859-1',
298
+ :select => lambda { |row| row['gender'].present? })
299
+
300
+ # residential electricity prices from the EIA
301
+ RemoteTable.new('http://www.eia.doe.gov/cneaf/electricity/page/sales_revenue.xls',
302
+ :select => lambda { |row| row['Year'].to_s.first(4).to_i > 1989 })
303
+
304
+ # residential natural gas prices from the EIA
305
+ # for definition of `NaturalGasParser` see https://github.com/brighterplanet/earth/blob/master/lib/earth/residence/residence_fuel_price/data_miner.rb
306
+ RemoteTable.new('http://tonto.eia.doe.gov/dnav/ng/xls/ng_pri_sum_a_EPG0_FWA_DMcf_a.xls',
307
+ :sheet => 'Data 1',
308
+ :skip => 2,
309
+ :select => lambda { |row| row['year'].to_i > 1989 },
310
+ :transform => { :class => NaturalGasParser })
311
+
312
+ # 2005 EIA Residential Energy Consumption Survey microdata
313
+ RemoteTable.new('http://www.eia.doe.gov/emeu/recs/recspubuse05/datafiles/RECS05alldata.csv',
314
+ :headers => :upcase)
315
+
316
+ # ...and more from the tests...
317
+
318
+ RemoteTable.new 'http://spreadsheets.google.com/pub?key=t5HM1KbaRngmTUbntg8JwPA&single=true&gid=0'
319
+
320
+ RemoteTable.new 'http://spreadsheets.google.com/pub?key=t5HM1KbaRngmTUbntg8JwPA'
321
+
322
+ RemoteTable.new 'http://spreadsheets.google.com/pub?key=t5HM1KbaRngmTUbntg8JwPA', :skip => 1, :headers => false
323
+
324
+ RemoteTable.new 'http://spreadsheets.google.com/pub?key=tObVAGyqOkCBtGid0tJUZrw'
325
+
326
+ RemoteTable.new 'http://spreadsheets.google.com/pub?key=tObVAGyqOkCBtGid0tJUZrw', :headers => %w{ col1 col2 col3 }
327
+
328
+ RemoteTable.new 'http://spreadsheets.google.com/pub?key=tujrgUOwDSLWb-P4KCt1qBg'
329
+
330
+ RemoteTable.new 'http://tonto.eia.doe.gov/dnav/pet/xls/PET_PRI_RESID_A_EPPR_PTA_CPGAL_M.xls', :transform => { :class => FuelOilParser }
331
+
332
+ RemoteTable.new 'http://www.freebase.com/type/exporttypeinstances/base/horses/horse_breed?page=0&filter_mode=type&filter_view=table&show%01p%3D%2Ftype%2Fobject%2Fname%01index=0&show%01p%3D%2Fcommon%2Ftopic%2Fimage%01index=1&show%01p%3D%2Fcommon%2Ftopic%2Farticle%01index=2&sort%01p%3D%2Ftype%2Fobject%2Ftype%01p%3Dlink%01p%3D%2Ftype%2Flink%2Ftimestamp%01index=false&=&exporttype=csv-8'
333
+
334
+ RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/02data.zip', :filename => 'guide_jan28.xls'
335
+
336
+ RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/08data.zip', :filename => '2008_FE_guide_ALL_rel_dates_-no sales-for DOE-5-1-08.csv'
337
+
338
+ RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/08data.zip', :glob => '/*.csv'
339
+
340
+ RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/98guide6.zip', :filename => '98guide6.csv'
341
+
342
+ RemoteTable.new 'http://www.worldmapper.org/data/opendoc/2_worldmapper_data.ods', :sheet => 'Data', :keep_blank_rows => true
343
+
344
+ RemoteTable.new 'https://spreadsheets.google.com/pub?key=t5HM1KbaRngmTUbntg8JwPA'
345
+
346
+ RemoteTable.new 'www.customerreferenceprogram.org/uploads/CRP_RFP_template.xlsx'
347
+
348
+ RemoteTable.new 'www.customerreferenceprogram.org/uploads/CRP_RFP_template.xlsx', :headers => %w{foo bar baz}
349
+
350
+ RemoteTable.new 'www.customerreferenceprogram.org/uploads/CRP_RFP_template.xlsx', :headers => false
351
+
352
+ RemoteTable.new 'http://www.transtats.bts.gov/DownLoad_Table.asp?Table_ID=293&Has_Group=3&Is_Zipped=0', :form_data => 'UserTableName=T_100_Segment__All_Carriers&[...]', :compression => :zip, :glob => '/*.csv'
353
+
354
+ RemoteTable.new "http://www.faa.gov/air_traffic/publications/atpubs/CNT/5-2-E.htm",
355
+ :encoding => 'US-ASCII',
356
+ :row_xpath => '//table/tr[2]/td/table/tr',
357
+ :column_xpath => 'td'
358
+
359
+ RemoteTable.new "http://www.faa.gov/air_traffic/publications/atpubs/CNT/5-2-G.htm",
360
+ :encoding => 'windows-1252',
361
+ :row_xpath => '//table/tr[2]/td/table/tr',
362
+ :column_xpath => 'td',
363
+ :errata => Errata.new(:url => 'http://spreadsheets.google.com/pub?key=tObVAGyqOkCBtGid0tJUZrw',
364
+ :responder => AircraftGuru.new)
365
+
366
+ RemoteTable.new "http://www.faa.gov/air_traffic/publications/atpubs/CNT/5-2-G.htm",
367
+ :encoding => 'windows-1252',
368
+ :row_xpath => '//table/tr[2]/td/table/tr',
369
+ :column_xpath => 'td',
370
+ :errata => { :url => 'http://spreadsheets.google.com/pub?key=tObVAGyqOkCBtGid0tJUZrw',
371
+ :responder => AircraftGuru.new }
372
+
373
+ RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/00data.zip',
374
+ :filename => 'Gd6-dsc.txt',
375
+ :format => :fixed_width,
376
+ :crop => 21..26, # inclusive
377
+ :cut => '2-',
378
+ :select => lambda { |row| /\A[A-Z]/.match row['code'] },
379
+ :schema => [[ 'code', 2, { :type => :string } ],
380
+ [ 'spacer', 2 ],
381
+ [ 'name', 52, { :type => :string } ]]
382
+
383
+ RemoteTable.new 'http://cloud.github.com/downloads/seamusabshere/remote_table/test2.fixed_width.txt',
384
+ :format => :fixed_width,
385
+ :skip => 1,
386
+ :schema => [[ 'header4', 10, { :type => :string } ],
387
+ [ 'spacer', 1 ],
388
+ [ 'header5', 10, { :type => :string } ],
389
+ [ 'spacer', 12 ],
390
+ [ 'header6', 10, { :type => :string } ]]
391
+
392
+ RemoteTable.new 'http://cloud.github.com/downloads/seamusabshere/remote_table/test2.fixed_width.txt',
393
+ :format => :fixed_width,
394
+ :keep_blank_rows => true,
395
+ :skip => 1,
396
+ :schema => [[ 'header4', 10, { :type => :string } ],
397
+ [ 'spacer', 1 ],
398
+ [ 'header5', 10, { :type => :string } ],
399
+ [ 'spacer', 12 ],
400
+ [ 'header6', 10, { :type => :string } ]]
401
+
402
+ RemoteTable.new 'http://cloud.github.com/downloads/seamusabshere/remote_table/remote_table_row_hash_test.fixed_width.txt',
403
+ :format => :fixed_width,
404
+ :skip => 1,
405
+ :schema => [[ 'header1', 10, { :type => :string } ],
406
+ [ 'spacer', 1 ],
407
+ [ 'header2', 10, { :type => :string } ],
408
+ [ 'spacer', 12 ],
409
+ [ 'header3', 10, { :type => :string } ]]
410
+
411
+ RemoteTable.new 'http://cloud.github.com/downloads/seamusabshere/remote_table/remote_table_row_hash_test.alternate_order.fixed_width.txt',
412
+ :format => :fixed_width,
413
+ :skip => 1,
414
+ :schema => [[ 'spacer', 11 ],
415
+ [ 'header2', 10, { :type => :string } ],
416
+ [ 'spacer', 1 ],
417
+ [ 'header3', 10, { :type => :string } ],
418
+ [ 'spacer', 1 ],
419
+ [ 'header1', 10, { :type => :string } ]]
420
+
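The `:schema` arrays in the fixed-width examples above list each column as `[name, width, options]`. As a rough sketch of how such a schema maps onto a line of text, the helper below slices one line by column width. `parse_fixed_width` is a hypothetical helper, not part of the gem, and dropping columns named `'spacer'` is an assumption made for illustration.

```ruby
# Schema in the same [name, width, options] shape as the examples above.
SCHEMA = [['header1', 10, { :type => :string }],
          ['spacer',   1],
          ['header2', 10, { :type => :string }]]

# Slice one line into a Hash, walking the schema left to right.
# Columns named 'spacer' advance the offset but are not kept (an assumption).
def parse_fixed_width(line, schema)
  offset = 0
  schema.each_with_object({}) do |(name, width, _options), row|
    field = line[offset, width].to_s
    offset += width
    row[name] = field.strip unless name == 'spacer'
  end
end
```

Because each column is located purely by its offset, a schema whose widths do not add up to the real layout silently shifts every later field, which is why the spacer widths in the examples matter.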
421
+ ## Requirements
422
+
423
+ * MRI (not JRuby)
424
+ * Unix tools like curl, iconv, perl, cat, cut, tail, etc. accessible from `ENV['PATH']`
425
+
426
+ As this library matures, that requirement should go away.
427
+
428
+ ## Wishlist
429
+
430
+ * JRuby and Win32 compat
431
+ * The new "custom parser" syntax (aka transformer) hasn't been defined yet... only the old-style syntax is available
432
+
433
+ ## Authors
434
+
435
+ * Seamus Abshere <seamus@abshere.net>
436
+ * Andy Rossmeissl <andy@rossmeissl.net>
437
+
438
+ ## Copyright
439
+
440
+ Copyright (c) 2012 Brighter Planet. See LICENSE for details.