remote_table 1.3.0 → 1.4.0
- data/.gitignore +2 -0
- data/CHANGELOG +19 -0
- data/Gemfile +7 -1
- data/README.markdown +440 -0
- data/Rakefile +6 -14
- data/lib/remote_table.rb +27 -38
- data/lib/remote_table/{properties.rb → config.rb} +39 -43
- data/lib/remote_table/format.rb +24 -27
- data/lib/remote_table/format/delimited.rb +17 -21
- data/lib/remote_table/format/fixed_width.rb +9 -9
- data/lib/remote_table/format/html.rb +0 -2
- data/lib/remote_table/format/mixins/processed_by_nokogiri.rb +13 -12
- data/lib/remote_table/format/mixins/processed_by_roo.rb +17 -13
- data/lib/remote_table/format/mixins/textual.rb +13 -13
- data/lib/remote_table/format/open_office.rb +3 -0
- data/lib/remote_table/format/xml.rb +0 -2
- data/lib/remote_table/format/yaml.rb +14 -0
- data/lib/remote_table/local_file.rb +69 -7
- data/lib/remote_table/transformer.rb +7 -4
- data/lib/remote_table/version.rb +1 -1
- data/remote_table.gemspec +5 -13
- data/test/fixtures/data.yml +4 -0
- data/test/helper.rb +8 -9
- data/test/test_big.rb +43 -53
- data/test/test_errata.rb +27 -25
- data/test/test_old_syntax.rb +193 -191
- data/test/test_old_transform.rb +12 -10
- data/test/test_remote_table.rb +57 -47
- metadata +48 -64
- data/.document +0 -5
- data/README.rdoc +0 -167
- data/lib/remote_table/utils.rb +0 -157
data/.gitignore
CHANGED
data/CHANGELOG
ADDED
@@ -0,0 +1,19 @@
+1.4.0 / 2012-04-12
+
+* Enhancements
+
+  * DRY up spawning code with UnixUtils
+  * Switch to minitest
+  * Stop defining MyCSV globally
+  * Test on MRI 1.8.7, MRI 1.9.3, and JRuby 1.6.7
+  * Warn users about ODS not working on JRuby
+  * Move all warnings to Kernel.warn
+  * Start keeping a CHANGELOG!
+  * Ensure we clean up temporary files
+
+* Bug fixes
+
+  * Make sure headers (keys) on rows created with Roo are ordered in Ruby 1.8.7
+  * Make tests green (for now) by fixing URLs and sometimes :row_xpaths (hello FAA aircraft lookup guide)
+  * Use Hash#fetch for default options
+  * Don't try to set default_sheet if user doesn't specify a sheet name
data/Gemfile
CHANGED
data/README.markdown
ADDED
@@ -0,0 +1,440 @@
+# remote_table
+
+Open local or remote XLSX, XLS, ODS, CSV and fixed-width files.
+
+## Production usage
+
+Used by [the Brighter Planet Reference Data web service](http://data.brighterplanet.com), the [`data_miner` gem](https://github.com/seamusabshere/data_miner), and the [`earth` gem](https://github.com/brighterplanet/earth).
+
+## Example
+
+    $ irb
+    1.9.3-p0 :001 > require 'remote_table'
+    => true
+    1.9.3-p0 :002 > t = RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/98guide6.zip', :filename => '98guide6.csv'
+    => #<RemoteTable:0x00000100851d98 @options={:filename=>"98guide6.csv"}, @url="http://www.fueleconomy.gov/FEG/epadata/98guide6.zip">
+    1.9.3-p0 :003 > t.rows.length
+    => 806
+    1.9.3-p0 :004 > t.rows.first.length
+    => 26
+    1.9.3-p0 :005 > require 'pp'
+    => true
+    1.9.3-p0 :006 > pp t[23]
+    {"Class"=>"TWO SEATERS",
+     "Manufacturer"=>"PORSCHE",
+     "carline name"=>"BOXSTER",
+     "displ"=>"2.5",
+     "cyl"=>"6",
+     "trans"=>"Manual(M5)",
+     "drv"=>"R",
+     "cty"=>"19",
+     "hwy"=>"26",
+     "cmb"=>"22",
+     "ucty"=>"21.2",
+     "uhwy"=>"33.9499",
+     "ucmb"=>"25.5114",
+     "fl"=>"P",
+     "G"=>"",
+     "T"=>"",
+     "S"=>"",
+     "2pv"=>"",
+     "2lv"=>"",
+     "4pv"=>"",
+     "4lv"=>"",
+     "hpv"=>"",
+     "hlv"=>"",
+     "fcost"=>"956",
+     "eng dscr"=>"",
+     "trans dscr"=>""}
+
+You get an <code>Array</code> of <code>Hash</code>es with **string keys**. If you set <code>:headers => false</code>, then you get an <code>Array</code> of <code>Array</code>s.
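
The two row shapes described above can be illustrated without network access. The sketch below uses Ruby's standard `csv` library on an inline string rather than `remote_table` itself, so the sample data and variable names are assumptions made for illustration; it only mirrors the described behaviour (hashes keyed by header strings versus plain arrays).

```ruby
require 'csv'

# Hypothetical sample data standing in for a downloaded file.
SAMPLE = "Manufacturer,carline name,cyl\nPORSCHE,BOXSTER,6\n"

# With a header row: each row becomes a Hash with string keys.
rows_as_hashes = CSV.parse(SAMPLE, :headers => true).map { |row| row.to_hash }

# Without headers (analogous to :headers => false): each row is an Array,
# and the header line is just another row.
rows_as_arrays = CSV.parse(SAMPLE)

p rows_as_hashes.first
p rows_as_arrays.last
```

Note that every value is a string in both shapes, which is why the examples in this README call `to_i` or `to_f` inside their `:select` lambdas.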
+
+## Supported formats
+
+<table>
+  <tr>
+    <th>Format</th>
+    <th>Notes</th>
+    <th>Library</th>
+  </tr>
+  <tr>
+    <td>Delimited (CSV, TSV, etc.)</td>
+    <td>All <code>RemoteTable::Format::Delimited::FASTERCSV_OPTIONS</code>, for example <code>:col_sep</code>, are passed directly to fastercsv.</td>
+    <td>
+      <a href="http://fastercsv.rubyforge.org/">fastercsv</a> (1.8);
+      <a href="http://www.ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/index.html">stdlib</a> (1.9)
+    </td>
+  </tr>
+  <tr>
+    <td>Fixed width</td>
+    <td>You have to set up a <code>:schema</code>.</td>
+    <td><a href="https://github.com/seamusabshere/fixed_width">fixed_width-multibyte</a></td>
+  </tr>
+  <tr>
+    <td>HTML</td>
+    <td>See XML.</td>
+    <td><a href="http://nokogiri.org/">nokogiri</a></td>
+  </tr>
+  <tr>
+    <td>ODS</td>
+    <td></td>
+    <td><a href="http://roo.rubyforge.org/">roo</a></td>
+  </tr>
+  <tr>
+    <td>XLS</td>
+    <td></td>
+    <td><a href="http://roo.rubyforge.org/">roo</a></td>
+  </tr>
+  <tr>
+    <td>XLSX</td>
+    <td></td>
+    <td><a href="http://roo.rubyforge.org/">roo</a></td>
+  </tr>
+  <tr>
+    <td>XML</td>
+    <td>The idea is to set up a <code>:row_[xpath|css]</code> and (optionally) a <code>:column_[xpath|css]</code>.</td>
+    <td><a href="http://nokogiri.org/">nokogiri</a></td>
+  </tr>
+</table>
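
To make the fixed-width `:schema` requirement concrete, here is a minimal sketch of how a list of `[name, width]` pairs can carve a line into fields. It is plain Ruby written for illustration and is not the implementation used by the fixed_width-multibyte library; the `parse_fixed_width` helper, the sample line, and the choice to drop columns named `spacer` are all assumptions.

```ruby
# Hypothetical schema in the [name, width] style used by the README examples.
SCHEMA = [
  ['header1', 10],
  ['spacer',   1],
  ['header2', 10],
  ['spacer',  12],
  ['header3', 10]
]

# Slice one line into a Hash by walking the schema left to right.
# Columns named 'spacer' are consumed but not kept (an assumption about
# how spacer columns are meant to behave).
def parse_fixed_width(line, schema)
  offset = 0
  schema.each_with_object({}) do |(name, width), row|
    field = line[offset, width].to_s.strip
    row[name] = field unless name == 'spacer'
    offset += width
  end
end

# Build a sample line whose columns match the widths in SCHEMA.
line = 'abc'.ljust(10) + ' ' + 'def'.ljust(10) + ' ' * 12 + 'ghi'.ljust(10)
p parse_fixed_width(line, SCHEMA)
```

The point of the sketch is only that column positions come from the schema widths, which is why a fixed-width source cannot be read without one.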
+
+## Compression and packing
+
+You can directly pick a file out of a remote archive using <code>:filename</code> or use a <code>:glob</code>.
+
+* zip
+* tar
+* bz2
+* gz
+* exe (treated as zip)
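
A `:glob` such as `'/*.csv'` selects an archive entry by pattern instead of by exact name. A rough sketch of that idea using `File.fnmatch?` from Ruby's core library follows. The entry list is invented, and matching the pattern against entry names prefixed with `/` is an assumption for illustration, not a description of how remote_table unpacks archives.

```ruby
# Hypothetical listing of files inside an unpacked archive.
entries = ['readme.txt', '2008_FE_guide.csv', 'notes/extra.csv']

glob = '/*.csv'

# Treat each entry as a path relative to the archive root. FNM_PATHNAME
# keeps '*' from crossing directory separators, so only top-level files match.
matches = entries.select do |entry|
  File.fnmatch?(glob, "/#{entry}", File::FNM_PATHNAME)
end

p matches
```

A glob is convenient when the archive holds a single data file whose exact name changes between releases.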
+
+## Encoding
+
+Everything is forced into UTF-8. You can improve the quality of the conversion by specifying the original encoding with `:encoding`.
+
+* ASCII-8BIT and BINARY are equal
+* ISO-8859-1 and Latin1 are equal
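
Why naming the original encoding helps can be seen with Ruby's own `String#encode`. This is a small sketch in plain Ruby (1.9 or later), not the conversion path remote_table uses internally; the sample bytes are an assumption.

```ruby
# Bytes for "café" as they would appear in an ISO-8859-1 file (0xE9 is "é").
raw = "caf\xE9".b

# Transcode to UTF-8, declaring the source encoding explicitly.
utf8 = raw.encode('UTF-8', 'ISO-8859-1')

p utf8.encoding
p utf8.valid_encoding?

# Ruby itself treats BINARY as another name for ASCII-8BIT.
p Encoding.find('BINARY') == Encoding.find('ASCII-8BIT')
```

Without the declared source encoding, the single byte `0xE9` is not valid UTF-8 on its own, so a converter has to guess what it meant.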
+
+## More examples
+
+    RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdHRNaVpSUWw2Z2VhN3RUV25yYWdQX2c&output=csv')
+
+    # aircraft fuel use equations derived from EMEP/EEA and ICAO
+    RemoteTable.new('https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdEhYenF3dGt1T0Y1cTdneUNsNjV0dEE&output=csv')
+
+    # distance classes from the WRI business travel tool and UK DEFRA/DECC GHG Conversion Factors for Company Reporting
+    RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdFBKM0xWaUhKVkxDRmdBVkE3VklxY2c&hl=en&gid=0&output=csv')
+
+    # seat classes used in the WRI GHG Protocol calculation tools
+    RemoteTable.new('https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdG9EdmxybG1wdC1iU3JRYXNkMGhvSnc&output=csv')
+
+    # pure automobile fuels
+    RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdE9xTEdueFM2R0diNTgxUlk1QXFSb2c&gid=0&output=csv')
+
+    # blended automobile fuels
+    RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdEswNGIxM0U4U0N1UUppdWw2ejJEX0E&gid=0&output=csv')
+
+    # A list of hybrid make model years derived from the EPA fuel economy guide
+    RemoteTable.new('https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AoQJbWqPrREqdGtzekE4cGNoRGVmdmZMaTNvOWluSnc&output=csv')
+
+    # BTS aircraft type lookup table
+    RemoteTable.new("http://www.transtats.bts.gov/Download_Lookup.asp?Lookup=L_AIRCRAFT_TYPE",
+      :errata => { :url => 'https://spreadsheets.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdEZ2d3JQMzV5T1o1T3JmVlFyNUZxdEE&output=csv' })
+
+    # aircraft made by whitelisted manufacturers whose ICAO code starts with 'B' from the FAA
+    # for definition of `Aircraft::Guru` and `manufacturer_whitelist?` see https://github.com/brighterplanet/earth/blob/master/lib/earth/air/aircraft/data_miner.rb
+    RemoteTable.new("http://www.faa.gov/air_traffic/publications/atpubs/CNT/5-2-B.htm",
+      :encoding => 'windows-1252',
+      :row_xpath => '//table/tr[2]/td/table/tr',
+      :column_xpath => 'td',
+      :errata => { :url => 'https://spreadsheets.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdGVBRnhkRGhSaVptSDJ5bXJGbkpUSWc&output=csv', :responder => Aircraft::Guru.new },
+      :select => lambda { |record| manufacturer_whitelist? record['Manufacturer'] })
+
+    # OpenFlights.org airports database
+    RemoteTable.new('https://openflights.svn.sourceforge.net/svnroot/openflights/openflights/data/airports.dat',
+      :headers => %w{ id name city country_name iata_code icao_code latitude longitude altitude timezone daylight_savings },
+      :select => lambda { |record| record['iata_code'].present? },
+      :errata => { :url => 'https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdFc2UzhQYU5PWEQ0N21yWFZGNmc2a3c&gid=0&output=csv', :responder => Airport::Guru.new }) # see https://github.com/brighterplanet/earth/blob/master/lib/earth/air/aircraft/data_miner.rb
+
+    # T100 flight segment data for #{month.strftime('%B %Y')}
+    # for definition of `form_data` and `FlightSegment::Guru` see https://github.com/brighterplanet/earth/blob/master/lib/earth/air/flight_segment/data_miner.rb
+    RemoteTable.new('http://www.transtats.bts.gov/DownLoad_Table.asp',
+      :form_data => form_data,
+      :compression => :zip,
+      :glob => '/*.csv',
+      :errata => { :url => 'https://spreadsheets.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdGxpYU1qWFR3d0syTVMyQVVOaDd0V3c&output=csv', :responder => FlightSegment::Guru.new },
+      :select => lambda { |record| record['DEPARTURES_PERFORMED'].to_i > 0 })
+
+    # 1995 Fuel Economy Guide
+    # for definition of `:fuel_economy_guide_b` and `AutomobileMakeModelYearVariant::ParserB` see https://github.com/brighterplanet/earth/blob/master/lib/earth/automobile/automobile_make_model_year_variant/data_miner.rb
+    RemoteTable.new("http://www.fueleconomy.gov/FEG/epadata/95mfgui.zip",
+      :filename => '95MFGUI.DAT',
+      :format => :fixed_width,
+      :cut => '13-',
+      :schema_name => :fuel_economy_guide_b,
+      :select => lambda { |row| row['model'].present? and (row['suppress_code'].blank? or row['suppress_code'].to_f == 0) and row['state_code'] == 'F' },
+      :transform => { :class => AutomobileMakeModelYearVariant::ParserB, :year => 1995 },
+      :errata => { :url => "https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdDkxTElWRVlvUXB3Uy04SDhSYWkzakE&output=csv", :responder => AutomobileMakeModelYearVariant::Guru.new })
+
+    # 1998 Fuel Economy Guide
+    # for definition of `AutomobileMakeModelYearVariant::ParserC` see https://github.com/brighterplanet/earth/blob/master/lib/earth/automobile/automobile_make_model_year_variant/data_miner.rb
+    RemoteTable.new('http://www.fueleconomy.gov/FEG/epadata/98guide6.zip',
+      :filename => '98guide6.csv',
+      :transform => { :class => AutomobileMakeModelYearVariant::ParserC, :year => 1998 },
+      :errata => { :url => "https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdDkxTElWRVlvUXB3Uy04SDhSYWkzakE&output=csv", :responder => AutomobileMakeModelYearVariant::Guru.new },
+      :select => lambda { |row| row['model'].present? })
+
+    # annual corporate average fuel economy data for domestic and imported vehicle fleets from the NHTSA
+    RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdEdXWXB6dkVLWkowLXhYSFVUT01sS2c&hl=en&gid=0&output=csv',
+      :errata => { 'url' => 'http://static.brighterplanet.com/science/data/transport/automobiles/make_fleet_years/errata.csv' },
+      :select => lambda { |row| row['volume'].to_i > 0 })
+
+    # total vehicle miles travelled by gasoline passenger cars from the 2010 EPA GHG Inventory
+    RemoteTable.new('http://www.epa.gov/climatechange/emissions/downloads10/2010-Inventory-Annex-Tables.zip',
+      :filename => 'Annex Tables/Annex 3/Table A-87.csv',
+      :skip => 1,
+      :select => lambda { |row| row['Year'].to_i.to_s == row['Year'] })
+
+    # total vehicle miles travelled from the 2010 EPA GHG Inventory
+    RemoteTable.new('http://www.epa.gov/climatechange/emissions/downloads10/2010-Inventory-Annex-Tables.zip',
+      :filename => 'Annex Tables/Annex 3/Table A-87.csv',
+      :skip => 1,
+      :select => lambda { |row| row['Year'].to_i.to_s == row['Year'] })
+
+    # total travel distribution from the 2010 EPA GHG Inventory
+    RemoteTable.new('http://www.epa.gov/climatechange/emissions/downloads10/2010-Inventory-Annex-Tables.zip',
+      :filename => 'Annex Tables/Annex 3/Table A-93.csv',
+      :skip => 1,
+      :select => lambda { |row| row['Vehicle Age'].to_i.to_s == row['Vehicle Age'] })
+
+    # building characteristics from the 2003 EIA Commercial Building Energy Consumption Survey
+    RemoteTable.new('http://www.eia.gov/emeu/cbecs/cbecs2003/public_use_2003/data/FILE02.csv',
+      :skip => 1,
+      :headers => ["PUBID8","REGION8","CENDIV8","SQFT8","SQFTC8","YRCONC8","PBA8","ELUSED8","NGUSED8","FKUSED8","PRUSED8","STUSED8","HWUSED8","ONEACT8","ACT18","ACT28","ACT38","ACT1PCT8","ACT2PCT8","ACT3PCT8","PBAPLUS8","VACANT8","RWSEAT8","PBSEAT8","EDSEAT8","FDSEAT8","HCBED8","NRSBED8","LODGRM8","FACIL8","FEDFAC8","FACACT8","MANIND8","PLANT8","FACDST8","FACDHW8","FACDCW8","FACELC8","BLDPLT8","ADJWT8","STRATUM8","PAIR8"])
+
+    # 2003 CBECS C17 - Electricity Consumption and Intensity - New England Division
+    # for definition of `CbecsEnergyIntensity::NAICS_CODE_SYNTHESIZER` see https://github.com/brighterplanet/earth/blob/master/lib/earth/industry/cbecs_energy_intensity/data_miner.rb
+    RemoteTable.new("http://www.eia.gov/emeu/cbecs/cbecs2003/detailed_tables_2003/2003set10/2003excel/C17.xls",
+      :headers => false,
+      :select => ::Proc.new { |row| CbecsEnergyIntensity::NAICS_CODE_SYNTHESIZER.call(row) },
+      :crop => (21..37))
+
+    # U.S. Census 2002 NAICS code list
+    RemoteTable.new('http://www.census.gov/epcd/naics02/naicod02.txt',
+      :skip => 4,
+      :headers => false,
+      :delimiter => ' ')
+
+    # MECS table 3.2 Total US
+    RemoteTable.new("http://205.254.135.24/emeu/mecs/mecs2006/excel/Table3_2.xls",
+      :crop => (15..94),
+      :headers => ["NAICS Code", "Subsector and Industry", "Total", "BLANK", "Net Electricity", "BLANK", "Residual Fuel Oil", "Distillate Fuel Oil", "Natural Gas", "BLANK", "LPG and NGL", "BLANK", "Coal", "Coke and Breeze", "Other"])
+
+    # MECS table 6.1 Midwest
+    RemoteTable.new("http://205.254.135.24/emeu/mecs/mecs2006/excel/Table6_1.xls",
+      :crop => (184..263),
+      :headers => ["NAICS Code", "Subsector and Industry", "Consumption per Employee", "Consumption per Dollar of Value Added", "Consumption per Dollar of Value of Shipments"])
+
+    # U.S. Census Geographic Terms and Definitions
+    RemoteTable.new('http://www.census.gov/popest/about/geo/state_geocodes_v2009.txt',
+      :skip => 6,
+      :headers => %w{ Region Division FIPS Name },
+      :select => ::Proc.new { |row| row['Division'].to_i > 0 and row['FIPS'].to_i == 0 })
+
+    # state census divisions from the U.S. Census
+    RemoteTable.new('http://www.census.gov/popest/about/geo/state_geocodes_v2009.txt',
+      :skip => 8,
+      :headers => ['Region', 'Division', 'State FIPS', 'Name'],
+      :select => ::Proc.new { |row| row['State FIPS'].to_i > 0 })
+
+    # OpenGeoCode.org's Country Codes to Country Names list
+    RemoteTable.new('http://opengeocode.org/download/countrynames.txt',
+      :format => :delimited,
+      :delimiter => ';',
+      :headers => false,
+      :skip => 22)
+
+    # heating degree day data from WRI CAIT
+    RemoteTable.new('https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdDN4MkRTSWtWRjdfazhRdWllTkVSMkE&output=csv',
+      :select => Proc.new { |record| record['country'] != 'European Union (27)' },
+      :errata => { :url => 'https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdDNSMUtCV0h4cUF4UnBKZlNkczlNbFE&output=csv' })
+
+    # US average grid loss factor derived from eGRID 2007 data
+    RemoteTable.new('http://www.epa.gov/cleanenergy/documents/egridzips/eGRID2010V1_1_STIE_USGC.xls',
+      :sheet => 'USGC',
+      :skip => 5)
+
+    # eGRID 2010 regions and loss factors
+    RemoteTable.new('http://www.epa.gov/cleanenergy/documents/egridzips/eGRID2010V1_1_STIE_USGC.xls',
+      :sheet => 'STIE07',
+      :skip => 4,
+      :select => lambda { |row| row['eGRID2010 year 2007 file state sequence number'].to_i.between?(1, 51) })
+
+    # eGRID 2010 subregions and electricity emission factors
+    RemoteTable.new('http://www.epa.gov/cleanenergy/documents/egridzips/eGRID2010_Version1-1_xls_only.zip',
+      :filename => 'eGRID2010V1_1_year07_AGGREGATION.xls',
+      :sheet => 'SRL07',
+      :skip => 4,
+      :select => lambda { |row| row['SEQSRL07'].to_i.between?(1, 26) })
+
+    # U.S. Census State ANSI Code file
+    RemoteTable.new('http://www.census.gov/geo/www/ansi/state.txt',
+      :delimiter => '|',
+      :select => lambda { |record| record['STATE'].to_i < 60 })
+
+    # Mapping Hacks zipcode database
+    RemoteTable.new('http://mappinghacks.com/data/zipcode.zip',
+      :filename => 'zipcode.csv')
+
+    # zipcode states and eGRID Subregions from the US EPA
+    RemoteTable.new('http://www.epa.gov/cleanenergy/documents/egridzips/Power_Profiler_Zipcode_Tool_v3-2.xlsx',
+      :sheet => 'Zip-subregion')
+
+    # horse breeds
+    RemoteTable.new('http://www.freebase.com/type/exporttypeinstances/base/horses/horse_breed?page=0&filter_mode=type&filter_view=table&show%01p%3D%2Ftype%2Fobject%2Fname%01index=0&show%01p%3D%2Fcommon%2Ftopic%2Fimage%01index=1&show%01p%3D%2Fcommon%2Ftopic%2Farticle%01index=2&sort%01p%3D%2Ftype%2Fobject%2Ftype%01p%3Dlink%01p%3D%2Ftype%2Flink%2Ftimestamp%01index=false&=&exporttype=csv-8')
+
+    # Brighter Planet's list of cat and dog breeds, genders, and weights
+    RemoteTable.new('http://static.brighterplanet.com/science/data/consumables/pets/breed_genders.csv',
+      :encoding => 'ISO-8859-1',
+      :select => lambda { |row| row['gender'].present? })
+
+    # residential electricity prices from the EIA
+    RemoteTable.new('http://www.eia.doe.gov/cneaf/electricity/page/sales_revenue.xls',
+      :select => lambda { |row| row['Year'].to_s.first(4).to_i > 1989 })
+
+    # residential natural gas prices from the EIA
+    # for definition of `NaturalGasParser` see https://github.com/brighterplanet/earth/blob/master/lib/earth/residence/residence_fuel_price/data_miner.rb
+    RemoteTable.new('http://tonto.eia.doe.gov/dnav/ng/xls/ng_pri_sum_a_EPG0_FWA_DMcf_a.xls',
+      :sheet => 'Data 1',
+      :skip => 2,
+      :select => lambda { |row| row['year'].to_i > 1989 },
+      :transform => { :class => NaturalGasParser })
+
+    # 2005 EIA Residential Energy Consumption Survey microdata
+    RemoteTable.new('http://www.eia.doe.gov/emeu/recs/recspubuse05/datafiles/RECS05alldata.csv',
+      :headers => :upcase)
+
+    # ...and more from the tests...
+
+    RemoteTable.new 'http://spreadsheets.google.com/pub?key=t5HM1KbaRngmTUbntg8JwPA&single=true&gid=0'
+
+    RemoteTable.new 'http://spreadsheets.google.com/pub?key=t5HM1KbaRngmTUbntg8JwPA'
+
+    RemoteTable.new 'http://spreadsheets.google.com/pub?key=t5HM1KbaRngmTUbntg8JwPA', :skip => 1, :headers => false
+
+    RemoteTable.new 'http://spreadsheets.google.com/pub?key=tObVAGyqOkCBtGid0tJUZrw'
+
+    RemoteTable.new 'http://spreadsheets.google.com/pub?key=tObVAGyqOkCBtGid0tJUZrw', :headers => %w{ col1 col2 col3 }
+
+    RemoteTable.new 'http://spreadsheets.google.com/pub?key=tujrgUOwDSLWb-P4KCt1qBg'
+
+    RemoteTable.new 'http://tonto.eia.doe.gov/dnav/pet/xls/PET_PRI_RESID_A_EPPR_PTA_CPGAL_M.xls', :transform => { :class => FuelOilParser }
+
+    RemoteTable.new 'http://www.freebase.com/type/exporttypeinstances/base/horses/horse_breed?page=0&filter_mode=type&filter_view=table&show%01p%3D%2Ftype%2Fobject%2Fname%01index=0&show%01p%3D%2Fcommon%2Ftopic%2Fimage%01index=1&show%01p%3D%2Fcommon%2Ftopic%2Farticle%01index=2&sort%01p%3D%2Ftype%2Fobject%2Ftype%01p%3Dlink%01p%3D%2Ftype%2Flink%2Ftimestamp%01index=false&=&exporttype=csv-8'
+
+    RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/02data.zip', :filename => 'guide_jan28.xls'
+
+    RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/08data.zip', :filename => '2008_FE_guide_ALL_rel_dates_-no sales-for DOE-5-1-08.csv'
+
+    RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/08data.zip', :glob => '/*.csv'
+
+    RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/98guide6.zip', :filename => '98guide6.csv'
+
+    RemoteTable.new 'http://www.worldmapper.org/data/opendoc/2_worldmapper_data.ods', :sheet => 'Data', :keep_blank_rows => true
+
+    RemoteTable.new 'https://spreadsheets.google.com/pub?key=t5HM1KbaRngmTUbntg8JwPA'
+
+    RemoteTable.new 'www.customerreferenceprogram.org/uploads/CRP_RFP_template.xlsx'
+
+    RemoteTable.new 'www.customerreferenceprogram.org/uploads/CRP_RFP_template.xlsx', :headers => %w{foo bar baz}
+
+    RemoteTable.new 'www.customerreferenceprogram.org/uploads/CRP_RFP_template.xlsx', :headers => false
+
+    RemoteTable.new 'http://www.transtats.bts.gov/DownLoad_Table.asp?Table_ID=293&Has_Group=3&Is_Zipped=0', :form_data => 'UserTableName=T_100_Segment__All_Carriers&[...]', :compression => :zip, :glob => '/*.csv'
+
+    RemoteTable.new "http://www.faa.gov/air_traffic/publications/atpubs/CNT/5-2-E.htm",
+      :encoding => 'US-ASCII',
+      :row_xpath => '//table/tr[2]/td/table/tr',
+      :column_xpath => 'td'
+
+    RemoteTable.new "http://www.faa.gov/air_traffic/publications/atpubs/CNT/5-2-G.htm",
+      :encoding => 'windows-1252',
+      :row_xpath => '//table/tr[2]/td/table/tr',
+      :column_xpath => 'td',
+      :errata => Errata.new(:url => 'http://spreadsheets.google.com/pub?key=tObVAGyqOkCBtGid0tJUZrw',
+                            :responder => AircraftGuru.new)
+
+    RemoteTable.new "http://www.faa.gov/air_traffic/publications/atpubs/CNT/5-2-G.htm",
+      :encoding => 'windows-1252',
+      :row_xpath => '//table/tr[2]/td/table/tr',
+      :column_xpath => 'td',
+      :errata => { :url => 'http://spreadsheets.google.com/pub?key=tObVAGyqOkCBtGid0tJUZrw',
+                   :responder => AircraftGuru.new }
+
+    RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/00data.zip',
+      :filename => 'Gd6-dsc.txt',
+      :format => :fixed_width,
+      :crop => 21..26, # inclusive
+      :cut => '2-',
+      :select => lambda { |row| /\A[A-Z]/.match row['code'] },
+      :schema => [[ 'code', 2, { :type => :string } ],
+                  [ 'spacer', 2 ],
+                  [ 'name', 52, { :type => :string } ]]
+
+    RemoteTable.new 'http://cloud.github.com/downloads/seamusabshere/remote_table/test2.fixed_width.txt',
+      :format => :fixed_width,
+      :skip => 1,
+      :schema => [[ 'header4', 10, { :type => :string } ],
+                  [ 'spacer', 1 ],
+                  [ 'header5', 10, { :type => :string } ],
+                  [ 'spacer', 12 ],
+                  [ 'header6', 10, { :type => :string } ]]
+
+    RemoteTable.new 'http://cloud.github.com/downloads/seamusabshere/remote_table/test2.fixed_width.txt',
+      :format => :fixed_width,
+      :keep_blank_rows => true,
+      :skip => 1,
+      :schema => [[ 'header4', 10, { :type => :string } ],
+                  [ 'spacer', 1 ],
+                  [ 'header5', 10, { :type => :string } ],
+                  [ 'spacer', 12 ],
+                  [ 'header6', 10, { :type => :string } ]]
+
+    RemoteTable.new 'http://cloud.github.com/downloads/seamusabshere/remote_table/remote_table_row_hash_test.fixed_width.txt',
+      :format => :fixed_width,
+      :skip => 1,
+      :schema => [[ 'header1', 10, { :type => :string } ],
+                  [ 'spacer', 1 ],
+                  [ 'header2', 10, { :type => :string } ],
+                  [ 'spacer', 12 ],
+                  [ 'header3', 10, { :type => :string } ]]
+
+    RemoteTable.new 'http://cloud.github.com/downloads/seamusabshere/remote_table/remote_table_row_hash_test.alternate_order.fixed_width.txt',
+      :format => :fixed_width,
+      :skip => 1,
+      :schema => [[ 'spacer', 11 ],
+                  [ 'header2', 10, { :type => :string } ],
+                  [ 'spacer', 1 ],
+                  [ 'header3', 10, { :type => :string } ],
+                  [ 'spacer', 1 ],
+                  [ 'header1', 10, { :type => :string } ]]
+
+## Requirements
+
+* MRI (not JRuby)
+* Unix tools like curl, iconv, perl, cat, cut, tail, etc. accessible from `ENV['PATH']`
+
+As this library matures, that requirement should go away.
+
+## Wishlist
+
+* JRuby and Win32 compat
+* The new "custom parser" syntax (aka transformer) hasn't been defined yet... only the old-style syntax is available
+
+## Authors
+
+* Seamus Abshere <seamus@abshere.net>
+* Andy Rossmeissl <andy@rossmeissl.net>
+
+## Copyright
+
+Copyright (c) 2012 Brighter Planet. See LICENSE for details.