csvpack 0.1.0
- checksums.yaml +7 -0
- data/HISTORY.md +4 -0
- data/Manifest.txt +13 -0
- data/README.md +354 -0
- data/Rakefile +32 -0
- data/lib/csvpack.rb +52 -0
- data/lib/csvpack/downloader.rb +62 -0
- data/lib/csvpack/pack.rb +246 -0
- data/lib/csvpack/version.rb +22 -0
- data/test/helper.rb +7 -0
- data/test/test_companies.rb +61 -0
- data/test/test_countries.rb +40 -0
- data/test/test_downloader.rb +32 -0
- data/test/test_import.rb +22 -0
- metadata +137 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
|
|
1
|
+
---
|
2
|
+
SHA1:
|
3
|
+
metadata.gz: b9ea6e66b9c56d609a881d9edd84c19fa58c9f8e
|
4
|
+
data.tar.gz: 0a0b7df35fa79d3f06aceb7a100aeb20ee71bd0e
|
5
|
+
SHA512:
|
6
|
+
metadata.gz: 1e5aa488c56683fbd1215da82c60cc8213b890c0e3b42cd126626d3005e2de7ffac156288bb91bcab42f234a34e8734731acd8c1f61694b7b1362e7d154d8fc9
|
7
|
+
data.tar.gz: 4eccdd36ca45df01c5d69f63e51abe71bb191a3144998fe0cf2de9ceb639aeb399df24b470f66752fa3a5e96e276b15f7863ff4b7cb258c5cef86520de8874d1
|
data/HISTORY.md
ADDED
data/Manifest.txt
ADDED
@@ -0,0 +1,13 @@
|
|
1
|
+
HISTORY.md
|
2
|
+
Manifest.txt
|
3
|
+
README.md
|
4
|
+
Rakefile
|
5
|
+
lib/csvpack.rb
|
6
|
+
lib/csvpack/downloader.rb
|
7
|
+
lib/csvpack/pack.rb
|
8
|
+
lib/csvpack/version.rb
|
9
|
+
test/helper.rb
|
10
|
+
test/test_companies.rb
|
11
|
+
test/test_countries.rb
|
12
|
+
test/test_downloader.rb
|
13
|
+
test/test_import.rb
|
data/README.md
ADDED
@@ -0,0 +1,354 @@
|
|
1
|
+
# csvpack
|
2
|
+
|
3
|
+
work with tabular data packages using comma-separated values (CSV) datafiles in text with datapackage.json; download, read into and query comma-separated values (CSV) datafiles with your SQL database (e.g. SQLite, PostgreSQL, ...) of choice and much more
|
4
|
+
|
5
|
+
|
6
|
+
* home :: [github.com/csv11/csvpack](https://github.com/csv11/csvpack)
|
7
|
+
* bugs :: [github.com/csv11/csvpack/issues](https://github.com/csv11/csvpack/issues)
|
8
|
+
* gem :: [rubygems.org/gems/csvpack](https://rubygems.org/gems/csvpack)
|
9
|
+
* rdoc :: [rubydoc.info/gems/csvpack](http://rubydoc.info/gems/csvpack)
|
10
|
+
* forum :: [ruby-talk@ruby-lang.org](http://www.ruby-lang.org/en/community/mailing-lists/)
|
11
|
+
|
12
|
+
|
13
|
+
|
14
|
+
## Usage
|
15
|
+
|
16
|
+
|
17
|
+
### What's a tabular data package?
|
18
|
+
|
19
|
+
> Tabular Data Package is a simple structure for publishing and sharing
|
20
|
+
> tabular data with the following key features:
|
21
|
+
>
|
22
|
+
> - Data is stored in CSV (comma separated values) files
|
23
|
+
> - Metadata about the dataset both general (e.g. title, author)
|
24
|
+
> and the specific data files (e.g. schema) is stored in a single JSON file
|
25
|
+
> named `datapackage.json` which follows the Data Package format
|
26
|
+
|
27
|
+
(Source: [Tabular Data Packages, Frictionless Data Initiative • Data Hub.io • Open Knowledge Foundation • Data Protocols.org](https://datahub.io/docs/data-packages/tabular))
|
28
|
+
|
29
|
+
|
30
|
+
|
31
|
+
Here's a minimal example of a tabular data package holding two files, that is, `data.csv` and `datapackage.json`:
|
32
|
+
|
33
|
+
`data.csv`:
|
34
|
+
|
35
|
+
```
|
36
|
+
Brewery,City,Name,Abv
|
37
|
+
Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
|
38
|
+
Augustiner Bräu München,München,Edelstoff,5.6%
|
39
|
+
Bayerische Staatsbrauerei Weihenstephan,Freising,Hefe Weissbier,5.4%
|
40
|
+
Brauerei Spezial,Bamberg,Rauchbier Märzen,5.1%
|
41
|
+
Hacker-Pschorr Bräu,München,Münchner Dunkel,5.0%
|
42
|
+
Staatliches Hofbräuhaus München,München,Hofbräu Oktoberfestbier,6.3%
|
43
|
+
...
|
44
|
+
```
|
45
|
+
|
46
|
+
`datapackage.json`:
|
47
|
+
|
48
|
+
``` json
|
49
|
+
{
|
50
|
+
"name": "beer",
|
51
|
+
"resources": [
|
52
|
+
{
|
53
|
+
"path": "data.csv",
|
54
|
+
"schema": {
|
55
|
+
"fields": [{ "name": "Brewery", "type": "string" },
|
56
|
+
{ "name": "City", "type": "string" },
|
57
|
+
{ "name": "Name", "type": "string" },
|
58
|
+
{ "name": "Abv", "type": "number" }]
|
59
|
+
}
|
60
|
+
}
|
61
|
+
]
|
62
|
+
}
|
63
|
+
```
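To make the structure of `datapackage.json` more concrete, here is a minimal sketch that reads the example above with Ruby's standard `json` library and lists the fields of each resource. The helper name `describe_fields` and the constant `DATAPACKAGE_JSON` are made up for this illustration and are not part of csvpack.

```ruby
require 'json'

# The minimal datapackage.json from the example above, inlined as a string
# so that the sketch runs without touching the disk.
DATAPACKAGE_JSON = <<~JSON
  {
    "name": "beer",
    "resources": [
      {
        "path": "data.csv",
        "schema": {
          "fields": [{ "name": "Brewery", "type": "string" },
                     { "name": "City",    "type": "string" },
                     { "name": "Name",    "type": "string" },
                     { "name": "Abv",     "type": "number" }]
        }
      }
    ]
  }
JSON

# Hypothetical helper (not part of csvpack):
# returns a hash mapping each resource path to its [name, type] field pairs.
def describe_fields(json_text)
  pack = JSON.parse(json_text)
  pack['resources'].each_with_object({}) do |resource, result|
    fields = resource['schema']['fields']
    result[resource['path']] = fields.map { |f| [f['name'], f['type']] }
  end
end

fields_by_path = describe_fields(DATAPACKAGE_JSON)
fields_by_path.each do |path, fields|
  puts "#{path}:"
  fields.each { |name, type| puts "  #{name} (#{type})" }
end
```

The schema only describes the columns; the rows themselves stay in the CSV file that `path` points to.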
|
64
|
+
|
65
|
+
|
66
|
+
|
67
|
+
### Where to find data packages?
|
68
|
+
|
69
|
+
For some real world examples see the [Data Packages Listing](https://datahub.io/core) ([Sources](https://github.com/datasets)) at the Data Hub.io • Frictionless Data Initiative
|
70
|
+
website for a start. Tabular data packages include:
|
71
|
+
|
72
|
+
Name | Comments
|
73
|
+
------------------------ | -------------
|
74
|
+
`country-codes` | Comprehensive country codes: ISO 3166, ITU, ISO 4217 currency codes and many more
|
75
|
+
`language-codes` | ISO Language Codes (639-1 and 639-2)
|
76
|
+
`currency-codes` | ISO 4217 Currency Codes
|
77
|
+
`gdp` | Country, Regional and World GDP (Gross Domestic Product)
|
78
|
+
`s-and-p-500-companies` | S&P 500 Companies with Financial Information
|
79
|
+
`un-locode` | UN-LOCODE Codelist
|
80
|
+
`gold-prices` | Gold Prices (Monthly in USD)
|
81
|
+
`bond-yields-uk-10y` | 10 Year UK Government Bond Yields (Long-Term Interest Rate)
|
82
|
+
|
83
|
+
|
84
|
+
|
85
|
+
and many more
|
86
|
+
|
87
|
+
|
88
|
+
### Code, Code, Code - Script Your Data Workflow with Ruby
|
89
|
+
|
90
|
+
|
91
|
+
``` ruby
|
92
|
+
require 'csvpack'
|
93
|
+
|
94
|
+
CsvPack.import(
|
95
|
+
's-and-p-500-companies',
|
96
|
+
'gdp'
|
97
|
+
)
|
98
|
+
```
|
99
|
+
|
100
|
+
Using `CsvPack.import` will:
|
101
|
+
|
102
|
+
1) download all data packages to the `./pack` folder
|
103
|
+
|
104
|
+
2) (auto-)add all tables to an in-memory SQLite database using SQL `create_table`
|
105
|
+
commands via `ActiveRecord` migrations e.g.
|
106
|
+
|
107
|
+
|
108
|
+
``` ruby
|
109
|
+
create_table :constituents_financials do |t|
|
110
|
+
t.string :symbol # Symbol (string)
|
111
|
+
t.string :name # Name (string)
|
112
|
+
t.string :sector # Sector (string)
|
113
|
+
t.float :price # Price (number)
|
114
|
+
t.float :dividend_yield # Dividend Yield (number)
|
115
|
+
t.float :price_earnings # Price/Earnings (number)
|
116
|
+
t.float :earnings_share # Earnings/Share (number)
|
117
|
+
t.float :book_value # Book Value (number)
|
118
|
+
t.float :_52_week_low # 52 week low (number)
|
119
|
+
t.float :_52_week_high # 52 week high (number)
|
120
|
+
t.float :market_cap # Market Cap (number)
|
121
|
+
t.float :ebitda # EBITDA (number)
|
122
|
+
t.float :price_sales # Price/Sales (number)
|
123
|
+
t.float :price_book # Price/Book (number)
|
124
|
+
t.string :sec_filings # SEC Filings (string)
|
125
|
+
end
|
126
|
+
```
|
127
|
+
|
128
|
+
3) (auto-)import all datasets using SQL inserts e.g.
|
129
|
+
|
130
|
+
``` sql
|
131
|
+
INSERT INTO constituents_financials
|
132
|
+
(symbol,
|
133
|
+
name,
|
134
|
+
sector,
|
135
|
+
price,
|
136
|
+
dividend_yield,
|
137
|
+
price_earnings,
|
138
|
+
earnings_share,
|
139
|
+
book_value,
|
140
|
+
_52_week_low,
|
141
|
+
_52_week_high,
|
142
|
+
market_cap,
|
143
|
+
ebitda,
|
144
|
+
price_sales,
|
145
|
+
price_book,
|
146
|
+
sec_filings)
|
147
|
+
VALUES
|
148
|
+
('MMM',
|
149
|
+
'3M Co',
|
150
|
+
'Industrials',
|
151
|
+
162.27,
|
152
|
+
2.11,
|
153
|
+
22.28,
|
154
|
+
7.284,
|
155
|
+
25.238,
|
156
|
+
123.61,
|
157
|
+
162.92,
|
158
|
+
104.0,
|
159
|
+
8.467,
|
160
|
+
3.28,
|
161
|
+
6.43,
|
162
|
+
'http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=MMM')
|
163
|
+
```
|
164
|
+
|
165
|
+
4) (auto-)add ActiveRecord models for all tables.
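The column names and types in the `create_table` example in step 2 follow from the field names and types in the schema. The sketch below reproduces that mapping outside the gem: `sanitize_name` and `DATA_TYPES` are copied from `lib/csvpack/pack.rb` in this release, while `migration_line` is a hypothetical helper added only for illustration.

```ruby
# Type mapping copied from DATA_TYPES in lib/csvpack/pack.rb
DATA_TYPES = {
  'string'   => :string,
  'number'   => :float,
  'integer'  => :integer,
  'boolean'  => :boolean,
  'datetime' => :datetime,
  'date'     => :date,
  'time'     => :time
}.freeze

# Standalone copy of the clean-up rules in Tab#sanitize_name
def sanitize_name(ident)
  ident = ident.strip.downcase
  ident = ident.gsub(/[.\-\/]/, '_')        # dots, dashes and slashes become underscores
  ident = ident.gsub(' ', '_')              # spaces become underscores
  ident = ident.gsub(/[^a-z0-9_]/, '')      # drop everything else
  ident = "_#{ident}" if ident =~ /^[0-9]/  # identifiers must not start with a digit
  ident
end

# Hypothetical helper (not part of csvpack):
# renders one schema field as a migration column line.
def migration_line(field)
  type   = DATA_TYPES[field['type']]
  column = sanitize_name(field['name'])
  "t.#{type} :#{column}  # #{field['name']} (#{field['type']})"
end

fields = [
  { 'name' => 'Symbol',         'type' => 'string' },
  { 'name' => 'Price/Earnings', 'type' => 'number' },
  { 'name' => '52 week low',    'type' => 'number' }
]
fields.each { |f| puts migration_line(f) }
```

Table names go through the same `sanitize_name` step, which is why the resource `constituents-financials` ends up as the table `constituents_financials`.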
|
166
|
+
|
167
|
+
|
168
|
+
So what? Now you can use all the "magic" of ActiveRecord to query
|
169
|
+
the datasets. Example:
|
170
|
+
|
171
|
+
``` ruby
|
172
|
+
pp Constituent.count
|
173
|
+
|
174
|
+
# SELECT COUNT(*) FROM "constituents"
|
175
|
+
# => 496
|
176
|
+
|
177
|
+
|
178
|
+
pp Constituent.first
|
179
|
+
|
180
|
+
# SELECT "constituents".* FROM "constituents" ORDER BY "constituents"."id" ASC LIMIT 1
|
181
|
+
# => #<Constituent:0x9f8cb78
|
182
|
+
# id: 1,
|
183
|
+
# symbol: "MMM",
|
184
|
+
# name: "3M Co",
|
185
|
+
# sector: "Industrials">
|
186
|
+
|
187
|
+
|
188
|
+
pp Constituent.find_by!( symbol: 'MMM' )
|
189
|
+
|
190
|
+
# SELECT "constituents".*
|
191
|
+
# FROM "constituents"
|
192
|
+
# WHERE "constituents"."symbol" = "MMM"
|
193
|
+
# LIMIT 1
|
194
|
+
# => #<Constituent:0x9f8cb78
|
195
|
+
# id: 1,
|
196
|
+
# symbol: "MMM",
|
197
|
+
# name: "3M Co",
|
198
|
+
# sector: "Industrials">
|
199
|
+
|
200
|
+
|
201
|
+
pp Constituent.find_by!( name: '3M Co' )
|
202
|
+
|
203
|
+
# SELECT "constituents".*
|
204
|
+
# FROM "constituents"
|
205
|
+
# WHERE "constituents"."name" = "3M Co"
|
206
|
+
# LIMIT 1
|
207
|
+
# => #<Constituent:0x9f8cb78
|
208
|
+
# id: 1,
|
209
|
+
# symbol: "MMM",
|
210
|
+
# name: "3M Co",
|
211
|
+
# sector: "Industrials">
|
212
|
+
|
213
|
+
|
214
|
+
pp Constituent.where( sector: 'Industrials' ).count
|
215
|
+
|
216
|
+
# SELECT COUNT(*) FROM "constituents"
|
217
|
+
# WHERE "constituents"."sector" = "Industrials"
|
218
|
+
# => 63
|
219
|
+
|
220
|
+
|
221
|
+
pp Constituent.where( sector: 'Industrials' ).all
|
222
|
+
|
223
|
+
# SELECT "constituents".*
|
224
|
+
# FROM "constituents"
|
225
|
+
# WHERE "constituents"."sector" = "Industrials"
|
226
|
+
# => [#<Constituent:0x9f8cb78
|
227
|
+
# id: 1,
|
228
|
+
# symbol: "MMM",
|
229
|
+
# name: "3M Co",
|
230
|
+
# sector: "Industrials">,
|
231
|
+
# #<Constituent:0xa2a4180
|
232
|
+
# id: 8,
|
233
|
+
# symbol: "ADT",
|
234
|
+
# name: "ADT Corp (The)",
|
235
|
+
# sector: "Industrials">,...]
|
236
|
+
```
|
237
|
+
|
238
|
+
and so on
|
239
|
+
|
240
|
+
|
241
|
+
|
242
|
+
### Frequently Asked Questions (F.A.Qs) and Answers
|
243
|
+
|
244
|
+
|
245
|
+
#### Q: How to download a data package ("by hand")?
|
246
|
+
|
247
|
+
Use the `CsvPack::Downloader` class to download a data package
|
248
|
+
to your disk (by default data packages get stored in `./pack`).
|
249
|
+
|
250
|
+
``` ruby
|
251
|
+
dl = CsvPack::Downloader.new
|
252
|
+
dl.fetch( 'language-codes' )
|
253
|
+
dl.fetch( 's-and-p-500-companies' )
|
254
|
+
dl.fetch( 'un-locode')
|
255
|
+
```
|
256
|
+
|
257
|
+
Will result in:
|
258
|
+
|
259
|
+
```
|
260
|
+
-- pack
|
261
|
+
|-- language-codes
|
262
|
+
| |-- data
|
263
|
+
| | |-- language-codes-3b2.csv
|
264
|
+
| | |-- language-codes.csv
|
265
|
+
| | `-- language-codes-full.csv
|
266
|
+
| `-- datapackage.json
|
267
|
+
|-- s-and-p-500-companies
|
268
|
+
| |-- data
|
269
|
+
| | |-- constituents.csv
|
270
|
+
| | `-- constituents-financials.csv
|
271
|
+
| `-- datapackage.json
|
272
|
+
`-- un-locode
|
273
|
+
|-- data
|
274
|
+
| |-- code-list.csv
|
275
|
+
| |-- country-codes.csv
|
276
|
+
| |-- function-classifiers.csv
|
277
|
+
| |-- status-indicators.csv
|
278
|
+
| `-- subdivision-codes.csv
|
279
|
+
`-- datapackage.json
|
280
|
+
```
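The folder layout above comes from how the downloader combines the cache folder, the package name and each resource's `path`. The following sketch mirrors that logic from `lib/csvpack/downloader.rb` without any network access; `resource_targets` is a hypothetical helper, and the second resource (one without a `path`) is invented to show the `<name>.csv` fallback.

```ruby
# Hypothetical helper (not part of csvpack): maps each resource of a data
# package to the local file the downloader would write it to.
def resource_targets(pack_name, resources, cache_dir = './pack')
  dest_dir = "#{cache_dir}/#{pack_name}"
  resources.map do |r|
    # use the resource's path if present; otherwise fall back to "<name>.csv"
    relative_path = r['path'] || "#{r['name']}.csv"
    "#{dest_dir}/#{relative_path}"
  end
end

resources = [
  { 'name' => 'language-codes', 'path' => 'data/language-codes.csv' },
  { 'name' => 'extra-codes' }   # invented resource without a path
]

targets = resource_targets('language-codes', resources)
puts targets
```

In the gem itself, `Downloader#fetch` also creates the target folders with `FileUtils.mkdir_p` before copying each file.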
|
281
|
+
|
282
|
+
|
283
|
+
#### Q: How to add and import a data package ("by hand")?
|
284
|
+
|
285
|
+
Use the `CsvPack::Pack` class to read-in a data package
|
286
|
+
and add and import into an SQL database.
|
287
|
+
|
288
|
+
``` ruby
|
289
|
+
pack = CsvPack::Pack.new( './pack/un-locode/datapackage.json' )
|
290
|
+
pack.tables.each do |table|
|
291
|
+
table.up! # (auto-) add table using SQL create_table via ActiveRecord migration
|
292
|
+
table.import! # import all records using SQL inserts
|
293
|
+
end
|
294
|
+
```
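As a rough illustration of what `table.import!` does for every row, the sketch below builds one `INSERT` statement using the same rules as `Tab#import!` in `lib/csvpack/pack.rb`: blank values become `NULL`, numeric values stay unquoted, and everything else is wrapped in single quotes with embedded quotes doubled. The helpers `sql_literal` and `insert_statement`, the `companies` table and the sample row are assumptions made for this example; the blank check uses plain Ruby instead of ActiveSupport's `blank?`.

```ruby
# Column types that are written without quotes
NUMERIC_TYPES = [:float, :integer].freeze

# Hypothetical helper: turns one CSV value into an SQL literal
def sql_literal(value, type)
  if value.nil? || value.strip.empty?
    'NULL'                            # blank cells are stored as NULL
  elsif NUMERIC_TYPES.include?(type)
    value                             # numbers are passed through unquoted
  else
    "'#{value.gsub("'", "''")}'"      # escape ' as '' and wrap in quotes
  end
end

# Hypothetical helper: builds the full INSERT statement for one row
def insert_statement(table, columns, types, row)
  values = row.each_with_index.map { |value, i| sql_literal(value, types[i]) }
  "INSERT INTO #{table} (#{columns.join(',')}) VALUES (#{values.join(',')})"
end

sql = insert_statement(
  'companies',                        # invented table name
  %w[symbol name price],
  [:string, :string, :float],
  ['MCO', "Moody's Corp", '']         # invented sample row with a blank price
)
puts sql
```

Building SQL by string concatenation is what this release does; when values come from untrusted sources, bound parameters are generally the safer choice.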
|
295
|
+
|
296
|
+
|
297
|
+
#### Q: How to connect to a different SQL database?
|
298
|
+
|
299
|
+
You can connect to any database supported by ActiveRecord. If you do NOT
|
300
|
+
establish a connection in your script - the standard (default fallback)
|
301
|
+
is using an in-memory SQLite3 database.
|
302
|
+
|
303
|
+
##### SQLite
|
304
|
+
|
305
|
+
For example, to create an SQLite3 database on disk - let's say `mine.db` -
|
306
|
+
use in your script (before the `CsvPack.import` statement):
|
307
|
+
|
308
|
+
``` ruby
|
309
|
+
ActiveRecord::Base.establish_connection( adapter: 'sqlite3',
|
310
|
+
database: './mine.db' )
|
311
|
+
```
|
312
|
+
|
313
|
+
##### PostgreSQL
|
314
|
+
|
315
|
+
For example, to connect to a PostgreSQL database use in your script
|
316
|
+
(before the `CsvPack.import` statement):
|
317
|
+
|
318
|
+
``` ruby
|
319
|
+
require 'pg' ## pull-in PostgreSQL (pg) machinery
|
320
|
+
|
321
|
+
ActiveRecord::Base.establish_connection( adapter: 'postgresql',
|
322
|
+
username: 'ruby',
|
323
|
+
password: 'topsecret',
|
324
|
+
database: 'database' )
|
325
|
+
```
|
326
|
+
|
327
|
+
|
328
|
+
|
329
|
+
|
330
|
+
## Install
|
331
|
+
|
332
|
+
Just install the gem:
|
333
|
+
|
334
|
+
```
|
335
|
+
$ gem install csvpack
|
336
|
+
```
|
337
|
+
|
338
|
+
|
339
|
+
|
340
|
+
## Alternatives
|
341
|
+
|
342
|
+
See the "[Tools and Plugins for working with Data Packages](https://frictionlessdata.io/software)"
|
343
|
+
page at the Frictionless Data Initiative.
|
344
|
+
|
345
|
+
|
346
|
+
## License
|
347
|
+
|
348
|
+
|
349
|
+
The `csvpack` scripts are dedicated to the public domain.
|
350
|
+
Use it as you please with no restrictions whatsoever.
|
351
|
+
|
352
|
+
## Questions? Comments?
|
353
|
+
|
354
|
+
Send them along to the ruby-talk mailing list. Thanks!
|
data/Rakefile
ADDED
@@ -0,0 +1,32 @@
|
|
1
|
+
require 'hoe'
|
2
|
+
require './lib/csvpack/version.rb'
|
3
|
+
|
4
|
+
Hoe.spec 'csvpack' do
|
5
|
+
|
6
|
+
self.version = CsvPack::VERSION
|
7
|
+
|
8
|
+
self.summary = 'csvpack - work with tabular data packages using comma-separated values (CSV) datafiles in text with datapackage.json; download, read into and query comma-separated values (CSV) datafiles with your SQL database (e.g. SQLite, PostgreSQL, ...) of choice and much more'
|
9
|
+
self.description = summary
|
10
|
+
|
11
|
+
self.urls = ['https://github.com/csv11/csvpack']
|
12
|
+
|
13
|
+
self.author = 'Gerald Bauer'
|
14
|
+
self.email = 'ruby-talk@ruby-lang.org'
|
15
|
+
|
16
|
+
# switch extension to .markdown for GitHub formatting
|
17
|
+
self.readme_file = 'README.md'
|
18
|
+
self.history_file = 'HISTORY.md'
|
19
|
+
|
20
|
+
self.extra_deps = [
|
21
|
+
['logutils', '>=0.6.1'],
|
22
|
+
['fetcher', '>=0.4.5'],
|
23
|
+
['activerecord', '>=5.0.0'],
|
24
|
+
]
|
25
|
+
|
26
|
+
self.licenses = ['Public Domain']
|
27
|
+
|
28
|
+
self.spec_extras = {
|
29
|
+
required_ruby_version: '>= 2.2.2'
|
30
|
+
}
|
31
|
+
|
32
|
+
end
|
data/lib/csvpack.rb
ADDED
@@ -0,0 +1,52 @@
|
|
1
|
+
# encoding: utf-8
|
2
|
+
|
3
|
+
|
4
|
+
require 'pp'
|
5
|
+
require 'forwardable'
|
6
|
+
|
7
|
+
### csv
|
8
|
+
require 'csv'
|
9
|
+
require 'json'
|
10
|
+
require 'fileutils'
|
11
|
+
|
12
|
+
|
13
|
+
### downloader
|
14
|
+
require 'fetcher'
|
15
|
+
|
16
|
+
### activerecord w/ sqlite3
|
17
|
+
## require 'active_support/all' ## needed for String#binary? method
|
18
|
+
require 'active_record'
|
19
|
+
|
20
|
+
|
21
|
+
|
22
|
+
# our own code
|
23
|
+
|
24
|
+
require 'csvpack/version' ## let version always go first
|
25
|
+
require 'csvpack/pack'
|
26
|
+
require 'csvpack/downloader'
|
27
|
+
|
28
|
+
module CsvPack
|
29
|
+
|
30
|
+
def self.import( *args )
|
31
|
+
## step 1: download
|
32
|
+
dl = Downloader.new
|
33
|
+
args.each do |arg|
|
34
|
+
dl.fetch( arg )
|
35
|
+
end
|
36
|
+
|
37
|
+
## step 2: up 'n' import
|
38
|
+
args.each do |arg|
|
39
|
+
pack = Pack.new( "./pack/#{arg}/datapackage.json" )
|
40
|
+
pack.tables.each do |table|
|
41
|
+
table.up!
|
42
|
+
table.import!
|
43
|
+
end
|
44
|
+
end
|
45
|
+
end
|
46
|
+
|
47
|
+
end # module CsvPack
|
48
|
+
|
49
|
+
|
50
|
+
|
51
|
+
# say hello
|
52
|
+
puts CsvPack.banner if defined?($RUBYLIBS_DEBUG) && $RUBYLIBS_DEBUG
|
data/lib/csvpack/downloader.rb
ADDED
@@ -0,0 +1,62 @@
|
|
1
|
+
# encoding: utf-8
|
2
|
+
|
3
|
+
module CsvPack
|
4
|
+
|
5
|
+
class Downloader
|
6
|
+
|
7
|
+
def initialize( cache_dir='./pack' )
|
8
|
+
@cache_dir = cache_dir # todo: check if folder exists now (or on demand)?
|
9
|
+
@worker = Fetcher::Worker.new
|
10
|
+
end
|
11
|
+
|
12
|
+
SHORTCUTS = {
|
13
|
+
## to be done
|
14
|
+
}
|
15
|
+
|
16
|
+
def fetch( name_or_shortcut_or_url ) ## todo/check: use (re)name to get/update/etc. why? why not??
|
17
|
+
|
18
|
+
name = name_or_shortcut_or_url
|
19
|
+
|
20
|
+
##
|
21
|
+
## e.g. try
|
22
|
+
## country-list
|
23
|
+
##
|
24
|
+
|
25
|
+
## url_base = "http://data.okfn.org/data/core/#{name}"
|
26
|
+
url_base = "https://datahub.io/core/#{name}"
|
27
|
+
url = "#{url_base}/datapackage.json"
|
28
|
+
|
29
|
+
dest_dir = "#{@cache_dir}/#{name}"
|
30
|
+
FileUtils.mkdir_p( dest_dir )
|
31
|
+
|
32
|
+
pack_path = "#{dest_dir}/datapackage.json"
|
33
|
+
@worker.copy( url, pack_path )
|
34
|
+
|
35
|
+
h = JSON.parse( File.read( pack_path ) )
|
36
|
+
pp h
|
37
|
+
|
38
|
+
## copy resources (tables)
|
39
|
+
h['resources'].each do |r|
|
40
|
+
puts "== resource:"
|
41
|
+
pp r
|
42
|
+
|
43
|
+
res_url = r['url']
|
44
|
+
|
45
|
+
res_name = r['name']
|
46
|
+
res_relative_path = r['path']
|
47
|
+
if res_relative_path.nil?
|
48
|
+
res_relative_path = "#{res_name}.csv"
|
49
|
+
end
|
50
|
+
|
51
|
+
res_path = "#{dest_dir}/#{res_relative_path}"
|
52
|
+
puts "[debug] res_path: >#{res_path}<"
|
53
|
+
res_dir = File.dirname( res_path )
|
54
|
+
FileUtils.mkdir_p( res_dir )
|
55
|
+
|
56
|
+
@worker.copy( res_url, res_path )
|
57
|
+
end
|
58
|
+
end
|
59
|
+
|
60
|
+
end # class Downloader
|
61
|
+
|
62
|
+
end # module CsvPack
|
data/lib/csvpack/pack.rb
ADDED
@@ -0,0 +1,246 @@
|
|
1
|
+
# encoding: utf-8
|
2
|
+
|
3
|
+
|
4
|
+
## note: for now use in-memory sqlite3 db
|
5
|
+
|
6
|
+
module CsvPack
|
7
|
+
|
8
|
+
class Pack
|
9
|
+
## load (tabular) datapackage into memory
|
10
|
+
def initialize( path )
|
11
|
+
|
12
|
+
## convenience
|
13
|
+
## - check: if path is a folder/directory
|
14
|
+
## (auto-)add /datapackage.json
|
15
|
+
|
16
|
+
text = File.open( path, 'r:utf-8' ).read
|
17
|
+
@h = JSON.parse( text )
|
18
|
+
|
19
|
+
pack_dir = File.dirname(path)
|
20
|
+
|
21
|
+
## pp @h
|
22
|
+
|
23
|
+
## read in tables
|
24
|
+
@tables = []
|
25
|
+
@h['resources'].each do |r|
|
26
|
+
## build table data
|
27
|
+
@tables << build_tab( r, pack_dir )
|
28
|
+
end
|
29
|
+
|
30
|
+
## pp @tables
|
31
|
+
end
|
32
|
+
|
33
|
+
def name() @h['name']; end
|
34
|
+
def title() @h['title']; end
|
35
|
+
def license() @h['license']; end
|
36
|
+
|
37
|
+
def tables() @tables; end
|
38
|
+
## convenience method - return first table
|
39
|
+
def table() @tables[0]; end
|
40
|
+
|
41
|
+
def build_tab( h, pack_dir )
|
42
|
+
name = h['name']
|
43
|
+
relative_path = h['path']
|
44
|
+
|
45
|
+
if relative_path.nil?
|
46
|
+
relative_path = "#{name}.csv"
|
47
|
+
puts " warn: no path defined; using fallback '#{relative_path}'"
|
48
|
+
end
|
49
|
+
|
50
|
+
puts " reading resource (table) #{name} (#{relative_path})..."
|
51
|
+
pp h
|
52
|
+
|
53
|
+
path = "#{pack_dir}/#{relative_path}"
|
54
|
+
text = File.open( path, 'r:utf-8' ).read
|
55
|
+
tab = Tab.new( h, text )
|
56
|
+
tab
|
57
|
+
end
|
58
|
+
end # class Pack
|
59
|
+
|
60
|
+
|
61
|
+
class Tab
|
62
|
+
extend Forwardable
|
63
|
+
|
64
|
+
def initialize( h, text )
|
65
|
+
@h = h
|
66
|
+
|
67
|
+
## todo parse csv
|
68
|
+
## note: use header options (first row MUST include headers)
|
69
|
+
@data = CSV.parse( text, headers: true )
|
70
|
+
|
71
|
+
pp @data[0]
|
72
|
+
end
|
73
|
+
|
74
|
+
def name() @h['name']; end
|
75
|
+
def_delegators :@data, :[], :each
|
76
|
+
|
77
|
+
def pretty_print( printer )
|
78
|
+
printer.text "Tab<#{object_id} @data.name=#{name}, @data.size=#{@data.size}>"
|
79
|
+
end
|
80
|
+
|
81
|
+
|
82
|
+
def up!
|
83
|
+
# run Migration#up to create table
|
84
|
+
connect!
|
85
|
+
con = ActiveRecord::Base.connection
|
86
|
+
|
87
|
+
con.create_table sanitize_name( name ) do |t|
|
88
|
+
@h['schema']['fields'].each do |f|
|
89
|
+
column_name = sanitize_name(f['name'])
|
90
|
+
column_type = DATA_TYPES[f['type']]
|
91
|
+
|
92
|
+
puts " #{column_type} :#{column_name} => #{f['type']} - #{f['name']}"
|
93
|
+
|
94
|
+
t.send( column_type.to_sym, column_name.to_sym ) ## todo/check: to_sym needed?
|
95
|
+
end
|
96
|
+
t.string :name
|
97
|
+
end
|
98
|
+
end
|
99
|
+
|
100
|
+
def import!
|
101
|
+
connect!
|
102
|
+
con = ActiveRecord::Base.connection
|
103
|
+
|
104
|
+
column_names = []
|
105
|
+
column_types = []
|
106
|
+
column_placeholders = []
|
107
|
+
@h['schema']['fields'].each do |f|
|
108
|
+
column_names << sanitize_name(f['name'])
|
109
|
+
column_types << DATA_TYPES[f['type']]
|
110
|
+
column_placeholders << '?'
|
111
|
+
end
|
112
|
+
|
113
|
+
sql_insert_into = "INSERT INTO #{sanitize_name(name)} (#{column_names.join(',')}) VALUES "
|
114
|
+
puts sql_insert_into
|
115
|
+
|
116
|
+
i=0
|
117
|
+
@data.each do |row|
|
118
|
+
i+=1
|
119
|
+
## next if i > 3 ## for testing; only insert a couple of recs
|
120
|
+
|
121
|
+
## todo: check if all string is ok; or number/date/etc. conversion needed/required?
|
122
|
+
values = []
|
123
|
+
row.fields.each_with_index do |value,index| # get array of values
|
124
|
+
type = column_types[index]
|
125
|
+
## todo add boolean ??
|
126
|
+
if value.blank?
|
127
|
+
values << 'NULL'
|
128
|
+
elsif [:number,:float,:integer].include?( type )
|
129
|
+
values << value ## do NOT wrap in quotes (numeric)
|
130
|
+
else
|
131
|
+
esc_value = value.gsub( "'", "''" ) ## escape quotes e.g. ' becomes '', that is, two single quotes
|
132
|
+
values << "'#{esc_value}'" ## wrap in quotes
|
133
|
+
end
|
134
|
+
end
|
135
|
+
pp values
|
136
|
+
|
137
|
+
sql = "#{sql_insert_into} (#{values.join(',')})"
|
138
|
+
puts sql
|
139
|
+
con.execute( sql )
|
140
|
+
end
|
141
|
+
end # method import!
|
142
|
+
|
143
|
+
|
144
|
+
def import_v1!
|
145
|
+
### note: import via sql for (do NOT use ActiveRecord record class for now)
|
146
|
+
con = ActiveRecord::Base.connection
|
147
|
+
|
148
|
+
column_names = []
|
149
|
+
column_types = []
|
150
|
+
column_placeholders = []
|
151
|
+
@h['schema']['fields'].each do |f|
|
152
|
+
column_names << sanitize_name(f['name'])
|
153
|
+
column_types << DATA_TYPES[f['type']]
|
154
|
+
column_placeholders << '?'
|
155
|
+
end
|
156
|
+
|
157
|
+
sql = "INSERT INTO #{sanitize_name(name)} (#{column_names.join(',')}) VALUES (#{column_placeholders.join(',')})"
|
158
|
+
puts sql
|
159
|
+
|
160
|
+
i=0
|
161
|
+
@data.each do |row|
|
162
|
+
i+=1
|
163
|
+
next if i > 3 ## for testing; only insert a couple of recs
|
164
|
+
|
165
|
+
## todo: check if all string is ok; or number/date/etc. conversion needed/required?
|
166
|
+
params = row.fields # get array of values
|
167
|
+
pp params
|
168
|
+
con.exec_insert( sql, 'SQL', params ) # todo/check: 2nd param name used for logging only??
|
169
|
+
end
|
170
|
+
end # method import_v1!
|
171
|
+
|
172
|
+
|
173
|
+
### note:
|
174
|
+
## activerecord supports:
|
175
|
+
## :string, :text, :integer, :float, :decimal, :datetime, :time, :date, :binary, :boolean
|
176
|
+
|
177
|
+
### mappings for data types
|
178
|
+
## from tabular data package to ActiveRecord migrations
|
179
|
+
##
|
180
|
+
# see http://dataprotocols.org/json-table-schema/ (section Field Types and Formats)
|
181
|
+
#
|
182
|
+
# for now supports these types
|
183
|
+
|
184
|
+
DATA_TYPES = {
|
185
|
+
'string' => :string, ## use text for larger strings ???
|
186
|
+
'number' => :float, ## note: use float for now
|
187
|
+
'integer' => :integer,
|
188
|
+
'boolean' => :boolean,
|
189
|
+
'datetime' => :datetime,
|
190
|
+
'date' => :date,
|
191
|
+
'time' => :time,
|
192
|
+
}
|
193
|
+
|
194
|
+
def dump_schema
|
195
|
+
## try to dump schema (fields)
|
196
|
+
puts "*** dump schema:"
|
197
|
+
|
198
|
+
@h['schema']['fields'].each do |f|
|
199
|
+
puts " #{f['name']} ( #{sanitize_name(f['name'])} ) : #{f['type']} ( #{DATA_TYPES[f['type']]} )"
|
200
|
+
end
|
201
|
+
|
202
|
+
end
|
203
|
+
|
204
|
+
|
205
|
+
def sanitize_name( ident )
|
206
|
+
##
|
207
|
+
## if identifier starts w/ number add leading underscore (_)
|
208
|
+
## e.g. 52 Week Price => becomes _52_week_price
|
209
|
+
|
210
|
+
ident = ident.strip.downcase
|
211
|
+
ident = ident.gsub( /[\.\-\/]/, '_' ) ## convert some special chars to underscore (e.g. dash -)
|
212
|
+
ident = ident.gsub( ' ', '_' )
|
213
|
+
ident = ident.gsub( /[^a-z0-9_]/, '' )
|
214
|
+
ident = "_#{ident}" if ident =~ /^[0-9]/
|
215
|
+
ident
|
216
|
+
end
|
217
|
+
|
218
|
+
|
219
|
+
def ar_clazz
|
220
|
+
@ar_clazz ||= begin
|
221
|
+
clazz = Class.new( ActiveRecord::Base ) do
|
222
|
+
## nothing here for now
|
223
|
+
end
|
224
|
+
puts "set table_name to #{sanitize_name( name )}"
|
225
|
+
clazz.table_name = sanitize_name( name )
|
226
|
+
clazz
|
227
|
+
end
|
228
|
+
@ar_clazz
|
229
|
+
end
|
230
|
+
|
231
|
+
private
|
232
|
+
|
233
|
+
## helper to get connection; if not connection established use defaults
|
234
|
+
def connect!
|
235
|
+
## todo: cache returned con - why, why not ??
|
236
|
+
unless ActiveRecord::Base.connected?
|
237
|
+
puts "note: no database connection established; using defaults (e.g. in-memory SQLite database)"
|
238
|
+
ActiveRecord::Base.establish_connection( adapter: 'sqlite3', database: ':memory:' )
|
239
|
+
ActiveRecord::Base.logger = Logger.new( STDOUT )
|
240
|
+
end
|
241
|
+
ActiveRecord::Base.connection
|
242
|
+
end
|
243
|
+
|
244
|
+
end # class Tab
|
245
|
+
|
246
|
+
end # module CsvPack
|
data/lib/csvpack/version.rb
ADDED
@@ -0,0 +1,22 @@
|
|
1
|
+
# encoding: utf-8
|
2
|
+
|
3
|
+
module CsvPack
|
4
|
+
|
5
|
+
MAJOR = 0 ## todo: namespace inside version or something - why? why not??
|
6
|
+
MINOR = 1
|
7
|
+
PATCH = 0
|
8
|
+
VERSION = [MAJOR,MINOR,PATCH].join('.')
|
9
|
+
|
10
|
+
def self.version
|
11
|
+
VERSION
|
12
|
+
end
|
13
|
+
|
14
|
+
def self.banner
|
15
|
+
"csvpack/#{VERSION} on Ruby #{RUBY_VERSION} (#{RUBY_RELEASE_DATE}) [#{RUBY_PLATFORM}]"
|
16
|
+
end
|
17
|
+
|
18
|
+
def self.root
|
19
|
+
File.expand_path( File.dirname(File.dirname(File.dirname(__FILE__))) )
|
20
|
+
end
|
21
|
+
|
22
|
+
end # module CsvPack
|
data/test/helper.rb
ADDED
data/test/test_companies.rb
ADDED
@@ -0,0 +1,61 @@
|
|
1
|
+
# encoding: utf-8
|
2
|
+
|
3
|
+
###
|
4
|
+
# to run use
|
5
|
+
# ruby -I ./lib -I ./test test/test_companies.rb
|
6
|
+
|
7
|
+
|
8
|
+
require 'helper'
|
9
|
+
|
10
|
+
class TestCompanies < MiniTest::Test
|
11
|
+
|
12
|
+
def test_s_and_p_500_companies
|
13
|
+
|
14
|
+
pak = CsvPack::Pack.new( './pack/s-and-p-500-companies/datapackage.json' )
|
15
|
+
|
16
|
+
puts "name: #{pak.name}"
|
17
|
+
puts "title: #{pak.title}"
|
18
|
+
puts "license: #{pak.license}"
|
19
|
+
|
20
|
+
pp pak.tables
|
21
|
+
pp pak.table[0]['Symbol']
|
22
|
+
pp pak.table[495]['Symbol']
|
23
|
+
|
24
|
+
## pak.table.each do |row|
|
25
|
+
## pp row
|
26
|
+
## end
|
27
|
+
|
28
|
+
puts pak.tables[0].dump_schema
|
29
|
+
puts pak.tables[1].dump_schema
|
30
|
+
|
31
|
+
# database setup 'n' config
|
32
|
+
ActiveRecord::Base.establish_connection( adapter: 'sqlite3', database: ':memory:' )
|
33
|
+
ActiveRecord::Base.logger = Logger.new( STDOUT )
|
34
|
+
|
35
|
+
pak.table.up!
|
36
|
+
pak.table.import!
|
37
|
+
|
38
|
+
pak.tables[1].up!
|
39
|
+
pak.tables[1].import!
|
40
|
+
|
41
|
+
|
42
|
+
pp pak.table.ar_clazz
|
43
|
+
|
44
|
+
|
45
|
+
company = pak.table.ar_clazz
|
46
|
+
|
47
|
+
puts "Company.count: #{company.count}"
|
48
|
+
pp company.first
|
49
|
+
pp company.find_by!( symbol: 'MMM' )
|
50
|
+
pp company.find_by!( name: '3M Co' )
|
51
|
+
pp company.where( sector: 'Industrials' ).count
|
52
|
+
pp company.where( sector: 'Industrials' ).all
|
53
|
+
|
54
|
+
|
55
|
+
### todo: try a join w/ belongs_to ??
|
56
|
+
|
57
|
+
assert true # if we get here - test success
|
58
|
+
end
|
59
|
+
|
60
|
+
end # class TestCompanies
|
61
|
+
|
data/test/test_countries.rb
ADDED
@@ -0,0 +1,40 @@
|
|
1
|
+
# encoding: utf-8
|
2
|
+
|
3
|
+
###
|
4
|
+
# to run use
|
5
|
+
# ruby -I ./lib -I ./test test/test_countries.rb
|
6
|
+
|
7
|
+
|
8
|
+
require 'helper'
|
9
|
+
|
10
|
+
class TestCountries < MiniTest::Test
|
11
|
+
|
12
|
+
def test_country_list
|
13
|
+
pak = CsvPack::Pack.new( './pack/country-list/datapackage.json' )
|
14
|
+
|
15
|
+
puts "name: #{pak.name}"
|
16
|
+
puts "title: #{pak.title}"
|
17
|
+
puts "license: #{pak.license}"
|
18
|
+
|
19
|
+
pp pak.tables
|
20
|
+
|
21
|
+
## pak.table.each do |row|
|
22
|
+
## pp row
|
23
|
+
## end
|
24
|
+
|
25
|
+
puts pak.table.dump_schema
|
26
|
+
|
27
|
+
# database setup 'n' config
|
28
|
+
ActiveRecord::Base.establish_connection( adapter: 'sqlite3', database: ':memory:' )
|
29
|
+
ActiveRecord::Base.logger = Logger.new( STDOUT )
|
30
|
+
|
31
|
+
pak.table.up!
|
32
|
+
pak.table.import!
|
33
|
+
|
34
|
+
pp pak.table.ar_clazz
|
35
|
+
|
36
|
+
assert true # if we get here - test success
|
37
|
+
end
|
38
|
+
|
39
|
+
end # class TestCountries
|
40
|
+
|
data/test/test_downloader.rb
ADDED
@@ -0,0 +1,32 @@
|
|
1
|
+
# encoding: utf-8
|
2
|
+
|
3
|
+
###
|
4
|
+
# to run use
|
5
|
+
# ruby -I ./lib -I ./test test/test_downloader.rb
|
6
|
+
|
7
|
+
|
8
|
+
require 'helper'
|
9
|
+
|
10
|
+
class TestDownloader < MiniTest::Test
|
11
|
+
|
12
|
+
def test_download
|
13
|
+
|
14
|
+
names = [
|
15
|
+
'country-list',
|
16
|
+
'country-codes',
|
17
|
+
'language-codes',
|
18
|
+
'cpi', ## Annual Consumer Price Index (CPI)
|
19
|
+
'gdp', ## Country, Regional and World GDP (Gross Domestic Product)
|
20
|
+
's-and-p-500-companies', ## S&P 500 Companies with Financial Information
|
21
|
+
'un-locode', ## UN-LOCODE Codelist - note: incl. country-codes.csv
|
22
|
+
]
|
23
|
+
|
24
|
+
dl = CsvPack::Downloader.new
|
25
|
+
names.each do |name|
|
26
|
+
dl.fetch( name )
|
27
|
+
end
|
28
|
+
|
29
|
+
assert true # if we get here - test success
|
30
|
+
end
|
31
|
+
|
32
|
+
end # class TestDownloader
|
data/test/test_import.rb
ADDED
@@ -0,0 +1,22 @@
|
|
1
|
+
# encoding: utf-8
|
2
|
+
|
3
|
+
###
|
4
|
+
# to run use
|
5
|
+
# ruby -I ./lib -I ./test test/test_import.rb
|
6
|
+
|
7
|
+
|
8
|
+
require 'helper'
|
9
|
+
|
10
|
+
class TestImport < MiniTest::Test
|
11
|
+
|
12
|
+
def test_import
|
13
|
+
|
14
|
+
CsvPack.import(
|
15
|
+
'cpi', ## Annual Consumer Price Index (CPI)
|
16
|
+
'gdp', ## Country, Regional and World GDP (Gross Domestic Product)
|
17
|
+
)
|
18
|
+
|
19
|
+
assert true # if we get here - test success
|
20
|
+
end
|
21
|
+
|
22
|
+
end # class TestImport
|
metadata
ADDED
@@ -0,0 +1,137 @@
|
|
1
|
+
--- !ruby/object:Gem::Specification
|
2
|
+
name: csvpack
|
3
|
+
version: !ruby/object:Gem::Version
|
4
|
+
version: 0.1.0
|
5
|
+
platform: ruby
|
6
|
+
authors:
|
7
|
+
- Gerald Bauer
|
8
|
+
autorequire:
|
9
|
+
bindir: bin
|
10
|
+
cert_chain: []
|
11
|
+
date: 2018-08-07 00:00:00.000000000 Z
|
12
|
+
dependencies:
|
13
|
+
- !ruby/object:Gem::Dependency
|
14
|
+
name: logutils
|
15
|
+
requirement: !ruby/object:Gem::Requirement
|
16
|
+
requirements:
|
17
|
+
- - ">="
|
18
|
+
- !ruby/object:Gem::Version
|
19
|
+
version: 0.6.1
|
20
|
+
type: :runtime
|
21
|
+
prerelease: false
|
22
|
+
version_requirements: !ruby/object:Gem::Requirement
|
23
|
+
requirements:
|
24
|
+
- - ">="
|
25
|
+
- !ruby/object:Gem::Version
|
26
|
+
version: 0.6.1
|
27
|
+
- !ruby/object:Gem::Dependency
|
28
|
+
name: fetcher
|
29
|
+
requirement: !ruby/object:Gem::Requirement
|
30
|
+
requirements:
|
31
|
+
- - ">="
|
32
|
+
- !ruby/object:Gem::Version
|
33
|
+
version: 0.4.5
|
34
|
+
type: :runtime
|
35
|
+
prerelease: false
|
36
|
+
version_requirements: !ruby/object:Gem::Requirement
|
37
|
+
requirements:
|
38
|
+
- - ">="
|
39
|
+
- !ruby/object:Gem::Version
|
40
|
+
version: 0.4.5
|
41
|
+
- !ruby/object:Gem::Dependency
|
42
|
+
name: activerecord
|
43
|
+
requirement: !ruby/object:Gem::Requirement
|
44
|
+
requirements:
|
45
|
+
- - ">="
|
46
|
+
- !ruby/object:Gem::Version
|
47
|
+
version: 5.0.0
|
48
|
+
type: :runtime
|
49
|
+
prerelease: false
|
50
|
+
version_requirements: !ruby/object:Gem::Requirement
|
51
|
+
requirements:
|
52
|
+
- - ">="
|
53
|
+
- !ruby/object:Gem::Version
|
54
|
+
version: 5.0.0
|
55
|
+
- !ruby/object:Gem::Dependency
|
56
|
+
name: rdoc
|
57
|
+
requirement: !ruby/object:Gem::Requirement
|
58
|
+
requirements:
|
59
|
+
- - "~>"
|
60
|
+
- !ruby/object:Gem::Version
|
61
|
+
version: '4.0'
|
62
|
+
type: :development
|
63
|
+
prerelease: false
|
64
|
+
version_requirements: !ruby/object:Gem::Requirement
|
65
|
+
requirements:
|
66
|
+
- - "~>"
|
67
|
+
- !ruby/object:Gem::Version
|
68
|
+
version: '4.0'
|
69
|
+
- !ruby/object:Gem::Dependency
|
70
|
+
name: hoe
|
71
|
+
requirement: !ruby/object:Gem::Requirement
|
72
|
+
requirements:
|
73
|
+
- - "~>"
|
74
|
+
- !ruby/object:Gem::Version
|
75
|
+
version: '3.16'
|
76
|
+
type: :development
|
77
|
+
prerelease: false
|
78
|
+
version_requirements: !ruby/object:Gem::Requirement
|
79
|
+
requirements:
|
80
|
+
- - "~>"
|
81
|
+
- !ruby/object:Gem::Version
|
82
|
+
version: '3.16'
|
83
|
+
description: csvpack - work with tabular data packages using comma-separated values
|
84
|
+
(CSV) datafiles in text with datapackage.json; download, read into and query comma-separated
|
85
|
+
values (CSV) datafiles with your SQL database (e.g. SQLite, PostgreSQL, ...) of
|
86
|
+
choice and much more
|
87
|
+
email: ruby-talk@ruby-lang.org
|
88
|
+
executables: []
|
89
|
+
extensions: []
|
90
|
+
extra_rdoc_files:
|
91
|
+
- HISTORY.md
|
92
|
+
- Manifest.txt
|
93
|
+
- README.md
|
94
|
+
files:
|
95
|
+
- HISTORY.md
|
96
|
+
- Manifest.txt
|
97
|
+
- README.md
|
98
|
+
- Rakefile
|
99
|
+
- lib/csvpack.rb
|
100
|
+
- lib/csvpack/downloader.rb
|
101
|
+
- lib/csvpack/pack.rb
|
102
|
+
- lib/csvpack/version.rb
|
103
|
+
- test/helper.rb
|
104
|
+
- test/test_companies.rb
|
105
|
+
- test/test_countries.rb
|
106
|
+
- test/test_downloader.rb
|
107
|
+
- test/test_import.rb
|
108
|
+
homepage: https://github.com/csv11/csvpack
|
109
|
+
licenses:
|
110
|
+
- Public Domain
|
111
|
+
metadata: {}
|
112
|
+
post_install_message:
|
113
|
+
rdoc_options:
|
114
|
+
- "--main"
|
115
|
+
- README.md
|
116
|
+
require_paths:
|
117
|
+
- lib
|
118
|
+
required_ruby_version: !ruby/object:Gem::Requirement
|
119
|
+
requirements:
|
120
|
+
- - ">="
|
121
|
+
- !ruby/object:Gem::Version
|
122
|
+
version: 2.2.2
|
123
|
+
required_rubygems_version: !ruby/object:Gem::Requirement
|
124
|
+
requirements:
|
125
|
+
- - ">="
|
126
|
+
- !ruby/object:Gem::Version
|
127
|
+
version: '0'
|
128
|
+
requirements: []
|
129
|
+
rubyforge_project:
|
130
|
+
rubygems_version: 2.5.2
|
131
|
+
signing_key:
|
132
|
+
specification_version: 4
|
133
|
+
summary: csvpack - work with tabular data packages using comma-separated values (CSV)
|
134
|
+
datafiles in text with datapackage.json; download, read into and query comma-separated
|
135
|
+
values (CSV) datafiles with your SQL database (e.g. SQLite, PostgreSQL, ...) of
|
136
|
+
choice and much more
|
137
|
+
test_files: []
|