csvpack 0.1.0 → 0.2.0
- checksums.yaml +4 -4
- data/HISTORY.md +4 -4
- data/README.md +354 -354
- data/Rakefile +32 -32
- data/lib/csvpack.rb +52 -52
- data/lib/csvpack/downloader.rb +72 -62
- data/lib/csvpack/pack.rb +47 -7
- data/lib/csvpack/version.rb +22 -22
- data/test/helper.rb +7 -7
- data/test/test_companies.rb +62 -61
- data/test/test_countries.rb +41 -40
- data/test/test_downloader.rb +32 -32
- data/test/test_import.rb +22 -22
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 0615ac2810a14ad606d281a410dc82954a88404d
|
4
|
+
data.tar.gz: fd48dba7204cb843f5f30937efc8529b3b9c33c4
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: c39d996ef1d6ca86c8ce70285a5259dbf5f255d4e0cc3bed7232d7de8ec0e8fb4e87e78b0d6ac31dfcf1bda8f1862f207227dd6f25c29f2ac96ddd2a44a445de
|
7
|
+
data.tar.gz: a66e0742a75d0aabf1a41d2ce54c9ef8c7780afde641aa244d1ab37016a11afa32db26c3f6708589d71788854311619565b6c4ffa49a5d51b0a45826fde1947d
|
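The SHA1 and SHA512 entries above are hex digests of `metadata.gz` and `data.tar.gz`. As a minimal sketch of how such digests could be checked with Ruby's standard `digest` library: the helper name `checksums_match?` is hypothetical and is not part of csvpack or RubyGems, and the sample bytes stand in for the real archive contents.

```ruby
require 'digest'

# Hypothetical helper (not part of csvpack or RubyGems): returns true when
# every expected hex digest matches the digest computed from the given bytes.
# Algorithms other than SHA1 and SHA512 are treated as a mismatch.
def checksums_match?(bytes, expected)
  actual = {
    'SHA1'   => Digest::SHA1.hexdigest(bytes),
    'SHA512' => Digest::SHA512.hexdigest(bytes)
  }
  expected.all? { |algo, hex| actual[algo] == hex }
end

# Stand-in data; for a real check, read the archive with File.binread.
sample   = 'sample archive bytes'
expected = {
  'SHA1'   => Digest::SHA1.hexdigest(sample),
  'SHA512' => Digest::SHA512.hexdigest(sample)
}

puts checksums_match?(sample, expected)         # digests computed from the same bytes
puts checksums_match?(sample + '!', expected)   # altered bytes no longer match
```

For a real package, the expected values would be the strings listed in `checksums.yaml`.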
data/HISTORY.md
CHANGED
@@ -1,4 +1,4 @@
|
|
1
|
-
### 0.0.1 / 2015-04-23
|
2
|
-
|
3
|
-
* Everything is new. First release
|
4
|
-
|
1
|
+
### 0.0.1 / 2015-04-23
|
2
|
+
|
3
|
+
* Everything is new. First release
|
4
|
+
|
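The removed and added lines of `HISTORY.md` read the same in this view. One possible explanation, which is an assumption since the diff does not say, is a change in line endings only. A small sketch for checking that on two versions of a file; `differs_only_in_line_endings?` is a hypothetical helper name.

```ruby
# Hypothetical helper: true when two texts differ as raw strings but become
# identical once Windows line endings (CRLF) are normalized to LF.
def differs_only_in_line_endings?(old_text, new_text)
  return false if old_text == new_text

  old_text.gsub("\r\n", "\n") == new_text.gsub("\r\n", "\n")
end

old_version = "### 0.0.1 / 2015-04-23\r\n\r\n* Everything is new. First release\r\n"
new_version = "### 0.0.1 / 2015-04-23\n\n* Everything is new. First release\n"

puts differs_only_in_line_endings?(old_version, new_version)
```

The same check could be run on the two unpacked gem versions to confirm or rule out this explanation.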
data/README.md
CHANGED
@@ -1,354 +1,354 @@
|
|
1
|
-
# csvpack
|
2
|
-
|
3
|
-
work with tabular data packages using comma-separated values (CSV) datafiles in text with datapackage.json; download, read into and query comma-separated values (CSV) datafiles with your SQL database (e.g. SQLite, PostgreSQL, ...) of choice and much more
|
4
|
-
|
5
|
-
|
6
|
-
* home :: [github.com/csv11/csvpack](https://github.com/csv11/csvpack)
|
7
|
-
* bugs :: [github.com/csv11/csvpack/issues](https://github.com/csv11/csvpack/issues)
|
8
|
-
* gem :: [rubygems.org/gems/csvpack](https://rubygems.org/gems/csvpack)
|
9
|
-
* rdoc :: [rubydoc.info/gems/csvpack](http://rubydoc.info/gems/csvpack)
|
10
|
-
* forum :: [ruby-talk@ruby-lang.org](http://www.ruby-lang.org/en/community/mailing-lists/)
|
11
|
-
|
12
|
-
|
13
|
-
|
14
|
-
## Usage
|
15
|
-
|
16
|
-
|
17
|
-
### What's a tabular data package?
|
18
|
-
|
19
|
-
> Tabular Data Package is a simple structure for publishing and sharing
|
20
|
-
> tabular data with the following key features:
|
21
|
-
>
|
22
|
-
> - Data is stored in CSV (comma separated values) files
|
23
|
-
> - Metadata about the dataset both general (e.g. title, author)
|
24
|
-
> and the specific data files (e.g. schema) is stored in a single JSON file
|
25
|
-
> named `datapackage.json` which follows the Data Package format
|
26
|
-
|
27
|
-
(Source: [Tabular Data Packages, Frictionless Data Initiative • Data Hub.io • Open Knowledge Foundation • Data Protocols.org](https://datahub.io/docs/data-packages/tabular))
|
28
|
-
|
29
|
-
|
30
|
-
|
31
|
-
Here's a minimal example of a tabular data package holding two files, that is, `data.csv` and `datapackage.json`:
|
32
|
-
|
33
|
-
`data.csv`:
|
34
|
-
|
35
|
-
```
|
36
|
-
Brewery,City,Name,Abv
|
37
|
-
Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
|
38
|
-
Augustiner Bräu München,München,Edelstoff,5.6%
|
39
|
-
Bayerische Staatsbrauerei Weihenstephan,Freising,Hefe Weissbier,5.4%
|
40
|
-
Brauerei Spezial,Bamberg,Rauchbier Märzen,5.1%
|
41
|
-
Hacker-Pschorr Bräu,München,Münchner Dunkel,5.0%
|
42
|
-
Staatliches Hofbräuhaus München,München,Hofbräu Oktoberfestbier,6.3%
|
43
|
-
...
|
44
|
-
```
|
45
|
-
|
46
|
-
`datapackage.json`:
|
47
|
-
|
48
|
-
``` json
|
49
|
-
{
|
50
|
-
"name": "beer",
|
51
|
-
"resources": [
|
52
|
-
{
|
53
|
-
"path": "data.csv",
|
54
|
-
"schema": {
|
55
|
-
"fields": [{ "name": "Brewery", "type": "string" },
|
56
|
-
{ "name": "City", "type": "string" },
|
57
|
-
{ "name": "Name", "type": "string" },
|
58
|
-
{ "name": "Abv", "type": "number" }]
|
59
|
-
}
|
60
|
-
}
|
61
|
-
]
|
62
|
-
}
|
63
|
-
```
|
64
|
-
|
65
|
-
|
66
|
-
|
67
|
-
### Where to find data packages?
|
68
|
-
|
69
|
-
For some real world examples see the [Data Packages Listing](https://datahub.io/core) ([Sources](https://github.com/datasets)) at the Data Hub.io • Frictionless Data Initiative
|
70
|
-
website for a start. Tabular data packages include:
|
71
|
-
|
72
|
-
Name | Comments
|
73
|
-
------------------------ | -------------
|
74
|
-
`country-codes` | Comprehensive country codes: ISO 3166, ITU, ISO 4217 currency codes and many more
|
75
|
-
`language-codes` | ISO Language Codes (639-1 and 639-2)
|
76
|
-
`currency-codes` | ISO 4217 Currency Codes
|
77
|
-
`gdb` | Country, Regional and World GDP (Gross Domestic Product)
|
78
|
-
`s-and-p-500-companies` | S&P 500 Companies with Financial Information
|
79
|
-
`un-locode` | UN-LOCODE Codelist
|
80
|
-
`gold-prices` | Gold Prices (Monthly in USD)
|
81
|
-
`bond-yields-uk-10y` | 10 Year UK Government Bond Yields (Long-Term Interest Rate)
|
82
|
-
|
83
|
-
|
84
|
-
|
85
|
-
and many more
|
86
|
-
|
87
|
-
|
88
|
-
### Code, Code, Code - Script Your Data Workflow with Ruby
|
89
|
-
|
90
|
-
|
91
|
-
``` ruby
|
92
|
-
require 'csvpack'
|
93
|
-
|
94
|
-
CsvPack.import(
|
95
|
-
's-and-p-500-companies',
|
96
|
-
'gdb'
|
97
|
-
)
|
98
|
-
```
|
99
|
-
|
100
|
-
Using `CsvPack.import` will:
|
101
|
-
|
102
|
-
1) download all data packages to the `./pack` folder
|
103
|
-
|
104
|
-
2) (auto-)add all tables to an in-memory SQLite database using SQL `create_table`
|
105
|
-
commands via `ActiveRecord` migrations e.g.
|
106
|
-
|
107
|
-
|
108
|
-
``` ruby
|
109
|
-
create_table :constituents_financials do |t|
|
110
|
-
t.string :symbol # Symbol (string)
|
111
|
-
t.string :name # Name (string)
|
112
|
-
t.string :sector # Sector (string)
|
113
|
-
t.float :price # Price (number)
|
114
|
-
t.float :dividend_yield # Dividend Yield (number)
|
115
|
-
t.float :price_earnings # Price/Earnings (number)
|
116
|
-
t.float :earnings_share # Earnings/Share (number)
|
117
|
-
t.float :book_value # Book Value (number)
|
118
|
-
t.float :_52_week_low # 52 week low (number)
|
119
|
-
t.float :_52_week_high # 52 week high (number)
|
120
|
-
t.float :market_cap # Market Cap (number)
|
121
|
-
t.float :ebitda # EBITDA (number)
|
122
|
-
t.float :price_sales # Price/Sales (number)
|
123
|
-
t.float :price_book # Price/Book (number)
|
124
|
-
t.string :sec_filings # SEC Filings (string)
|
125
|
-
end
|
126
|
-
```
|
127
|
-
|
128
|
-
3) (auto-)import all datasets using SQL inserts e.g.
|
129
|
-
|
130
|
-
``` sql
|
131
|
-
INSERT INTO constituents_financials
|
132
|
-
(symbol,
|
133
|
-
name,
|
134
|
-
sector,
|
135
|
-
price,
|
136
|
-
dividend_yield,
|
137
|
-
price_earnings,
|
138
|
-
earnings_share,
|
139
|
-
book_value,
|
140
|
-
_52_week_low,
|
141
|
-
_52_week_high,
|
142
|
-
market_cap,
|
143
|
-
ebitda,
|
144
|
-
price_sales,
|
145
|
-
price_book,
|
146
|
-
sec_filings)
|
147
|
-
VALUES
|
148
|
-
('MMM',
|
149
|
-
'3M
|
150
|
-
'Industrials',
|
151
|
-
162.27,
|
152
|
-
2.11,
|
153
|
-
22.28,
|
154
|
-
7.284,
|
155
|
-
25.238,
|
156
|
-
123.61,
|
157
|
-
162.92,
|
158
|
-
104.0,
|
159
|
-
8.467,
|
160
|
-
3.28,
|
161
|
-
6.43,
|
162
|
-
'http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=MMM')
|
163
|
-
```
|
164
|
-
|
165
|
-
4) (auto-)add ActiveRecord models for all tables.
|
166
|
-
|
167
|
-
|
168
|
-
So what? Now you can use all the "magic" of ActiveRecord to query
|
169
|
-
the datasets. Example:
|
170
|
-
|
171
|
-
``` ruby
|
172
|
-
pp Constituent.count
|
173
|
-
|
174
|
-
# SELECT COUNT(*) FROM "constituents"
|
175
|
-
# => 496
|
176
|
-
|
177
|
-
|
178
|
-
pp Constituent.first
|
179
|
-
|
180
|
-
# SELECT "constituents".* FROM "constituents" ORDER BY "constituents"."id" ASC LIMIT 1
|
181
|
-
# => #<Constituent:0x9f8cb78
|
182
|
-
# id: 1,
|
183
|
-
# symbol: "MMM",
|
184
|
-
# name: "3M
|
185
|
-
# sector: "Industrials">
|
186
|
-
|
187
|
-
|
188
|
-
pp Constituent.find_by!( symbol: 'MMM' )
|
189
|
-
|
190
|
-
# SELECT "constituents".*
|
191
|
-
# FROM "constituents"
|
192
|
-
# WHERE "constituents"."symbol" = "MMM"
|
193
|
-
# LIMIT 1
|
194
|
-
# => #<Constituent:0x9f8cb78
|
195
|
-
# id: 1,
|
196
|
-
# symbol: "MMM",
|
197
|
-
# name: "3M
|
198
|
-
# sector: "Industrials">
|
199
|
-
|
200
|
-
|
201
|
-
pp Constituent.find_by!( name: '3M
|
202
|
-
|
203
|
-
# SELECT "constituents".*
|
204
|
-
# FROM "constituents"
|
205
|
-
# WHERE "constituents"."name" = "3M
|
206
|
-
# LIMIT 1
|
207
|
-
# => #<Constituent:0x9f8cb78
|
208
|
-
# id: 1,
|
209
|
-
# symbol: "MMM",
|
210
|
-
# name: "3M
|
211
|
-
# sector: "Industrials">
|
212
|
-
|
213
|
-
|
214
|
-
pp Constituent.where( sector: 'Industrials' ).count
|
215
|
-
|
216
|
-
# SELECT COUNT(*) FROM "constituents"
|
217
|
-
# WHERE "constituents"."sector" = "Industrials"
|
218
|
-
# => 63
|
219
|
-
|
220
|
-
|
221
|
-
pp Constituent.where( sector: 'Industrials' ).all
|
222
|
-
|
223
|
-
# SELECT "constituents".*
|
224
|
-
# FROM "constituents"
|
225
|
-
# WHERE "constituents"."sector" = "Industrials"
|
226
|
-
# => [#<Constituent:0x9f8cb78
|
227
|
-
# id: 1,
|
228
|
-
# symbol: "MMM",
|
229
|
-
# name: "3M
|
230
|
-
# sector: "Industrials">,
|
231
|
-
# #<Constituent:0xa2a4180
|
232
|
-
# id: 8,
|
233
|
-
# symbol: "ADT",
|
234
|
-
# name: "ADT Corp (The)",
|
235
|
-
# sector: "Industrials">,...]
|
236
|
-
```
|
237
|
-
|
238
|
-
and so on
|
239
|
-
|
240
|
-
|
241
|
-
|
242
|
-
### Frequently Asked Questions (F.A.Qs) and Answers
|
243
|
-
|
244
|
-
|
245
|
-
#### Q: How to download a data package ("by hand")?
|
246
|
-
|
247
|
-
Use the `CsvPack::Downloader` class to download a data package
|
248
|
-
to your disk (by default data packages get stored in `./pack`).
|
249
|
-
|
250
|
-
``` ruby
|
251
|
-
dl = CsvPack::Downloader.new
|
252
|
-
dl.fetch( 'language-codes' )
|
253
|
-
dl.fetch( 's-and-p-500-companies' )
|
254
|
-
dl.fetch( 'un-locode')
|
255
|
-
```
|
256
|
-
|
257
|
-
Will result in:
|
258
|
-
|
259
|
-
```
|
260
|
-
-- pack
|
261
|
-
|-- language-codes
|
262
|
-
| |-- data
|
263
|
-
| | |-- language-
|
264
|
-
| | |-- language-codes.csv
|
265
|
-
| |
|
266
|
-
| `--
|
267
|
-
|
268
|
-
|
269
|
-
|
|
270
|
-
| | `-- constituents
|
271
|
-
| `-- datapackage.json
|
272
|
-
`-- un-locode
|
273
|
-
|-- data
|
274
|
-
| |-- code-list.csv
|
275
|
-
| |-- country-codes.csv
|
276
|
-
| |-- function-classifiers.csv
|
277
|
-
| |-- status-indicators.csv
|
278
|
-
| `-- subdivision-codes.csv
|
279
|
-
`-- datapackage.json
|
280
|
-
```
|
281
|
-
|
282
|
-
|
283
|
-
#### Q: How to add and import a data package ("by hand")?
|
284
|
-
|
285
|
-
Use the `CsvPack::Pack` class to read-in a data package
|
286
|
-
and add and import into an SQL database.
|
287
|
-
|
288
|
-
``` ruby
|
289
|
-
pack = CsvPack::Pack.new( './pack/un-locode/datapackage.json' )
|
290
|
-
pack.tables.each do |table|
|
291
|
-
table.up! # (auto-) add table using SQL create_table via ActiveRecord migration
|
292
|
-
table.import! # import all records using SQL inserts
|
293
|
-
end
|
294
|
-
```
|
295
|
-
|
296
|
-
|
297
|
-
#### Q: How to connect to a different SQL database?
|
298
|
-
|
299
|
-
You can connect to any database supported by ActiveRecord. If you do NOT
|
300
|
-
establish a connection in your script - the standard (default fallback)
|
301
|
-
is using an in-memory SQLite3 database.
|
302
|
-
|
303
|
-
##### SQLite
|
304
|
-
|
305
|
-
For example, to create an SQLite3 database on disk - let's say `mine.db` -
|
306
|
-
use in your script (before the `CsvPack.import` statement):
|
307
|
-
|
308
|
-
``` ruby
|
309
|
-
ActiveRecord::Base.establish_connection( adapter: 'sqlite3',
|
310
|
-
database: './mine.db' )
|
311
|
-
```
|
312
|
-
|
313
|
-
##### PostgreSQL
|
314
|
-
|
315
|
-
For example, to connect to a PostgreSQL database use in your script
|
316
|
-
(before the `CsvPack.import` statement):
|
317
|
-
|
318
|
-
``` ruby
|
319
|
-
require 'pg' ## pull-in PostgreSQL (pg) machinery
|
320
|
-
|
321
|
-
ActiveRecord::Base.establish_connection( adapter: 'postgresql',
|
322
|
-
username: 'ruby',
|
323
|
-
password: 'topsecret',
|
324
|
-
database: 'database' )
|
325
|
-
```
|
326
|
-
|
327
|
-
|
328
|
-
|
329
|
-
|
330
|
-
## Install
|
331
|
-
|
332
|
-
Just install the gem:
|
333
|
-
|
334
|
-
```
|
335
|
-
$ gem install csvpack
|
336
|
-
```
|
337
|
-
|
338
|
-
|
339
|
-
|
340
|
-
## Alternatives
|
341
|
-
|
342
|
-
See the "[Tools and Plugins for working with Data Packages](https://frictionlessdata.io/software)"
|
343
|
-
page at the Frictionless Data Initiative.
|
344
|
-
|
345
|
-
|
346
|
-
## License
|
347
|
-
|
348
|
-
|
349
|
-
The `csvpack` scripts are dedicated to the public domain.
|
350
|
-
Use it as you please with no restrictions whatsoever.
|
351
|
-
|
352
|
-
## Questions? Comments?
|
353
|
-
|
354
|
-
Send them along to the ruby-talk mailing list. Thanks!
|
1
|
+
# csvpack
|
2
|
+
|
3
|
+
work with tabular data packages using comma-separated values (CSV) datafiles in text with datapackage.json; download, read into and query comma-separated values (CSV) datafiles with your SQL database (e.g. SQLite, PostgreSQL, ...) of choice and much more
|
4
|
+
|
5
|
+
|
6
|
+
* home :: [github.com/csv11/csvpack](https://github.com/csv11/csvpack)
|
7
|
+
* bugs :: [github.com/csv11/csvpack/issues](https://github.com/csv11/csvpack/issues)
|
8
|
+
* gem :: [rubygems.org/gems/csvpack](https://rubygems.org/gems/csvpack)
|
9
|
+
* rdoc :: [rubydoc.info/gems/csvpack](http://rubydoc.info/gems/csvpack)
|
10
|
+
* forum :: [ruby-talk@ruby-lang.org](http://www.ruby-lang.org/en/community/mailing-lists/)
|
11
|
+
|
12
|
+
|
13
|
+
|
14
|
+
## Usage
|
15
|
+
|
16
|
+
|
17
|
+
### What's a tabular data package?
|
18
|
+
|
19
|
+
> Tabular Data Package is a simple structure for publishing and sharing
|
20
|
+
> tabular data with the following key features:
|
21
|
+
>
|
22
|
+
> - Data is stored in CSV (comma separated values) files
|
23
|
+
> - Metadata about the dataset both general (e.g. title, author)
|
24
|
+
> and the specific data files (e.g. schema) is stored in a single JSON file
|
25
|
+
> named `datapackage.json` which follows the Data Package format
|
26
|
+
|
27
|
+
(Source: [Tabular Data Packages, Frictionless Data Initiative • Data Hub.io • Open Knowledge Foundation • Data Protocols.org](https://datahub.io/docs/data-packages/tabular))
|
28
|
+
|
29
|
+
|
30
|
+
|
31
|
+
Here's a minimal example of a tabular data package holding two files, that is, `data.csv` and `datapackage.json`:
|
32
|
+
|
33
|
+
`data.csv`:
|
34
|
+
|
35
|
+
```
|
36
|
+
Brewery,City,Name,Abv
|
37
|
+
Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
|
38
|
+
Augustiner Bräu München,München,Edelstoff,5.6%
|
39
|
+
Bayerische Staatsbrauerei Weihenstephan,Freising,Hefe Weissbier,5.4%
|
40
|
+
Brauerei Spezial,Bamberg,Rauchbier Märzen,5.1%
|
41
|
+
Hacker-Pschorr Bräu,München,Münchner Dunkel,5.0%
|
42
|
+
Staatliches Hofbräuhaus München,München,Hofbräu Oktoberfestbier,6.3%
|
43
|
+
...
|
44
|
+
```
|
45
|
+
|
46
|
+
`datapackage.json`:
|
47
|
+
|
48
|
+
``` json
|
49
|
+
{
|
50
|
+
"name": "beer",
|
51
|
+
"resources": [
|
52
|
+
{
|
53
|
+
"path": "data.csv",
|
54
|
+
"schema": {
|
55
|
+
"fields": [{ "name": "Brewery", "type": "string" },
|
56
|
+
{ "name": "City", "type": "string" },
|
57
|
+
{ "name": "Name", "type": "string" },
|
58
|
+
{ "name": "Abv", "type": "number" }]
|
59
|
+
}
|
60
|
+
}
|
61
|
+
]
|
62
|
+
}
|
63
|
+
```
|
64
|
+
|
65
|
+
|
66
|
+
|
67
|
+
### Where to find data packages?
|
68
|
+
|
69
|
+
For some real world examples see the [Data Packages Listing](https://datahub.io/core) ([Sources](https://github.com/datasets)) at the Data Hub.io • Frictionless Data Initiative
|
70
|
+
website for a start. Tabular data packages include:
|
71
|
+
|
72
|
+
Name | Comments
|
73
|
+
------------------------ | -------------
|
74
|
+
`country-codes` | Comprehensive country codes: ISO 3166, ITU, ISO 4217 currency codes and many more
|
75
|
+
`language-codes` | ISO Language Codes (639-1 and 639-2)
|
76
|
+
`currency-codes` | ISO 4217 Currency Codes
|
77
|
+
`gdb` | Country, Regional and World GDP (Gross Domestic Product)
|
78
|
+
`s-and-p-500-companies` | S&P 500 Companies with Financial Information
|
79
|
+
`un-locode` | UN-LOCODE Codelist
|
80
|
+
`gold-prices` | Gold Prices (Monthly in USD)
|
81
|
+
`bond-yields-uk-10y` | 10 Year UK Government Bond Yields (Long-Term Interest Rate)
|
82
|
+
|
83
|
+
|
84
|
+
|
85
|
+
and many more
|
86
|
+
|
87
|
+
|
88
|
+
### Code, Code, Code - Script Your Data Workflow with Ruby
|
89
|
+
|
90
|
+
|
91
|
+
``` ruby
|
92
|
+
require 'csvpack'
|
93
|
+
|
94
|
+
CsvPack.import(
|
95
|
+
's-and-p-500-companies',
|
96
|
+
'gdb'
|
97
|
+
)
|
98
|
+
```
|
99
|
+
|
100
|
+
Using `CsvPack.import` will:
|
101
|
+
|
102
|
+
1) download all data packages to the `./pack` folder
|
103
|
+
|
104
|
+
2) (auto-)add all tables to an in-memory SQLite database using SQL `create_table`
|
105
|
+
commands via `ActiveRecord` migrations e.g.
|
106
|
+
|
107
|
+
|
108
|
+
``` ruby
|
109
|
+
create_table :constituents_financials do |t|
|
110
|
+
t.string :symbol # Symbol (string)
|
111
|
+
t.string :name # Name (string)
|
112
|
+
t.string :sector # Sector (string)
|
113
|
+
t.float :price # Price (number)
|
114
|
+
t.float :dividend_yield # Dividend Yield (number)
|
115
|
+
t.float :price_earnings # Price/Earnings (number)
|
116
|
+
t.float :earnings_share # Earnings/Share (number)
|
117
|
+
t.float :book_value # Book Value (number)
|
118
|
+
t.float :_52_week_low # 52 week low (number)
|
119
|
+
t.float :_52_week_high # 52 week high (number)
|
120
|
+
t.float :market_cap # Market Cap (number)
|
121
|
+
t.float :ebitda # EBITDA (number)
|
122
|
+
t.float :price_sales # Price/Sales (number)
|
123
|
+
t.float :price_book # Price/Book (number)
|
124
|
+
t.string :sec_filings # SEC Filings (string)
|
125
|
+
end
|
126
|
+
```
|
127
|
+
|
128
|
+
3) (auto-)import all datasets using SQL inserts e.g.
|
129
|
+
|
130
|
+
``` sql
|
131
|
+
INSERT INTO constituents_financials
|
132
|
+
(symbol,
|
133
|
+
name,
|
134
|
+
sector,
|
135
|
+
price,
|
136
|
+
dividend_yield,
|
137
|
+
price_earnings,
|
138
|
+
earnings_share,
|
139
|
+
book_value,
|
140
|
+
_52_week_low,
|
141
|
+
_52_week_high,
|
142
|
+
market_cap,
|
143
|
+
ebitda,
|
144
|
+
price_sales,
|
145
|
+
price_book,
|
146
|
+
sec_filings)
|
147
|
+
VALUES
|
148
|
+
('MMM',
|
149
|
+
'3M Company',
|
150
|
+
'Industrials',
|
151
|
+
162.27,
|
152
|
+
2.11,
|
153
|
+
22.28,
|
154
|
+
7.284,
|
155
|
+
25.238,
|
156
|
+
123.61,
|
157
|
+
162.92,
|
158
|
+
104.0,
|
159
|
+
8.467,
|
160
|
+
3.28,
|
161
|
+
6.43,
|
162
|
+
'http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=MMM')
|
163
|
+
```
|
164
|
+
|
165
|
+
4) (auto-)add ActiveRecord models for all tables.
|
166
|
+
|
167
|
+
|
168
|
+
So what? Now you can use all the "magic" of ActiveRecord to query
|
169
|
+
the datasets. Example:
|
170
|
+
|
171
|
+
``` ruby
|
172
|
+
pp Constituent.count
|
173
|
+
|
174
|
+
# SELECT COUNT(*) FROM "constituents"
|
175
|
+
# => 496
|
176
|
+
|
177
|
+
|
178
|
+
pp Constituent.first
|
179
|
+
|
180
|
+
# SELECT "constituents".* FROM "constituents" ORDER BY "constituents"."id" ASC LIMIT 1
|
181
|
+
# => #<Constituent:0x9f8cb78
|
182
|
+
# id: 1,
|
183
|
+
# symbol: "MMM",
|
184
|
+
# name: "3M Company",
|
185
|
+
# sector: "Industrials">
|
186
|
+
|
187
|
+
|
188
|
+
pp Constituent.find_by!( symbol: 'MMM' )
|
189
|
+
|
190
|
+
# SELECT "constituents".*
|
191
|
+
# FROM "constituents"
|
192
|
+
# WHERE "constituents"."symbol" = "MMM"
|
193
|
+
# LIMIT 1
|
194
|
+
# => #<Constituent:0x9f8cb78
|
195
|
+
# id: 1,
|
196
|
+
# symbol: "MMM",
|
197
|
+
# name: "3M Company",
|
198
|
+
# sector: "Industrials">
|
199
|
+
|
200
|
+
|
201
|
+
pp Constituent.find_by!( name: '3M Company' )
|
202
|
+
|
203
|
+
# SELECT "constituents".*
|
204
|
+
# FROM "constituents"
|
205
|
+
# WHERE "constituents"."name" = "3M Company"
|
206
|
+
# LIMIT 1
|
207
|
+
# => #<Constituent:0x9f8cb78
|
208
|
+
# id: 1,
|
209
|
+
# symbol: "MMM",
|
210
|
+
# name: "3M Company",
|
211
|
+
# sector: "Industrials">
|
212
|
+
|
213
|
+
|
214
|
+
pp Constituent.where( sector: 'Industrials' ).count
|
215
|
+
|
216
|
+
# SELECT COUNT(*) FROM "constituents"
|
217
|
+
# WHERE "constituents"."sector" = "Industrials"
|
218
|
+
# => 63
|
219
|
+
|
220
|
+
|
221
|
+
pp Constituent.where( sector: 'Industrials' ).all
|
222
|
+
|
223
|
+
# SELECT "constituents".*
|
224
|
+
# FROM "constituents"
|
225
|
+
# WHERE "constituents"."sector" = "Industrials"
|
226
|
+
# => [#<Constituent:0x9f8cb78
|
227
|
+
# id: 1,
|
228
|
+
# symbol: "MMM",
|
229
|
+
# name: "3M Company",
|
230
|
+
# sector: "Industrials">,
|
231
|
+
# #<Constituent:0xa2a4180
|
232
|
+
# id: 8,
|
233
|
+
# symbol: "ADT",
|
234
|
+
# name: "ADT Corp (The)",
|
235
|
+
# sector: "Industrials">,...]
|
236
|
+
```
|
237
|
+
|
238
|
+
and so on
|
239
|
+
|
240
|
+
|
241
|
+
|
242
|
+
### Frequently Asked Questions (F.A.Qs) and Answers
|
243
|
+
|
244
|
+
|
245
|
+
#### Q: How to download a data package ("by hand")?
|
246
|
+
|
247
|
+
Use the `CsvPack::Downloader` class to download a data package
|
248
|
+
to your disk (by default data packages get stored in `./pack`).
|
249
|
+
|
250
|
+
``` ruby
|
251
|
+
dl = CsvPack::Downloader.new
|
252
|
+
dl.fetch( 'language-codes' )
|
253
|
+
dl.fetch( 's-and-p-500-companies' )
|
254
|
+
dl.fetch( 'un-locode')
|
255
|
+
```
|
256
|
+
|
257
|
+
Will result in:
|
258
|
+
|
259
|
+
```
|
260
|
+
-- pack
|
261
|
+
|-- language-codes
|
262
|
+
| |-- data
|
263
|
+
| | |-- ietf-language-tags.csv
|
264
|
+
| | |-- language-codes-3b2.csv
|
265
|
+
| | |-- language-codes-full.csv
|
266
|
+
| | `-- language-codes.csv
|
267
|
+
| `-- datapackage.json
|
268
|
+
|-- s-and-p-500-companies
|
269
|
+
| |-- data
|
270
|
+
| | `-- constituents.csv
|
271
|
+
| `-- datapackage.json
|
272
|
+
`-- un-locode
|
273
|
+
|-- data
|
274
|
+
| |-- code-list.csv
|
275
|
+
| |-- country-codes.csv
|
276
|
+
| |-- function-classifiers.csv
|
277
|
+
| |-- status-indicators.csv
|
278
|
+
| `-- subdivision-codes.csv
|
279
|
+
`-- datapackage.json
|
280
|
+
```
|
281
|
+
|
282
|
+
|
283
|
+
#### Q: How to add and import a data package ("by hand")?
|
284
|
+
|
285
|
+
Use the `CsvPack::Pack` class to read-in a data package
|
286
|
+
and add and import into an SQL database.
|
287
|
+
|
288
|
+
``` ruby
|
289
|
+
pack = CsvPack::Pack.new( './pack/un-locode/datapackage.json' )
|
290
|
+
pack.tables.each do |table|
|
291
|
+
table.up! # (auto-) add table using SQL create_table via ActiveRecord migration
|
292
|
+
table.import! # import all records using SQL inserts
|
293
|
+
end
|
294
|
+
```
|
295
|
+
|
296
|
+
|
297
|
+
#### Q: How to connect to a different SQL database?
|
298
|
+
|
299
|
+
You can connect to any database supported by ActiveRecord. If you do NOT
|
300
|
+
establish a connection in your script - the standard (default fallback)
|
301
|
+
is using an in-memory SQLite3 database.
|
302
|
+
|
303
|
+
##### SQLite
|
304
|
+
|
305
|
+
For example, to create an SQLite3 database on disk - let's say `mine.db` -
|
306
|
+
use in your script (before the `CsvPack.import` statement):
|
307
|
+
|
308
|
+
``` ruby
|
309
|
+
ActiveRecord::Base.establish_connection( adapter: 'sqlite3',
|
310
|
+
database: './mine.db' )
|
311
|
+
```
|
312
|
+
|
313
|
+
##### PostgreSQL
|
314
|
+
|
315
|
+
For example, to connect to a PostgreSQL database use in your script
|
316
|
+
(before the `CsvPack.import` statement):
|
317
|
+
|
318
|
+
``` ruby
|
319
|
+
require 'pg' ## pull-in PostgreSQL (pg) machinery
|
320
|
+
|
321
|
+
ActiveRecord::Base.establish_connection( adapter: 'postgresql',
|
322
|
+
username: 'ruby',
|
323
|
+
password: 'topsecret',
|
324
|
+
database: 'database' )
|
325
|
+
```
|
326
|
+
|
327
|
+
|
328
|
+
|
329
|
+
|
330
|
+
## Install
|
331
|
+
|
332
|
+
Just install the gem:
|
333
|
+
|
334
|
+
```
|
335
|
+
$ gem install csvpack
|
336
|
+
```
|
337
|
+
|
338
|
+
|
339
|
+
|
340
|
+
## Alternatives
|
341
|
+
|
342
|
+
See the "[Tools and Plugins for working with Data Packages](https://frictionlessdata.io/software)"
|
343
|
+
page at the Frictionless Data Initiative.
|
344
|
+
|
345
|
+
|
346
|
+
## License
|
347
|
+
|
348
|
+
|
349
|
+
The `csvpack` scripts are dedicated to the public domain.
|
350
|
+
Use it as you please with no restrictions whatsoever.
|
351
|
+
|
352
|
+
## Questions? Comments?
|
353
|
+
|
354
|
+
Send them along to the ruby-talk mailing list. Thanks!
|
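The README in this diff shows `datapackage.json` field names such as `Price/Earnings` and `52 week low` turning into columns like `price_earnings` and `_52_week_low`, with `number` fields created as `float` columns. A minimal sketch of a mapping that reproduces those examples follows. It is an illustration only, not csvpack's actual implementation: `column_name`, `columns_for`, and `TYPE_MAP` are hypothetical names, and falling back to `:string` for unlisted types is an assumption.

```ruby
require 'json'

# Assumed type mapping, taken from the README's create_table example.
TYPE_MAP = { 'string' => :string, 'number' => :float }.freeze

# Hypothetical helper: lowercases the name, replaces runs of characters other
# than letters and digits with "_", and prefixes "_" when it starts with a digit.
def column_name(field_name)
  name = field_name.downcase.gsub(/[^a-z0-9]+/, '_').gsub(/\A_+|_+\z/, '')
  name = "_#{name}" if name.match?(/\A[0-9]/)
  name
end

# Hypothetical helper: returns [column_name, column_type] pairs for every
# field of every resource in a datapackage.json document.
def columns_for(datapackage_json)
  pack = JSON.parse(datapackage_json)
  pack['resources'].flat_map do |resource|
    resource['schema']['fields'].map do |field|
      [column_name(field['name']), TYPE_MAP.fetch(field['type'], :string)]
    end
  end
end

beer = <<~JSON
  {
    "name": "beer",
    "resources": [
      { "path": "data.csv",
        "schema": { "fields": [{ "name": "Brewery", "type": "string" },
                               { "name": "Abv",     "type": "number" }] } }
    ]
  }
JSON

columns_for(beer).each { |name, type| puts "t.#{type} :#{name}" }
```

Keeping the name and type rules in two small functions makes it easy to compare their output against the columns listed in the README.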