csvpack 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: b9ea6e66b9c56d609a881d9edd84c19fa58c9f8e
4
+ data.tar.gz: 0a0b7df35fa79d3f06aceb7a100aeb20ee71bd0e
5
+ SHA512:
6
+ metadata.gz: 1e5aa488c56683fbd1215da82c60cc8213b890c0e3b42cd126626d3005e2de7ffac156288bb91bcab42f234a34e8734731acd8c1f61694b7b1362e7d154d8fc9
7
+ data.tar.gz: 4eccdd36ca45df01c5d69f63e51abe71bb191a3144998fe0cf2de9ceb639aeb399df24b470f66752fa3a5e96e276b15f7863ff4b7cb258c5cef86520de8874d1
data/HISTORY.md ADDED
@@ -0,0 +1,4 @@
1
+ ### 0.0.1 / 2015-04-23
2
+
3
+ * Everything is new. First release
4
+
data/Manifest.txt ADDED
@@ -0,0 +1,13 @@
1
+ HISTORY.md
2
+ Manifest.txt
3
+ README.md
4
+ Rakefile
5
+ lib/csvpack.rb
6
+ lib/csvpack/downloader.rb
7
+ lib/csvpack/pack.rb
8
+ lib/csvpack/version.rb
9
+ test/helper.rb
10
+ test/test_companies.rb
11
+ test/test_countries.rb
12
+ test/test_downloader.rb
13
+ test/test_import.rb
data/README.md ADDED
@@ -0,0 +1,354 @@
1
+ # csvpack
2
+
3
+ work with tabular data packages using comma-separated values (CSV) datafiles in text with datapackage.json; download, read into and query comma-separated values (CSV) datafiles with your SQL database (e.g. SQLite, PostgreSQL, ...) of choice and much more
4
+
5
+
6
+ * home :: [github.com/csv11/csvpack](https://github.com/csv11/csvpack)
7
+ * bugs :: [github.com/csv11/csvpack/issues](https://github.com/csv11/csvpack/issues)
8
+ * gem :: [rubygems.org/gems/csvpack](https://rubygems.org/gems/csvpack)
9
+ * rdoc :: [rubydoc.info/gems/csvpack](http://rubydoc.info/gems/csvpack)
10
+ * forum :: [ruby-talk@ruby-lang.org](http://www.ruby-lang.org/en/community/mailing-lists/)
11
+
12
+
13
+
14
+ ## Usage
15
+
16
+
17
+ ### What's a tabular data package?
18
+
19
+ > Tabular Data Package is a simple structure for publishing and sharing
20
+ > tabular data with the following key features:
21
+ >
22
+ > - Data is stored in CSV (comma separated values) files
23
+ > - Metadata about the dataset both general (e.g. title, author)
24
+ > and the specific data files (e.g. schema) is stored in a single JSON file
25
+ > named `datapackage.json` which follows the Data Package format
26
+
27
+ (Source: [Tabular Data Packages, Frictionless Data Initiative • Data Hub.io • Open Knowledge Foundation • Data Protocols.org](https://datahub.io/docs/data-packages/tabular))
28
+
29
+
30
+
31
+ Here's a minimal example of a tabular data package holding two files, that is, `data.csv` and `datapackage.json`:
32
+
33
+ `data.csv`:
34
+
35
+ ```
36
+ Brewery,City,Name,Abv
37
+ Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
38
+ Augustiner Bräu München,München,Edelstoff,5.6%
39
+ Bayerische Staatsbrauerei Weihenstephan,Freising,Hefe Weissbier,5.4%
40
+ Brauerei Spezial,Bamberg,Rauchbier Märzen,5.1%
41
+ Hacker-Pschorr Bräu,München,Münchner Dunkel,5.0%
42
+ Staatliches Hofbräuhaus München,München,Hofbräu Oktoberfestbier,6.3%
43
+ ...
44
+ ```
45
+
46
+ `datapackage.json`:
47
+
48
+ ``` json
49
+ {
50
+ "name": "beer",
51
+ "resources": [
52
+ {
53
+ "path": "data.csv",
54
+ "schema": {
55
+ "fields": [{ "name": "Brewery", "type": "string" },
56
+ { "name": "City", "type": "string" },
57
+ { "name": "Name", "type": "string" },
58
+ { "name": "Abv", "type": "number" }]
59
+ }
60
+ }
61
+ ]
62
+ }
63
+ ```
64
+
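As a rough illustration of how the two files relate, the sketch below parses the `datapackage.json` shown above with Ruby's standard `json` library and lists the fields of each resource. It is a standalone example, not csvpack's own loader; the variable names `pack` and `summary` are made up for illustration.

```ruby
require 'json'

# The datapackage.json from the example above, inlined for the sketch.
text = <<~JSON
  {
    "name": "beer",
    "resources": [
      {
        "path": "data.csv",
        "schema": {
          "fields": [{ "name": "Brewery", "type": "string" },
                     { "name": "City",    "type": "string" },
                     { "name": "Name",    "type": "string" },
                     { "name": "Abv",     "type": "number" }]
        }
      }
    ]
  }
JSON

pack = JSON.parse( text )

# Collect "name (type)" for every field of every resource.
summary = pack['resources'].map do |resource|
  fields = resource['schema']['fields'].map { |f| "#{f['name']} (#{f['type']})" }
  "#{resource['path']}: #{fields.join( ', ' )}"
end

puts pack['name']
puts summary
```

Each entry under `resources` points at one CSV file via `path`, and `schema.fields` describes its columns in order.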
65
+
66
+
67
+ ### Where to find data packages?
68
+
69
+ For some real world examples see the [Data Packages Listing](https://datahub.io/core) ([Sources](https://github.com/datasets)) at the Data Hub.io • Frictionless Data Initiative
70
+ website for a start. Tabular data packages include:
71
+
72
+ Name | Comments
73
+ ------------------------ | -------------
74
+ `country-codes` | Comprehensive country codes: ISO 3166, ITU, ISO 4217 currency codes and many more
75
+ `language-codes` | ISO Language Codes (639-1 and 639-2)
76
+ `currency-codes` | ISO 4217 Currency Codes
77
+ `gdp` | Country, Regional and World GDP (Gross Domestic Product)
78
+ `s-and-p-500-companies` | S&P 500 Companies with Financial Information
79
+ `un-locode` | UN-LOCODE Codelist
80
+ `gold-prices` | Gold Prices (Monthly in USD)
81
+ `bond-yields-uk-10y` | 10 Year UK Government Bond Yields (Long-Term Interest Rate)
82
+
83
+
84
+
85
+ and many more
86
+
87
+
88
+ ### Code, Code, Code - Script Your Data Workflow with Ruby
89
+
90
+
91
+ ``` ruby
92
+ require 'csvpack'
93
+
94
+ CsvPack.import(
95
+ 's-and-p-500-companies',
96
+ 'gdp'
97
+ )
98
+ ```
99
+
100
+ Using `CsvPack.import` will:
101
+
102
+ 1) download all data packages to the `./pack` folder
103
+
104
+ 2) (auto-)add all tables to an in-memory SQLite database using SQL `create_table`
105
+ commands via `ActiveRecord` migrations e.g.
106
+
107
+
108
+ ``` ruby
109
+ create_table :constituents_financials do |t|
110
+ t.string :symbol # Symbol (string)
111
+ t.string :name # Name (string)
112
+ t.string :sector # Sector (string)
113
+ t.float :price # Price (number)
114
+ t.float :dividend_yield # Dividend Yield (number)
115
+ t.float :price_earnings # Price/Earnings (number)
116
+ t.float :earnings_share # Earnings/Share (number)
117
+ t.float :book_value # Book Value (number)
118
+ t.float :_52_week_low # 52 week low (number)
119
+ t.float :_52_week_high # 52 week high (number)
120
+ t.float :market_cap # Market Cap (number)
121
+ t.float :ebitda # EBITDA (number)
122
+ t.float :price_sales # Price/Sales (number)
123
+ t.float :price_book # Price/Book (number)
124
+ t.string :sec_filings # SEC Filings (string)
125
+ end
126
+ ```
127
+
128
+ 3) (auto-)import all datasets using SQL inserts e.g.
129
+
130
+ ``` sql
131
+ INSERT INTO constituents_financials
132
+ (symbol,
133
+ name,
134
+ sector,
135
+ price,
136
+ dividend_yield,
137
+ price_earnings,
138
+ earnings_share,
139
+ book_value,
140
+ _52_week_low,
141
+ _52_week_high,
142
+ market_cap,
143
+ ebitda,
144
+ price_sales,
145
+ price_book,
146
+ sec_filings)
147
+ VALUES
148
+ ('MMM',
149
+ '3M Co',
150
+ 'Industrials',
151
+ 162.27,
152
+ 2.11,
153
+ 22.28,
154
+ 7.284,
155
+ 25.238,
156
+ 123.61,
157
+ 162.92,
158
+ 104.0,
159
+ 8.467,
160
+ 3.28,
161
+ 6.43,
162
+ 'http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=MMM')
163
+ ```
164
+
165
+ 4) (auto-)add ActiveRecord models for all tables.
166
+
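The column names in step 2 are derived from the CSV header names. The sketch below is a standalone copy of the renaming rules found in `sanitize_name` in `lib/csvpack/pack.rb` of this release; the method name `column_name` is hypothetical and the rules may differ in other versions.

```ruby
# Standalone copy of the renaming rules, for illustration only
# (the method name column_name is hypothetical).
def column_name( header )
  ident = header.strip.downcase
  ident = ident.gsub( /[.\-\/]/, '_' )       # dot, dash and slash become underscores
  ident = ident.gsub( ' ', '_' )             # spaces become underscores
  ident = ident.gsub( /[^a-z0-9_]/, '' )     # drop every other character
  ident = "_#{ident}" if ident =~ /\A[0-9]/  # prefix names that start with a digit
  ident
end

['Dividend Yield', 'Price/Earnings', '52 week low', 'SEC Filings'].each do |header|
  puts "#{header} => #{column_name( header )}"
end
```

This is why the header `52 week low` ends up as the column `_52_week_low` in the migration above.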
167
+
168
+ So what? Now you can use all the "magic" of ActiveRecord to query
169
+ the datasets. Example:
170
+
171
+ ``` ruby
172
+ pp Constituent.count
173
+
174
+ # SELECT COUNT(*) FROM "constituents"
175
+ # => 496
176
+
177
+
178
+ pp Constituent.first
179
+
180
+ # SELECT "constituents".* FROM "constituents" ORDER BY "constituents"."id" ASC LIMIT 1
181
+ # => #<Constituent:0x9f8cb78
182
+ # id: 1,
183
+ # symbol: "MMM",
184
+ # name: "3M Co",
185
+ # sector: "Industrials">
186
+
187
+
188
+ pp Constituent.find_by!( symbol: 'MMM' )
189
+
190
+ # SELECT "constituents".*
191
+ # FROM "constituents"
192
+ # WHERE "constituents"."symbol" = "MMM"
193
+ # LIMIT 1
194
+ # => #<Constituent:0x9f8cb78
195
+ # id: 1,
196
+ # symbol: "MMM",
197
+ # name: "3M Co",
198
+ # sector: "Industrials">
199
+
200
+
201
+ pp Constituent.find_by!( name: '3M Co' )
202
+
203
+ # SELECT "constituents".*
204
+ # FROM "constituents"
205
+ # WHERE "constituents"."name" = "3M Co"
206
+ # LIMIT 1
207
+ # => #<Constituent:0x9f8cb78
208
+ # id: 1,
209
+ # symbol: "MMM",
210
+ # name: "3M Co",
211
+ # sector: "Industrials">
212
+
213
+
214
+ pp Constituent.where( sector: 'Industrials' ).count
215
+
216
+ # SELECT COUNT(*) FROM "constituents"
217
+ # WHERE "constituents"."sector" = "Industrials"
218
+ # => 63
219
+
220
+
221
+ pp Constituent.where( sector: 'Industrials' ).all
222
+
223
+ # SELECT "constituents".*
224
+ # FROM "constituents"
225
+ # WHERE "constituents"."sector" = "Industrials"
226
+ # => [#<Constituent:0x9f8cb78
227
+ # id: 1,
228
+ # symbol: "MMM",
229
+ # name: "3M Co",
230
+ # sector: "Industrials">,
231
+ # #<Constituent:0xa2a4180
232
+ # id: 8,
233
+ # symbol: "ADT",
234
+ # name: "ADT Corp (The)",
235
+ # sector: "Industrials">,...]
236
+ ```
237
+
238
+ and so on
239
+
240
+
241
+
242
+ ### Frequently Asked Questions (F.A.Qs) and Answers
243
+
244
+
245
+ #### Q: How to download a data package ("by hand")?
246
+
247
+ Use the `CsvPack::Downloader` class to download a data package
248
+ to your disk (by default data packages get stored in `./pack`).
249
+
250
+ ``` ruby
251
+ dl = CsvPack::Downloader.new
252
+ dl.fetch( 'language-codes' )
253
+ dl.fetch( 's-and-p-500-companies' )
254
+ dl.fetch( 'un-locode')
255
+ ```
256
+
257
+ Will result in:
258
+
259
+ ```
260
+ -- pack
261
+ |-- language-codes
262
+ | |-- data
263
+ | | |-- language-codes-3b2.csv
264
+ | | |-- language-codes.csv
265
+ | | `-- language-codes-full.csv
266
+ | `-- datapackage.json
267
+ |-- s-and-p-500-companies
268
+ | |-- data
269
+ | | |-- constituents.csv
270
+ | | `-- constituents-financials.csv
271
+ | `-- datapackage.json
272
+ `-- un-locode
273
+ |-- data
274
+ | |-- code-list.csv
275
+ | |-- country-codes.csv
276
+ | |-- function-classifiers.csv
277
+ | |-- status-indicators.csv
278
+ | `-- subdivision-codes.csv
279
+ `-- datapackage.json
280
+ ```
281
+
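To make the folder layout easier to follow, here is a minimal sketch of how a package name maps to a download URL and a local path. The base URL and the `./pack` default are taken from `lib/csvpack/downloader.rb` in this release; the helper `pack_locations` is hypothetical and not part of the csvpack API, and the remote URL scheme may have changed since.

```ruby
# Hypothetical helper (not part of csvpack's API): where a package name
# is fetched from and where its datapackage.json is stored.
def pack_locations( name, cache_dir = './pack' )
  url_base = "https://datahub.io/core/#{name}"
  {
    url:  "#{url_base}/datapackage.json",
    path: "#{cache_dir}/#{name}/datapackage.json"
  }
end

loc = pack_locations( 'un-locode' )
puts loc[:url]
puts loc[:path]
```

The CSV files listed under `resources` are then stored relative to the same package folder, using each resource's `path` (or `<name>.csv` if no path is given).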
282
+
283
+ #### Q: How to add and import a data package ("by hand")?
284
+
285
+ Use the `CsvPack::Pack` class to read-in a data package
286
+ and add and import into an SQL database.
287
+
288
+ ``` ruby
289
+ pack = CsvPack::Pack.new( './pack/un-locode/datapackage.json' )
290
+ pack.tables.each do |table|
291
+ table.up! # (auto-) add table using SQL create_table via ActiveRecord migration
292
+ table.import! # import all records using SQL inserts
293
+ end
294
+ ```
295
+
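For a feel of what `import!` does per record, the sketch below turns one row of string values into a SQL `VALUES` list, loosely following `Tab#import!` in this release: blank values become `NULL`, numeric columns stay unquoted, and single quotes in strings are doubled. The helper `sql_values` and its `numeric_columns` argument are assumptions made for this example, not csvpack methods.

```ruby
# Hypothetical helper (not part of csvpack): build a SQL VALUES list
# from one row. numeric_columns holds the indexes of numeric columns.
def sql_values( row, numeric_columns )
  values = row.each_with_index.map do |value, index|
    if value.nil? || value.strip.empty?
      'NULL'                             # blank cells become NULL
    elsif numeric_columns.include?( index )
      value                              # numbers are NOT wrapped in quotes
    else
      "'#{value.gsub( "'", "''" )}'"     # double single quotes, then wrap in quotes
    end
  end
  "(#{values.join( ',' )})"
end

puts sql_values( ['MCD', "McDonald's Corp", '97.5'], [2] )
puts sql_values( ['XYZ', nil, ''], [2] )
```

Building SQL by string interpolation is shown only because it mirrors the approach in the source; for untrusted data, bound parameters are the safer choice.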
296
+
297
+ #### Q: How to connect to a different SQL database?
298
+
299
+ You can connect to any database supported by ActiveRecord. If you do NOT
300
+ establish a connection in your script - the standard (default fallback)
301
+ is using an in-memory SQLite3 database.
302
+
303
+ ##### SQLite
304
+
305
+ For example, to create an SQLite3 database on disk - let's say `mine.db` -
306
+ use in your script (before the `CsvPack.import` statement):
307
+
308
+ ``` ruby
309
+ ActiveRecord::Base.establish_connection( adapter: 'sqlite3',
310
+ database: './mine.db' )
311
+ ```
312
+
313
+ ##### PostgreSQL
314
+
315
+ For example, to connect to a PostgreSQL database use in your script
316
+ (before the `CsvPack.import` statement):
317
+
318
+ ``` ruby
319
+ require 'pg' ## pull-in PostgreSQL (pg) machinery
320
+
321
+ ActiveRecord::Base.establish_connection( adapter: 'postgresql',
322
+ username: 'ruby',
323
+ password: 'topsecret',
324
+ database: 'database' )
325
+ ```
326
+
327
+
328
+
329
+
330
+ ## Install
331
+
332
+ Just install the gem:
333
+
334
+ ```
335
+ $ gem install csvpack
336
+ ```
337
+
338
+
339
+
340
+ ## Alternatives
341
+
342
+ See the "[Tools and Plugins for working with Data Packages](https://frictionlessdata.io/software)"
343
+ page at the Frictionless Data Initiative.
344
+
345
+
346
+ ## License
347
+
348
+
349
+ The `csvpack` scripts are dedicated to the public domain.
350
+ Use them as you please with no restrictions whatsoever.
351
+
352
+ ## Questions? Comments?
353
+
354
+ Send them along to the ruby-talk mailing list. Thanks!
data/Rakefile ADDED
@@ -0,0 +1,32 @@
1
+ require 'hoe'
2
+ require './lib/csvpack/version.rb'
3
+
4
+ Hoe.spec 'csvpack' do
5
+
6
+ self.version = CsvPack::VERSION
7
+
8
+ self.summary = 'csvpack - work with tabular data packages using comma-separated values (CSV) datafiles in text with datapackage.json; download, read into and query comma-separated values (CSV) datafiles with your SQL database (e.g. SQLite, PostgreSQL, ...) of choice and much more'
9
+ self.description = summary
10
+
11
+ self.urls = ['https://github.com/csv11/csvpack']
12
+
13
+ self.author = 'Gerald Bauer'
14
+ self.email = 'ruby-talk@ruby-lang.org'
15
+
16
+ # switch extension to .markdown for github formatting
17
+ self.readme_file = 'README.md'
18
+ self.history_file = 'HISTORY.md'
19
+
20
+ self.extra_deps = [
21
+ ['logutils', '>=0.6.1'],
22
+ ['fetcher', '>=0.4.5'],
23
+ ['activerecord', '>=5.0.0'],
24
+ ]
25
+
26
+ self.licenses = ['Public Domain']
27
+
28
+ self.spec_extras = {
29
+ required_ruby_version: '>= 2.2.2'
30
+ }
31
+
32
+ end
data/lib/csvpack.rb ADDED
@@ -0,0 +1,52 @@
1
+ # encoding: utf-8
2
+
3
+
4
+ require 'pp'
5
+ require 'forwardable'
6
+
7
+ ### csv
8
+ require 'csv'
9
+ require 'json'
10
+ require 'fileutils'
11
+
12
+
13
+ ### downloader
14
+ require 'fetcher'
15
+
16
+ ### activerecord w/ sqlite3
17
+ ## require 'active_support/all' ## needed for String#binary? method
18
+ require 'active_record'
19
+
20
+
21
+
22
+ # our own code
23
+
24
+ require 'csvpack/version' ## let version always go first
25
+ require 'csvpack/pack'
26
+ require 'csvpack/downloader'
27
+
28
+ module CsvPack
29
+
30
+ def self.import( *args )
31
+ ## step 1: download
32
+ dl = Downloader.new
33
+ args.each do |arg|
34
+ dl.fetch( arg )
35
+ end
36
+
37
+ ## step 2: up 'n' import
38
+ args.each do |arg|
39
+ pack = Pack.new( "./pack/#{arg}/datapackage.json" )
40
+ pack.tables.each do |table|
41
+ table.up!
42
+ table.import!
43
+ end
44
+ end
45
+ end
46
+
47
+ end # module CsvPack
48
+
49
+
50
+
51
+ # say hello
52
+ puts CsvPack.banner if defined?($RUBYLIBS_DEBUG) && $RUBYLIBS_DEBUG
data/lib/csvpack/downloader.rb ADDED
@@ -0,0 +1,62 @@
1
+ # encoding: utf-8
2
+
3
+ module CsvPack
4
+
5
+ class Downloader
6
+
7
+ def initialize( cache_dir='./pack' )
8
+ @cache_dir = cache_dir # todo: check if folder exists now (or on demand)?
9
+ @worker = Fetcher::Worker.new
10
+ end
11
+
12
+ SHORTCUTS = {
13
+ ## to be done
14
+ }
15
+
16
+ def fetch( name_or_shortcut_or_url ) ## todo/check: use (re)name to get/update/etc. why? why not??
17
+
18
+ name = name_or_shortcut_or_url
19
+
20
+ ##
21
+ ## e.g. try
22
+ ## country-list
23
+ ##
24
+
25
+ ## url_base = "http://data.okfn.org/data/core/#{name}"
26
+ url_base = "https://datahub.io/core/#{name}"
27
+ url = "#{url_base}/datapackage.json"
28
+
29
+ dest_dir = "#{@cache_dir}/#{name}"
30
+ FileUtils.mkdir_p( dest_dir )
31
+
32
+ pack_path = "#{dest_dir}/datapackage.json"
33
+ @worker.copy( url, pack_path )
34
+
35
+ h = JSON.parse( File.read( pack_path ) )
36
+ pp h
37
+
38
+ ## copy resources (tables)
39
+ h['resources'].each do |r|
40
+ puts "== resource:"
41
+ pp r
42
+
43
+ res_url = r['url']
44
+
45
+ res_name = r['name']
46
+ res_relative_path = r['path']
47
+ if res_relative_path.nil?
48
+ res_relative_path = "#{res_name}.csv"
49
+ end
50
+
51
+ res_path = "#{dest_dir}/#{res_relative_path}"
52
+ puts "[debug] res_path: >#{res_path}<"
53
+ res_dir = File.dirname( res_path )
54
+ FileUtils.mkdir_p( res_dir )
55
+
56
+ @worker.copy( res_url, res_path )
57
+ end
58
+ end
59
+
60
+ end # class Downloader
61
+
62
+ end # module CsvPack
data/lib/csvpack/pack.rb ADDED
@@ -0,0 +1,246 @@
1
+ # encoding: utf-8
2
+
3
+
4
+ ## note: for now use in-memory sqlite3 db
5
+
6
+ module CsvPack
7
+
8
+ class Pack
9
+ ## load (tabular) datapackage into memory
10
+ def initialize( path )
11
+
12
+ ## convenience
13
+ ## - check: if path is a folder/directory
14
+ ## (auto-)add /datapackage.json
15
+
16
+ text = File.open( path, 'r:utf-8' ).read
17
+ @h = JSON.parse( text )
18
+
19
+ pack_dir = File.dirname(path)
20
+
21
+ ## pp @h
22
+
23
+ ## read in tables
24
+ @tables = []
25
+ @h['resources'].each do |r|
26
+ ## build table data
27
+ @tables << build_tab( r, pack_dir )
28
+ end
29
+
30
+ ## pp @tables
31
+ end
32
+
33
+ def name() @h['name']; end
34
+ def title() @h['title']; end
35
+ def license() @h['license']; end
36
+
37
+ def tables() @tables; end
38
+ ## convenience method - return first table
39
+ def table() @tables[0]; end
40
+
41
+ def build_tab( h, pack_dir )
42
+ name = h['name']
43
+ relative_path = h['path']
44
+
45
+ if relative_path.nil?
46
+ relative_path = "#{name}.csv"
47
+ puts " warn: no path defined; using fallback '#{relative_path}'"
48
+ end
49
+
50
+ puts " reading resource (table) #{name} (#{relative_path})..."
51
+ pp h
52
+
53
+ path = "#{pack_dir}/#{relative_path}"
54
+ text = File.open( path, 'r:utf-8' ).read
55
+ tab = Tab.new( h, text )
56
+ tab
57
+ end
58
+ end # class Pack
59
+
60
+
61
+ class Tab
62
+ extend Forwardable
63
+
64
+ def initialize( h, text )
65
+ @h = h
66
+
67
+ ## todo parse csv
68
+ ## note: use header options (first row MUST include headers)
69
+ @data = CSV.parse( text, headers: true )
70
+
71
+ pp @data[0]
72
+ end
73
+
74
+ def name() @h['name']; end
75
+ def_delegators :@data, :[], :each
76
+
77
+ def pretty_print( printer )
78
+ printer.text "Tab<#{object_id} @data.name=#{name}, @data.size=#{@data.size}>"
79
+ end
80
+
81
+
82
+ def up!
83
+ # run Migration#up to create table
84
+ connect!
85
+ con = ActiveRecord::Base.connection
86
+
87
+ con.create_table sanitize_name( name ) do |t|
88
+ @h['schema']['fields'].each do |f|
89
+ column_name = sanitize_name(f['name'])
90
+ column_type = DATA_TYPES[f['type']]
91
+
92
+ puts " #{column_type} :#{column_name} => #{f['type']} - #{f['name']}"
93
+
94
+ t.send( column_type.to_sym, column_name.to_sym ) ## todo/check: to_sym needed?
95
+ end
96
+ t.string :name
97
+ end
98
+ end
99
+
100
+ def import!
101
+ connect!
102
+ con = ActiveRecord::Base.connection
103
+
104
+ column_names = []
105
+ column_types = []
106
+ column_placeholders = []
107
+ @h['schema']['fields'].each do |f|
108
+ column_names << sanitize_name(f['name'])
109
+ column_types << DATA_TYPES[f['type']]
110
+ column_placeholders << '?'
111
+ end
112
+
113
+ sql_insert_into = "INSERT INTO #{sanitize_name(name)} (#{column_names.join(',')}) VALUES "
114
+ puts sql_insert_into
115
+
116
+ i=0
117
+ @data.each do |row|
118
+ i+=1
119
+ ## next if i > 3 ## for testing; only insert a couple of recs
120
+
121
+ ## todo: check if all string is ok; or number/date/etc. conversion needed/required?
122
+ values = []
123
+ row.fields.each_with_index do |value,index| # get array of values
124
+ type = column_types[index]
125
+ ## todo add boolean ??
126
+ if value.blank?
127
+ values << 'NULL'
128
+ elsif [:number,:float,:integer].include?( type )
129
+ values << value ## do NOT wrap in quotes (numeric)
130
+ else
131
+ esc_value = value.gsub( "'", "''" ) ## escape quotes e.g. ' becomes '', that is, two single quotes
132
+ values << "'#{esc_value}'" ## wrap in quotes
133
+ end
134
+ end
135
+ pp values
136
+
137
+ sql = "#{sql_insert_into} (#{values.join(',')})"
138
+ puts sql
139
+ con.execute( sql )
140
+ end
141
+ end # method import!
142
+
143
+
144
+ def import_v1!
145
+ ### note: import via sql for (do NOT use ActiveRecord record class for now)
146
+ con = ActiveRecord::Base.connection
147
+
148
+ column_names = []
149
+ column_types = []
150
+ column_placeholders = []
151
+ @h['schema']['fields'].each do |f|
152
+ column_names << sanitize_name(f['name'])
153
+ column_types << DATA_TYPES[f['type']]
154
+ column_placeholders << '?'
155
+ end
156
+
157
+ sql = "INSERT INTO #{sanitize_name(name)} (#{column_names.join(',')}) VALUES (#{column_placeholders.join(',')})"
158
+ puts sql
159
+
160
+ i=0
161
+ @data.each do |row|
162
+ i+=1
163
+ next if i > 3 ## for testing; only insert a couple of recs
164
+
165
+ ## todo: check if all string is ok; or number/date/etc. conversion needed/required?
166
+ params = row.fields # get array of values
167
+ pp params
168
+ con.exec_insert( sql, 'SQL', params ) # todo/check: 2nd param name used for logging only??
169
+ end
170
+ end # method import_v1!
171
+
172
+
173
+ ### note:
174
+ ## activerecord supports:
175
+ ## :string, :text, :integer, :float, :decimal, :datetime, :time, :date, :binary, :boolean
176
+
177
+ ### mappings for data types
178
+ ## from tabular data package to ActiveRecord migrations
179
+ ##
180
+ # see http://dataprotocols.org/json-table-schema/ (section Field Types and Formats)
181
+ #
182
+ # for now supports these types
183
+
184
+ DATA_TYPES = {
185
+ 'string' => :string, ## use text for larger strings ???
186
+ 'number' => :float, ## note: use float for now
187
+ 'integer' => :integer,
188
+ 'boolean' => :boolean,
189
+ 'datetime' => :datetime,
190
+ 'date' => :date,
191
+ 'time' => :time,
192
+ }
193
+
194
+ def dump_schema
195
+ ## try to dump schema (fields)
196
+ puts "*** dump schema:"
197
+
198
+ @h['schema']['fields'].each do |f|
199
+ puts " #{f['name']} ( #{sanitize_name(f['name'])} ) : #{f['type']} ( #{DATA_TYPES[f['type']]} )"
200
+ end
201
+
202
+ end
203
+
204
+
205
+ def sanitize_name( ident )
206
+ ##
207
+ ## if identifier starts w/ number add leading underscore (_)
208
+ ## e.g. 52 Week Price => becomes _52_week_price
209
+
210
+ ident = ident.strip.downcase
211
+ ident = ident.gsub( /[\.\-\/]/, '_' ) ## convert some special chars to underscore (e.g. dash -)
212
+ ident = ident.gsub( ' ', '_' )
213
+ ident = ident.gsub( /[^a-z0-9_]/, '' )
214
+ ident = "_#{ident}" if ident =~ /^[0-9]/
215
+ ident
216
+ end
217
+
218
+
219
+ def ar_clazz
220
+ @ar_clazz ||= begin
221
+ clazz = Class.new( ActiveRecord::Base ) do
222
+ ## nothing here for now
223
+ end
224
+ puts "set table_name to #{sanitize_name( name )}"
225
+ clazz.table_name = sanitize_name( name )
226
+ clazz
227
+ end
228
+ @ar_clazz
229
+ end
230
+
231
+ private
232
+
233
+ ## helper to get connection; if no connection is established, use defaults
234
+ def connect!
235
+ ## todo: cache returned con - why, why not ??
236
+ unless ActiveRecord::Base.connected?
237
+ puts "note: no database connection established; using defaults (e.g. in-memory SQLite database)"
238
+ ActiveRecord::Base.establish_connection( adapter: 'sqlite3', database: ':memory:' )
239
+ ActiveRecord::Base.logger = Logger.new( STDOUT )
240
+ end
241
+ ActiveRecord::Base.connection
242
+ end
243
+
244
+ end # class Tab
245
+
246
+ end # module CsvPack
data/lib/csvpack/version.rb ADDED
@@ -0,0 +1,22 @@
1
+ # encoding: utf-8
2
+
3
+ module CsvPack
4
+
5
+ MAJOR = 0 ## todo: namespace inside version or something - why? why not??
6
+ MINOR = 1
7
+ PATCH = 0
8
+ VERSION = [MAJOR,MINOR,PATCH].join('.')
9
+
10
+ def self.version
11
+ VERSION
12
+ end
13
+
14
+ def self.banner
15
+ "csvpack/#{VERSION} on Ruby #{RUBY_VERSION} (#{RUBY_RELEASE_DATE}) [#{RUBY_PLATFORM}]"
16
+ end
17
+
18
+ def self.root
19
+ File.expand_path( File.dirname(File.dirname(File.dirname(__FILE__))) )
20
+ end
21
+
22
+ end # module CsvPack
data/test/helper.rb ADDED
@@ -0,0 +1,7 @@
1
+
2
+ ## minitest setup
3
+ require 'minitest/autorun'
4
+
5
+
6
+ ## our own code
7
+ require 'csvpack'
data/test/test_companies.rb ADDED
@@ -0,0 +1,61 @@
1
+ # encoding: utf-8
2
+
3
+ ###
4
+ # to run use
5
+ # ruby -I ./lib -I ./test test/test_companies.rb
6
+
7
+
8
+ require 'helper'
9
+
10
+ class TestCompanies < MiniTest::Test
11
+
12
+ def test_s_and_p_500_companies
13
+
14
+ pak = CsvPack::Pack.new( './pack/s-and-p-500-companies/datapackage.json' )
15
+
16
+ puts "name: #{pak.name}"
17
+ puts "title: #{pak.title}"
18
+ puts "license: #{pak.license}"
19
+
20
+ pp pak.tables
21
+ pp pak.table[0]['Symbol']
22
+ pp pak.table[495]['Symbol']
23
+
24
+ ## pak.table.each do |row|
25
+ ## pp row
26
+ ## end
27
+
28
+ puts pak.tables[0].dump_schema
29
+ puts pak.tables[1].dump_schema
30
+
31
+ # database setup 'n' config
32
+ ActiveRecord::Base.establish_connection( adapter: 'sqlite3', database: ':memory:' )
33
+ ActiveRecord::Base.logger = Logger.new( STDOUT )
34
+
35
+ pak.table.up!
36
+ pak.table.import!
37
+
38
+ pak.tables[1].up!
39
+ pak.tables[1].import!
40
+
41
+
42
+ pp pak.table.ar_clazz
43
+
44
+
45
+ company = pak.table.ar_clazz
46
+
47
+ puts "Company.count: #{company.count}"
48
+ pp company.first
49
+ pp company.find_by!( symbol: 'MMM' )
50
+ pp company.find_by!( name: '3M Co' )
51
+ pp company.where( sector: 'Industrials' ).count
52
+ pp company.where( sector: 'Industrials' ).all
53
+
54
+
55
+ ### todo: try a join w/ belongs_to ??
56
+
57
+ assert true # if we get here - test success
58
+ end
59
+
60
+ end # class TestCompanies
61
+
data/test/test_countries.rb ADDED
@@ -0,0 +1,40 @@
1
+ # encoding: utf-8
2
+
3
+ ###
4
+ # to run use
5
+ # ruby -I ./lib -I ./test test/test_countries.rb
6
+
7
+
8
+ require 'helper'
9
+
10
+ class TestCountries < MiniTest::Test
11
+
12
+ def test_country_list
13
+ pak = CsvPack::Pack.new( './pack/country-list/datapackage.json' )
14
+
15
+ puts "name: #{pak.name}"
16
+ puts "title: #{pak.title}"
17
+ puts "license: #{pak.license}"
18
+
19
+ pp pak.tables
20
+
21
+ ## pak.table.each do |row|
22
+ ## pp row
23
+ ## end
24
+
25
+ puts pak.table.dump_schema
26
+
27
+ # database setup 'n' config
28
+ ActiveRecord::Base.establish_connection( adapter: 'sqlite3', database: ':memory:' )
29
+ ActiveRecord::Base.logger = Logger.new( STDOUT )
30
+
31
+ pak.table.up!
32
+ pak.table.import!
33
+
34
+ pp pak.table.ar_clazz
35
+
36
+ assert true # if we get here - test success
37
+ end
38
+
39
+ end # class TestCountries
40
+
data/test/test_downloader.rb ADDED
@@ -0,0 +1,32 @@
1
+ # encoding: utf-8
2
+
3
+ ###
4
+ # to run use
5
+ # ruby -I ./lib -I ./test test/test_downloader.rb
6
+
7
+
8
+ require 'helper'
9
+
10
+ class TestDownloader < MiniTest::Test
11
+
12
+ def test_download
13
+
14
+ names = [
15
+ 'country-list',
16
+ 'country-codes',
17
+ 'language-codes',
18
+ 'cpi', ## Annual Consumer Price Index (CPI)
19
+ 'gdp', ## Country, Regional and World GDP (Gross Domestic Product)
20
+ 's-and-p-500-companies', ## S&P 500 Companies with Financial Information
21
+ 'un-locode', ## UN-LOCODE Codelist - note: incl. country-codes.csv
22
+ ]
23
+
24
+ dl = CsvPack::Downloader.new
25
+ names.each do |name|
26
+ dl.fetch( name )
27
+ end
28
+
29
+ assert true # if we get here - test success
30
+ end
31
+
32
+ end # class TestDownloader
data/test/test_import.rb ADDED
@@ -0,0 +1,22 @@
1
+ # encoding: utf-8
2
+
3
+ ###
4
+ # to run use
5
+ # ruby -I ./lib -I ./test test/test_import.rb
6
+
7
+
8
+ require 'helper'
9
+
10
+ class TestImport < MiniTest::Test
11
+
12
+ def test_import
13
+
14
+ CsvPack.import(
15
+ 'cpi', ## Annual Consumer Price Index (CPI)
16
+ 'gdp', ## Country, Regional and World GDP (Gross Domestic Product)
17
+ )
18
+
19
+ assert true # if we get here - test success
20
+ end
21
+
22
+ end # class TestImport
metadata ADDED
@@ -0,0 +1,137 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: csvpack
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Gerald Bauer
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2018-08-07 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: logutils
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: 0.6.1
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: 0.6.1
27
+ - !ruby/object:Gem::Dependency
28
+ name: fetcher
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ">="
32
+ - !ruby/object:Gem::Version
33
+ version: 0.4.5
34
+ type: :runtime
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ">="
39
+ - !ruby/object:Gem::Version
40
+ version: 0.4.5
41
+ - !ruby/object:Gem::Dependency
42
+ name: activerecord
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ">="
46
+ - !ruby/object:Gem::Version
47
+ version: 5.0.0
48
+ type: :runtime
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: 5.0.0
55
+ - !ruby/object:Gem::Dependency
56
+ name: rdoc
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - "~>"
60
+ - !ruby/object:Gem::Version
61
+ version: '4.0'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - "~>"
67
+ - !ruby/object:Gem::Version
68
+ version: '4.0'
69
+ - !ruby/object:Gem::Dependency
70
+ name: hoe
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - "~>"
74
+ - !ruby/object:Gem::Version
75
+ version: '3.16'
76
+ type: :development
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - "~>"
81
+ - !ruby/object:Gem::Version
82
+ version: '3.16'
83
+ description: csvpack - work with tabular data packages using comma-separated values
84
+ (CSV) datafiles in text with datapackage.json; download, read into and query comma-separated
85
+ values (CSV) datafiles with your SQL database (e.g. SQLite, PostgreSQL, ...) of
86
+ choice and much more
87
+ email: ruby-talk@ruby-lang.org
88
+ executables: []
89
+ extensions: []
90
+ extra_rdoc_files:
91
+ - HISTORY.md
92
+ - Manifest.txt
93
+ - README.md
94
+ files:
95
+ - HISTORY.md
96
+ - Manifest.txt
97
+ - README.md
98
+ - Rakefile
99
+ - lib/csvpack.rb
100
+ - lib/csvpack/downloader.rb
101
+ - lib/csvpack/pack.rb
102
+ - lib/csvpack/version.rb
103
+ - test/helper.rb
104
+ - test/test_companies.rb
105
+ - test/test_countries.rb
106
+ - test/test_downloader.rb
107
+ - test/test_import.rb
108
+ homepage: https://github.com/csv11/csvpack
109
+ licenses:
110
+ - Public Domain
111
+ metadata: {}
112
+ post_install_message:
113
+ rdoc_options:
114
+ - "--main"
115
+ - README.md
116
+ require_paths:
117
+ - lib
118
+ required_ruby_version: !ruby/object:Gem::Requirement
119
+ requirements:
120
+ - - ">="
121
+ - !ruby/object:Gem::Version
122
+ version: 2.2.2
123
+ required_rubygems_version: !ruby/object:Gem::Requirement
124
+ requirements:
125
+ - - ">="
126
+ - !ruby/object:Gem::Version
127
+ version: '0'
128
+ requirements: []
129
+ rubyforge_project:
130
+ rubygems_version: 2.5.2
131
+ signing_key:
132
+ specification_version: 4
133
+ summary: csvpack - work with tabular data packages using comma-separated values (CSV)
134
+ datafiles in text with datapackage.json; download, read into and query comma-separated
135
+ values (CSV) datafiles with your SQL database (e.g. SQLite, PostgreSQL, ...) of
136
+ choice and much more
137
+ test_files: []