csvpack 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: b9ea6e66b9c56d609a881d9edd84c19fa58c9f8e
4
+ data.tar.gz: 0a0b7df35fa79d3f06aceb7a100aeb20ee71bd0e
5
+ SHA512:
6
+ metadata.gz: 1e5aa488c56683fbd1215da82c60cc8213b890c0e3b42cd126626d3005e2de7ffac156288bb91bcab42f234a34e8734731acd8c1f61694b7b1362e7d154d8fc9
7
+ data.tar.gz: 4eccdd36ca45df01c5d69f63e51abe71bb191a3144998fe0cf2de9ceb639aeb399df24b470f66752fa3a5e96e276b15f7863ff4b7cb258c5cef86520de8874d1
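The digests above can be re-checked by hand. A `.gem` file is a tar archive; once it is unpacked, `metadata.gz` and `data.tar.gz` can be hashed and compared with the listed values. The following is a minimal sketch using only Ruby's standard `digest` library. The helper name `digests_match?` and the throwaway demo file are assumptions for illustration; they are not part of csvpack or RubyGems.

```ruby
require 'digest'
require 'tmpdir'

# Hypothetical helper (not part of csvpack or RubyGems):
# compare a file's SHA1 and SHA512 hex digests with the expected values.
def digests_match?( path, sha1:, sha512: )
  Digest::SHA1.file( path ).hexdigest   == sha1 &&
  Digest::SHA512.file( path ).hexdigest == sha512
end

# Self-contained demo with a throwaway file. For a real check, point the
# helper at the unpacked metadata.gz / data.tar.gz and pass in the values
# listed in the checksums file.
Dir.mktmpdir do |dir|
  path = File.join( dir, 'sample.bin' )
  File.binwrite( path, 'abc' )

  expected_sha1   = Digest::SHA1.hexdigest( 'abc' )
  expected_sha512 = Digest::SHA512.hexdigest( 'abc' )

  puts digests_match?( path, sha1: expected_sha1, sha512: expected_sha512 )
end
```

Both digests must match for the helper to return `true`; a mismatch in either one suggests the file differs from what was published.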
data/HISTORY.md ADDED
@@ -0,0 +1,4 @@
1
+ ### 0.0.1 / 2015-04-23
2
+
3
+ * Everything is new. First release
4
+
data/Manifest.txt ADDED
@@ -0,0 +1,13 @@
1
+ HISTORY.md
2
+ Manifest.txt
3
+ README.md
4
+ Rakefile
5
+ lib/csvpack.rb
6
+ lib/csvpack/downloader.rb
7
+ lib/csvpack/pack.rb
8
+ lib/csvpack/version.rb
9
+ test/helper.rb
10
+ test/test_companies.rb
11
+ test/test_countries.rb
12
+ test/test_downloader.rb
13
+ test/test_import.rb
data/README.md ADDED
@@ -0,0 +1,354 @@
1
+ # csvpack
2
+
3
+ work with tabular data packages, that is, comma-separated values (CSV) datafiles in text plus a datapackage.json; download CSV datafiles, read them into the SQL database of your choice (e.g. SQLite, PostgreSQL, ...), query them and much more
4
+
5
+
6
+ * home :: [github.com/csv11/csvpack](https://github.com/csv11/csvpack)
7
+ * bugs :: [github.com/csv11/csvpack/issues](https://github.com/csv11/csvpack/issues)
8
+ * gem :: [rubygems.org/gems/csvpack](https://rubygems.org/gems/csvpack)
9
+ * rdoc :: [rubydoc.info/gems/csvpack](http://rubydoc.info/gems/csvpack)
10
+ * forum :: [ruby-talk@ruby-lang.org](http://www.ruby-lang.org/en/community/mailing-lists/)
11
+
12
+
13
+
14
+ ## Usage
15
+
16
+
17
+ ### What's a tabular data package?
18
+
19
+ > Tabular Data Package is a simple structure for publishing and sharing
20
+ > tabular data with the following key features:
21
+ >
22
+ > - Data is stored in CSV (comma separated values) files
23
+ > - Metadata about the dataset both general (e.g. title, author)
24
+ > and the specific data files (e.g. schema) is stored in a single JSON file
25
+ > named `datapackage.json` which follows the Data Package format
26
+
27
+ (Source: [Tabular Data Packages, Frictionless Data Initiative • Data Hub.io • Open Knowledge Foundation • Data Protocols.org](https://datahub.io/docs/data-packages/tabular))
28
+
29
+
30
+
31
+ Here's a minimal example of a tabular data package holding two files, that is, `data.csv` and `datapackage.json`:
32
+
33
+ `data.csv`:
34
+
35
+ ```
36
+ Brewery,City,Name,Abv
37
+ Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
38
+ Augustiner Bräu München,München,Edelstoff,5.6%
39
+ Bayerische Staatsbrauerei Weihenstephan,Freising,Hefe Weissbier,5.4%
40
+ Brauerei Spezial,Bamberg,Rauchbier Märzen,5.1%
41
+ Hacker-Pschorr Bräu,München,Münchner Dunkel,5.0%
42
+ Staatliches Hofbräuhaus München,München,Hofbräu Oktoberfestbier,6.3%
43
+ ...
44
+ ```
45
+
46
+ `datapackage.json`:
47
+
48
+ ``` json
49
+ {
50
+ "name": "beer",
51
+ "resources": [
52
+ {
53
+ "path": "data.csv",
54
+ "schema": {
55
+ "fields": [{ "name": "Brewery", "type": "string" },
56
+ { "name": "City", "type": "string" },
57
+ { "name": "Name", "type": "string" },
58
+ { "name": "Abv", "type": "number" }]
59
+ }
60
+ }
61
+ ]
62
+ }
63
+ ```
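To make the link between the two files concrete, here is a minimal sketch that reads both with Ruby's standard `csv` and `json` libraries and checks that the CSV header row lines up with the field names in the schema. The file contents are inlined (and shortened to two rows) so the snippet runs on its own; it does not use csvpack itself.

```ruby
require 'csv'
require 'json'

# Inline copies of the two files from the example above (shortened to two rows).
csv_text = <<~CSV
  Brewery,City,Name,Abv
  Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
  Augustiner Bräu München,München,Edelstoff,5.6%
CSV

json_text = <<~JSON
  {
    "name": "beer",
    "resources": [
      { "path": "data.csv",
        "schema": {
          "fields": [{ "name": "Brewery", "type": "string" },
                     { "name": "City",    "type": "string" },
                     { "name": "Name",    "type": "string" },
                     { "name": "Abv",     "type": "number" }]
        }
      }
    ]
  }
JSON

package  = JSON.parse( json_text )
resource = package['resources'].first
fields   = resource['schema']['fields']

# headers: true treats the first row as column names
table = CSV.parse( csv_text, headers: true )

schema_names = fields.map { |f| f['name'] }
puts "package:  #{package['name']}"
puts "resource: #{resource['path']}"
puts "header matches schema: #{table.headers == schema_names}"

fields.each do |f|
  puts "  #{f['name']} (#{f['type']})"
end
```

The schema only names and types the columns; the values themselves stay plain strings until something (for example a database import) converts them.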
64
+
65
+
66
+
67
+ ### Where to find data packages?
68
+
69
+ For some real-world examples, start with the [Data Packages Listing](https://datahub.io/core) ([Sources](https://github.com/datasets)) at the Data Hub.io • Frictionless Data Initiative
70
+ website. Tabular data packages include:
71
+
72
+ Name | Comments
73
+ ------------------------ | -------------
74
+ `country-codes` | Comprehensive country codes: ISO 3166, ITU, ISO 4217 currency codes and many more
75
+ `language-codes` | ISO Language Codes (639-1 and 639-2)
76
+ `currency-codes` | ISO 4217 Currency Codes
77
+ `gdp` | Country, Regional and World GDP (Gross Domestic Product)
78
+ `s-and-p-500-companies` | S&P 500 Companies with Financial Information
79
+ `un-locode` | UN-LOCODE Codelist
80
+ `gold-prices` | Gold Prices (Monthly in USD)
81
+ `bond-yields-uk-10y` | 10 Year UK Government Bond Yields (Long-Term Interest Rate)
82
+
83
+
84
+
85
+ and many more
86
+
87
+
88
+ ### Code, Code, Code - Script Your Data Workflow with Ruby
89
+
90
+
91
+ ``` ruby
92
+ require 'csvpack'
93
+
94
+ CsvPack.import(
95
+ 's-and-p-500-companies',
96
+ 'gdp'
97
+ )
98
+ ```
99
+
100
+ Using `CsvPack.import` will:
101
+
102
+ 1) download all data packages to the `./pack` folder
103
+
104
+ 2) (auto-)add all tables to an in-memory SQLite database using SQL `create_table`
105
+ commands via `ActiveRecord` migrations e.g.
106
+
107
+
108
+ ``` ruby
109
+ create_table :constituents_financials do |t|
110
+ t.string :symbol # Symbol (string)
111
+ t.string :name # Name (string)
112
+ t.string :sector # Sector (string)
113
+ t.float :price # Price (number)
114
+ t.float :dividend_yield # Dividend Yield (number)
115
+ t.float :price_earnings # Price/Earnings (number)
116
+ t.float :earnings_share # Earnings/Share (number)
117
+ t.float :book_value # Book Value (number)
118
+ t.float :_52_week_low # 52 week low (number)
119
+ t.float :_52_week_high # 52 week high (number)
120
+ t.float :market_cap # Market Cap (number)
121
+ t.float :ebitda # EBITDA (number)
122
+ t.float :price_sales # Price/Sales (number)
123
+ t.float :price_book # Price/Book (number)
124
+ t.string :sec_filings # SEC Filings (string)
125
+ end
126
+ ```
127
+
128
+ 3) (auto-)import all datasets using SQL inserts e.g.
129
+
130
+ ``` sql
131
+ INSERT INTO constituents_financials
132
+ (symbol,
133
+ name,
134
+ sector,
135
+ price,
136
+ dividend_yield,
137
+ price_earnings,
138
+ earnings_share,
139
+ book_value,
140
+ _52_week_low,
141
+ _52_week_high,
142
+ market_cap,
143
+ ebitda,
144
+ price_sales,
145
+ price_book,
146
+ sec_filings)
147
+ VALUES
148
+ ('MMM',
149
+ '3M Co',
150
+ 'Industrials',
151
+ 162.27,
152
+ 2.11,
153
+ 22.28,
154
+ 7.284,
155
+ 25.238,
156
+ 123.61,
157
+ 162.92,
158
+ 104.0,
159
+ 8.467,
160
+ 3.28,
161
+ 6.43,
162
+ 'http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=MMM')
163
+ ```
164
+
165
+ 4) (auto-)add ActiveRecord models for all tables.
166
+
167
+
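Steps 2 and 3 above can be sketched without a database. The snippet below shows how column names such as `52 week low` or `Price/Earnings` turn into SQL-friendly identifiers and how one CSV row becomes an `INSERT` statement. `sanitize_name` follows the rules used in the gem's source; `sql_literal` and `build_insert` are hypothetical helpers written for this illustration, and the sample fields are a shortened, made-up subset.

```ruby
# Column-name cleanup, following the same rules as the gem's sanitize_name:
# downcase, turn . - / and spaces into underscores, drop anything else,
# and prefix a leading digit with an underscore.
def sanitize_name( ident )
  ident = ident.strip.downcase
  ident = ident.gsub( /[.\-\/ ]/, '_' )
  ident = ident.gsub( /[^a-z0-9_]/, '' )
  ident = "_#{ident}" if ident =~ /\A[0-9]/
  ident
end

# Hypothetical helper: turn one CSV value into an SQL literal.
def sql_literal( value, type )
  if value.nil? || value.strip.empty?
    'NULL'
  elsif ['number', 'integer'].include?( type )
    value                           # numeric: no quotes
  else
    "'#{value.gsub( "'", "''" )}'"  # string: double up single quotes
  end
end

# Hypothetical helper: build one INSERT statement for one row.
def build_insert( table_name, fields, row )
  columns = fields.map { |f| sanitize_name( f['name'] ) }
  values  = fields.each_with_index.map { |f, i| sql_literal( row[i], f['type'] ) }
  "INSERT INTO #{sanitize_name( table_name )} (#{columns.join(',')}) VALUES (#{values.join(',')})"
end

# Sample schema fields and row (shortened for illustration)
fields = [
  { 'name' => 'Symbol',         'type' => 'string' },
  { 'name' => 'Name',           'type' => 'string' },
  { 'name' => 'Price/Earnings', 'type' => 'number' },
  { 'name' => '52 week low',    'type' => 'number' },
]
row = ['MMM', '3M Co', '22.28', '123.61']

puts build_insert( 'constituents-financials', fields, row )
# => INSERT INTO constituents_financials (symbol,name,price_earnings,_52_week_low) VALUES ('MMM','3M Co',22.28,123.61)
```

Note that values are interpolated into the statement as text. That is fine for trusted datasets; for untrusted input, bound parameters are generally the safer route.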
168
+ So what? Now you can use all the "magic" of ActiveRecord to query
169
+ the datasets. Example:
170
+
171
+ ``` ruby
172
+ pp Constituent.count
173
+
174
+ # SELECT COUNT(*) FROM "constituents"
175
+ # => 496
176
+
177
+
178
+ pp Constituent.first
179
+
180
+ # SELECT "constituents".* FROM "constituents" ORDER BY "constituents"."id" ASC LIMIT 1
181
+ # => #<Constituent:0x9f8cb78
182
+ # id: 1,
183
+ # symbol: "MMM",
184
+ # name: "3M Co",
185
+ # sector: "Industrials">
186
+
187
+
188
+ pp Constituent.find_by!( symbol: 'MMM' )
189
+
190
+ # SELECT "constituents".*
191
+ # FROM "constituents"
192
+ # WHERE "constituents"."symbol" = 'MMM'
193
+ # LIMIT 1
194
+ # => #<Constituent:0x9f8cb78
195
+ # id: 1,
196
+ # symbol: "MMM",
197
+ # name: "3M Co",
198
+ # sector: "Industrials">
199
+
200
+
201
+ pp Constituent.find_by!( name: '3M Co' )
202
+
203
+ # SELECT "constituents".*
204
+ # FROM "constituents"
205
+ # WHERE "constituents"."name" = '3M Co'
206
+ # LIMIT 1
207
+ # => #<Constituent:0x9f8cb78
208
+ # id: 1,
209
+ # symbol: "MMM",
210
+ # name: "3M Co",
211
+ # sector: "Industrials">
212
+
213
+
214
+ pp Constituent.where( sector: 'Industrials' ).count
215
+
216
+ # SELECT COUNT(*) FROM "constituents"
217
+ # WHERE "constituents"."sector" = 'Industrials'
218
+ # => 63
219
+
220
+
221
+ pp Constituent.where( sector: 'Industrials' ).all
222
+
223
+ # SELECT "constituents".*
224
+ # FROM "constituents"
225
+ # WHERE "constituents"."sector" = 'Industrials'
226
+ # => [#<Constituent:0x9f8cb78
227
+ # id: 1,
228
+ # symbol: "MMM",
229
+ # name: "3M Co",
230
+ # sector: "Industrials">,
231
+ # #<Constituent:0xa2a4180
232
+ # id: 8,
233
+ # symbol: "ADT",
234
+ # name: "ADT Corp (The)",
235
+ # sector: "Industrials">,...]
236
+ ```
237
+
238
+ and so on
239
+
240
+
241
+
242
+ ### Frequently Asked Questions (F.A.Qs) and Answers
243
+
244
+
245
+ #### Q: How to download a data package ("by hand")?
246
+
247
+ Use the `CsvPack::Downloader` class to download a data package
248
+ to your disk (by default data packages get stored in `./pack`).
249
+
250
+ ``` ruby
251
+ dl = CsvPack::Downloader.new
252
+ dl.fetch( 'language-codes' )
253
+ dl.fetch( 's-and-p-500-companies' )
254
+ dl.fetch( 'un-locode')
255
+ ```
256
+
257
+ This results in:
258
+
259
+ ```
260
+ -- pack
261
+ |-- language-codes
262
+ | |-- data
263
+ | | |-- language-codes-3b2.csv
264
+ | | |-- language-codes.csv
265
+ | | `-- language-codes-full.csv
266
+ | `-- datapackage.json
267
+ |-- s-and-p-500-companies
268
+ | |-- data
269
+ | | |-- constituents.csv
270
+ | | `-- constituents-financials.csv
271
+ | `-- datapackage.json
272
+ `-- un-locode
273
+ |-- data
274
+ | |-- code-list.csv
275
+ | |-- country-codes.csv
276
+ | |-- function-classifiers.csv
277
+ | |-- status-indicators.csv
278
+ | `-- subdivision-codes.csv
279
+ `-- datapackage.json
280
+ ```
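The folder layout above follows from how the download URL and the local paths are put together. Here is a minimal sketch of that path logic only; nothing is fetched. The URL base and the `<name>.csv` fallback follow the gem's downloader source, while `pack_locations` and the sample resource entries are hypothetical and exist only for this illustration.

```ruby
# Sketch of the path layout only; nothing is downloaded here.
# URL base and fallback file name follow the gem's downloader source.
def pack_locations( name, resources, cache_dir: './pack' )
  url_base = "https://datahub.io/core/#{name}"
  dest_dir = "#{cache_dir}/#{name}"

  files = resources.map do |r|
    # fall back to "<name>.csv" when a resource has no "path" entry
    relative_path = r['path'] || "#{r['name']}.csv"
    "#{dest_dir}/#{relative_path}"
  end

  { url:   "#{url_base}/datapackage.json",
    pack:  "#{dest_dir}/datapackage.json",
    files: files }
end

# Made-up resource entries for illustration
resources = [
  { 'name' => 'constituents', 'path' => 'data/constituents.csv' },
  { 'name' => 'constituents-financials' },   # no path: fallback applies
]

loc = pack_locations( 's-and-p-500-companies', resources )
puts loc[:url]
puts loc[:pack]
puts loc[:files]
```

Passing a different `cache_dir:` mirrors the optional argument of `CsvPack::Downloader.new`.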
281
+
282
+
283
+ #### Q: How to add and import a data package ("by hand")?
284
+
285
+ Use the `CsvPack::Pack` class to read in a data package,
286
+ then add its tables to an SQL database and import the records.
287
+
288
+ ``` ruby
289
+ pack = CsvPack::Pack.new( './pack/un-locode/datapackage.json' )
290
+ pack.tables.each do |table|
291
+ table.up! # (auto-) add table using SQL create_table via ActiveRecord migration
292
+ table.import! # import all records using SQL inserts
293
+ end
294
+ ```
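Before anything reaches the database, reading a pack comes down to resolving each resource path relative to the folder that holds `datapackage.json` and parsing the CSV with the first row as header. The sketch below shows that reading half with the standard library only. `read_tables` is a hypothetical stand-in, not the gem's API, and the demo package is written to a temporary folder so the snippet runs on its own.

```ruby
require 'csv'
require 'json'
require 'tmpdir'

# Hypothetical stand-in for the reading half of CsvPack::Pack (no database):
# returns a hash of resource name => parsed CSV table.
def read_tables( datapackage_path )
  pack_dir = File.dirname( datapackage_path )
  package  = JSON.parse( File.read( datapackage_path, encoding: 'utf-8' ) )

  package['resources'].each_with_object( {} ) do |r, tables|
    # resource paths are relative to the folder holding datapackage.json
    relative_path = r['path'] || "#{r['name']}.csv"
    text = File.read( File.join( pack_dir, relative_path ), encoding: 'utf-8' )
    tables[ r['name'] ] = CSV.parse( text, headers: true )
  end
end

Dir.mktmpdir do |dir|
  # write a tiny demo package: datapackage.json plus data/codes.csv
  File.write( File.join( dir, 'datapackage.json' ), JSON.generate({
    'name'      => 'demo',
    'resources' => [{ 'name' => 'codes', 'path' => 'data/codes.csv' }]
  }))
  Dir.mkdir( File.join( dir, 'data' ) )
  File.write( File.join( dir, 'data', 'codes.csv' ), "Code,Name\nAT,Austria\nDE,Germany\n" )

  tables = read_tables( File.join( dir, 'datapackage.json' ) )
  tables.each do |name, table|
    puts "#{name}: #{table.size} rows, columns #{table.headers.join(', ')}"
  end
end
```

With the tables in memory, the `up!` and `import!` calls shown above are what create the SQL table and insert the records.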
295
+
296
+
297
+ #### Q: How to connect to a different SQL database?
298
+
299
+ You can connect to any database supported by ActiveRecord. If you do NOT
300
+ establish a connection in your script, csvpack falls back to
301
+ an in-memory SQLite3 database.
302
+
303
+ ##### SQLite
304
+
305
+ For example, to create an SQLite3 database on disk - let's say `mine.db` -
306
+ use in your script (before the `CsvPack.import` statement):
307
+
308
+ ``` ruby
309
+ ActiveRecord::Base.establish_connection( adapter: 'sqlite3',
310
+ database: './mine.db' )
311
+ ```
312
+
313
+ ##### PostgreSQL
314
+
315
+ For example, to connect to a PostgreSQL database use in your script
316
+ (before the `CsvPack.import` statement):
317
+
318
+ ``` ruby
319
+ require 'pg' ## pull-in PostgreSQL (pg) machinery
320
+
321
+ ActiveRecord::Base.establish_connection( adapter: 'postgresql',
322
+ username: 'ruby',
323
+ password: 'topsecret',
324
+ database: 'database' )
325
+ ```
326
+
327
+
328
+
329
+
330
+ ## Install
331
+
332
+ Just install the gem:
333
+
334
+ ```
335
+ $ gem install csvpack
336
+ ```
337
+
338
+
339
+
340
+ ## Alternatives
341
+
342
+ See the "[Tools and Plugins for working with Data Packages](https://frictionlessdata.io/software)"
343
+ page at the Frictionless Data Initiative.
344
+
345
+
346
+ ## License
347
+
348
+
349
+ The `csvpack` scripts are dedicated to the public domain.
350
+ Use them as you please with no restrictions whatsoever.
351
+
352
+ ## Questions? Comments?
353
+
354
+ Send them along to the ruby-talk mailing list. Thanks!
data/Rakefile ADDED
@@ -0,0 +1,32 @@
1
+ require 'hoe'
2
+ require './lib/csvpack/version.rb'
3
+
4
+ Hoe.spec 'csvpack' do
5
+
6
+ self.version = CsvPack::VERSION
7
+
8
+ self.summary = 'csvpack - work with tabular data packages using comma-separated values (CSV) datafiles in text with datapackage.json; download, read into and query comma-separated values (CSV) datafiles with your SQL database (e.g. SQLite, PostgreSQL, ...) of choice and much more'
9
+ self.description = summary
10
+
11
+ self.urls = ['https://github.com/csv11/csvpack']
12
+
13
+ self.author = 'Gerald Bauer'
14
+ self.email = 'ruby-talk@ruby-lang.org'
15
+
16
+ # switch extension to .markdown for github formatting
17
+ self.readme_file = 'README.md'
18
+ self.history_file = 'HISTORY.md'
19
+
20
+ self.extra_deps = [
21
+ ['logutils', '>=0.6.1'],
22
+ ['fetcher', '>=0.4.5'],
23
+ ['activerecord', '>=5.0.0'],
24
+ ]
25
+
26
+ self.licenses = ['Public Domain']
27
+
28
+ self.spec_extras = {
29
+ required_ruby_version: '>= 2.2.2'
30
+ }
31
+
32
+ end
data/lib/csvpack.rb ADDED
@@ -0,0 +1,52 @@
1
+ # encoding: utf-8
2
+
3
+
4
+ require 'pp'
5
+ require 'forwardable'
6
+
7
+ ### csv
8
+ require 'csv'
9
+ require 'json'
10
+ require 'fileutils'
11
+
12
+
13
+ ### downloader
14
+ require 'fetcher'
15
+
16
+ ### activerecord w/ sqlite3
17
+ ## require 'active_support/all' ## needed for String#binary? method
18
+ require 'active_record'
19
+
20
+
21
+
22
+ # our own code
23
+
24
+ require 'csvpack/version' ## let version always go first
25
+ require 'csvpack/pack'
26
+ require 'csvpack/downloader'
27
+
28
+ module CsvPack
29
+
30
+ def self.import( *args )
31
+ ## step 1: download
32
+ dl = Downloader.new
33
+ args.each do |arg|
34
+ dl.fetch( arg )
35
+ end
36
+
37
+ ## step 2: up 'n' import
38
+ args.each do |arg|
39
+ pack = Pack.new( "./pack/#{arg}/datapackage.json" )
40
+ pack.tables.each do |table|
41
+ table.up!
42
+ table.import!
43
+ end
44
+ end
45
+ end
46
+
47
+ end # module CsvPack
48
+
49
+
50
+
51
+ # say hello
52
+ puts CsvPack.banner if defined?($RUBYLIBS_DEBUG) && $RUBYLIBS_DEBUG
data/lib/csvpack/downloader.rb ADDED
@@ -0,0 +1,62 @@
1
+ # encoding: utf-8
2
+
3
+ module CsvPack
4
+
5
+ class Downloader
6
+
7
+ def initialize( cache_dir='./pack' )
8
+ @cache_dir = cache_dir # todo: check if folder exists now (or on demand)?
9
+ @worker = Fetcher::Worker.new
10
+ end
11
+
12
+ SHORTCUTS = {
13
+ ## to be done
14
+ }
15
+
16
+ def fetch( name_or_shortcut_or_url ) ## todo/check: use (re)name to get/update/etc. why? why not??
17
+
18
+ name = name_or_shortcut_or_url
19
+
20
+ ##
21
+ ## e.g. try
22
+ ## country-list
23
+ ##
24
+
25
+ ## url_base = "http://data.okfn.org/data/core/#{name}"
26
+ url_base = "https://datahub.io/core/#{name}"
27
+ url = "#{url_base}/datapackage.json"
28
+
29
+ dest_dir = "#{@cache_dir}/#{name}"
30
+ FileUtils.mkdir_p( dest_dir )
31
+
32
+ pack_path = "#{dest_dir}/datapackage.json"
33
+ @worker.copy( url, pack_path )
34
+
35
+ h = JSON.parse( File.read( pack_path ) )
36
+ pp h
37
+
38
+ ## copy resources (tables)
39
+ h['resources'].each do |r|
40
+ puts "== resource:"
41
+ pp r
42
+
43
+ res_url = r['url']
44
+
45
+ res_name = r['name']
46
+ res_relative_path = r['path']
47
+ if res_relative_path.nil?
48
+ res_relative_path = "#{res_name}.csv"
49
+ end
50
+
51
+ res_path = "#{dest_dir}/#{res_relative_path}"
52
+ puts "[debug] res_path: >#{res_path}<"
53
+ res_dir = File.dirname( res_path )
54
+ FileUtils.mkdir_p( res_dir )
55
+
56
+ @worker.copy( res_url, res_path )
57
+ end
58
+ end
59
+
60
+ end # class Downloader
61
+
62
+ end # module CsvPack
data/lib/csvpack/pack.rb ADDED
@@ -0,0 +1,246 @@
1
+ # encoding: utf-8
2
+
3
+
4
+ ## note: for now use in-memory sqlite3 db
5
+
6
+ module CsvPack
7
+
8
+ class Pack
9
+ ## load (tabular) datapackage into memory
10
+ def initialize( path )
11
+
12
+ ## convenience
13
+ ## - check: if path is a folder/directory
14
+ ## (auto-)add /datapackage.json
15
+
16
+ text = File.open( path, 'r:utf-8' ) { |f| f.read } ## block form closes the file handle
17
+ @h = JSON.parse( text )
18
+
19
+ pack_dir = File.dirname(path)
20
+
21
+ ## pp @h
22
+
23
+ ## read in tables
24
+ @tables = []
25
+ @h['resources'].each do |r|
26
+ ## build table data
27
+ @tables << build_tab( r, pack_dir )
28
+ end
29
+
30
+ ## pp @tables
31
+ end
32
+
33
+ def name() @h['name']; end
34
+ def title() @h['title']; end
35
+ def license() @h['license']; end
36
+
37
+ def tables() @tables; end
38
+ ## convenience method - return first table
39
+ def table() @tables[0]; end
40
+
41
+ def build_tab( h, pack_dir )
42
+ name = h['name']
43
+ relative_path = h['path']
44
+
45
+ if relative_path.nil?
46
+ relative_path = "#{name}.csv"
47
+ puts " warn: no path defined; using fallback '#{relative_path}'"
48
+ end
49
+
50
+ puts " reading resource (table) #{name} (#{relative_path})..."
51
+ pp h
52
+
53
+ path = "#{pack_dir}/#{relative_path}"
54
+ text = File.open( path, 'r:utf-8' ) { |f| f.read } ## block form closes the file handle
55
+ tab = Tab.new( h, text )
56
+ tab
57
+ end
58
+ end # class Pack
59
+
60
+
61
+ class Tab
62
+ extend Forwardable
63
+
64
+ def initialize( h, text )
65
+ @h = h
66
+
67
+ ## todo parse csv
68
+ ## note: use header options (first row MUST include headers)
69
+ @data = CSV.parse( text, headers: true )
70
+
71
+ pp @data[0]
72
+ end
73
+
74
+ def name() @h['name']; end
75
+ def_delegators :@data, :[], :each
76
+
77
+ def pretty_print( printer )
78
+ printer.text "Tab<#{object_id} @data.name=#{name}, @data.size=#{@data.size}>"
79
+ end
80
+
81
+
82
+ def up!
83
+ # run Migration#up to create table
84
+ connect!
85
+ con = ActiveRecord::Base.connection
86
+
87
+ con.create_table sanitize_name( name ) do |t|
88
+ @h['schema']['fields'].each do |f|
89
+ column_name = sanitize_name(f['name'])
90
+ column_type = DATA_TYPES[f['type']]
91
+
92
+ puts " #{column_type} :#{column_name} => #{f['type']} - #{f['name']}"
93
+
94
+ t.send( column_type.to_sym, column_name.to_sym ) ## todo/check: to_sym needed?
95
+ end
96
+ t.string :name
97
+ end
98
+ end
99
+
100
+ def import!
101
+ connect!
102
+ con = ActiveRecord::Base.connection
103
+
104
+ column_names = []
105
+ column_types = []
106
+ column_placeholders = []
107
+ @h['schema']['fields'].each do |f|
108
+ column_names << sanitize_name(f['name'])
109
+ column_types << DATA_TYPES[f['type']]
110
+ column_placeholders << '?'
111
+ end
112
+
113
+ sql_insert_into = "INSERT INTO #{sanitize_name(name)} (#{column_names.join(',')}) VALUES "
114
+ puts sql_insert_into
115
+
116
+ i=0
117
+ @data.each do |row|
118
+ i+=1
119
+ ## next if i > 3 ## for testing; only insert a couple of recs
120
+
121
+ ## todo: check if all string is ok; or number/date/etc. conversion needed/required?
122
+ values = []
123
+ row.fields.each_with_index do |value,index| # get array of values
124
+ type = column_types[index]
125
+ ## todo add boolean ??
126
+ if value.blank?
127
+ values << 'NULL'
128
+ elsif [:number,:float,:integer].include?( type )
129
+ values << value ## do NOT wrap in quotes (numeric)
130
+ else
131
+ esc_value = value.gsub( "'", "''" ) ## escape single quotes by doubling them, e.g. ' becomes ''
132
+ values << "'#{esc_value}'" ## wrap in quotes
133
+ end
134
+ end
135
+ pp values
136
+
137
+ sql = "#{sql_insert_into} (#{values.join(',')})"
138
+ puts sql
139
+ con.execute( sql )
140
+ end
141
+ end # method import!
142
+
143
+
144
+ def import_v1!
145
+ ### note: import via sql for now (do NOT use ActiveRecord record class)
146
+ con = ActiveRecord::Base.connection
147
+
148
+ column_names = []
149
+ column_types = []
150
+ column_placeholders = []
151
+ @h['schema']['fields'].each do |f|
152
+ column_names << sanitize_name(f['name'])
153
+ column_types << DATA_TYPES[f['type']]
154
+ column_placeholders << '?'
155
+ end
156
+
157
+ sql = "INSERT INTO #{sanitize_name(name)} (#{column_names.join(',')}) VALUES (#{column_placeholders.join(',')})"
158
+ puts sql
159
+
160
+ i=0
161
+ @data.each do |row|
162
+ i+=1
163
+ next if i > 3 ## for testing; only insert a couple of recs
164
+
165
+ ## todo: check if all string is ok; or number/date/etc. conversion needed/required?
166
+ params = row.fields # get array of values
167
+ pp params
168
+ con.exec_insert( sql, 'SQL', params ) # todo/check: 2nd param name used for logging only??
169
+ end
170
+ end # method import_v1!
171
+
172
+
173
+ ### note:
174
+ ## activerecord supports:
175
+ ## :string, :text, :integer, :float, :decimal, :datetime, :time, :date, :binary, :boolean
176
+
177
+ ### mappings for data types
178
+ ## from tabular data package to ActiveRecord migrations
179
+ ##
180
+ # see http://dataprotocols.org/json-table-schema/ (section Field Types and Formats)
181
+ #
182
+ # for now supports these types
183
+
184
+ DATA_TYPES = {
185
+ 'string' => :string, ## use text for larger strings ???
186
+ 'number' => :float, ## note: use float for now
187
+ 'integer' => :integer,
188
+ 'boolean' => :boolean,
189
+ 'datetime' => :datetime,
190
+ 'date' => :date,
191
+ 'time' => :time,
192
+ }
193
+
194
+ def dump_schema
195
+ ## try to dump schema (fields)
196
+ puts "*** dump schema:"
197
+
198
+ @h['schema']['fields'].each do |f|
199
+ puts " #{f['name']} ( #{sanitize_name(f['name'])} ) : #{f['type']} ( #{DATA_TYPES[f['type']]} )"
200
+ end
201
+
202
+ end
203
+
204
+
205
+ def sanitize_name( ident )
206
+ ##
207
+ ## if identifier starts w/ number add leading underscore (_)
208
+ ## e.g. 52 Week Price => becomes _52_week_price
209
+
210
+ ident = ident.strip.downcase
211
+ ident = ident.gsub( /[\.\-\/]/, '_' ) ## convert some special chars to underscore (e.g. dash -)
212
+ ident = ident.gsub( ' ', '_' )
213
+ ident = ident.gsub( /[^a-z0-9_]/, '' )
214
+ ident = "_#{ident}" if ident =~ /^[0-9]/
215
+ ident
216
+ end
217
+
218
+
219
+ def ar_clazz
220
+ @ar_clazz ||= begin
221
+ clazz = Class.new( ActiveRecord::Base ) do
222
+ ## nothing here for now
223
+ end
224
+ puts "set table_name to #{sanitize_name( name )}"
225
+ clazz.table_name = sanitize_name( name )
226
+ clazz
227
+ end
228
+ @ar_clazz
229
+ end
230
+
231
+ private
232
+
233
+ ## helper to get connection; if no connection established use defaults
234
+ def connect!
235
+ ## todo: cache returned con - why, why not ??
236
+ unless ActiveRecord::Base.connected?
237
+ puts "note: no database connection established; using defaults (e.g. in-memory SQLite database)"
238
+ ActiveRecord::Base.establish_connection( adapter: 'sqlite3', database: ':memory:' )
239
+ ActiveRecord::Base.logger = Logger.new( STDOUT )
240
+ end
241
+ ActiveRecord::Base.connection
242
+ end
243
+
244
+ end # class Tab
245
+
246
+ end # module CsvPack
data/lib/csvpack/version.rb ADDED
@@ -0,0 +1,22 @@
1
+ # encoding: utf-8
2
+
3
+ module CsvPack
4
+
5
+ MAJOR = 0 ## todo: namespace inside version or something - why? why not??
6
+ MINOR = 1
7
+ PATCH = 0
8
+ VERSION = [MAJOR,MINOR,PATCH].join('.')
9
+
10
+ def self.version
11
+ VERSION
12
+ end
13
+
14
+ def self.banner
15
+ "csvpack/#{VERSION} on Ruby #{RUBY_VERSION} (#{RUBY_RELEASE_DATE}) [#{RUBY_PLATFORM}]"
16
+ end
17
+
18
+ def self.root
19
+ File.expand_path( File.dirname(File.dirname(File.dirname(__FILE__))) )
20
+ end
21
+
22
+ end # module CsvPack
data/test/helper.rb ADDED
@@ -0,0 +1,7 @@
1
+
2
+ ## minitest setup
3
+ require 'minitest/autorun'
4
+
5
+
6
+ ## our own code
7
+ require 'csvpack'
data/test/test_companies.rb ADDED
@@ -0,0 +1,61 @@
1
+ # encoding: utf-8
2
+
3
+ ###
4
+ # to run use
5
+ # ruby -I ./lib -I ./test test/test_companies.rb
6
+
7
+
8
+ require 'helper'
9
+
10
+ class TestCompanies < MiniTest::Test
11
+
12
+ def test_s_and_p_500_companies
13
+
14
+ pak = CsvPack::Pack.new( './pack/s-and-p-500-companies/datapackage.json' )
15
+
16
+ puts "name: #{pak.name}"
17
+ puts "title: #{pak.title}"
18
+ puts "license: #{pak.license}"
19
+
20
+ pp pak.tables
21
+ pp pak.table[0]['Symbol']
22
+ pp pak.table[495]['Symbol']
23
+
24
+ ## pak.table.each do |row|
25
+ ## pp row
26
+ ## end
27
+
28
+ puts pak.tables[0].dump_schema
29
+ puts pak.tables[1].dump_schema
30
+
31
+ # database setup 'n' config
32
+ ActiveRecord::Base.establish_connection( adapter: 'sqlite3', database: ':memory:' )
33
+ ActiveRecord::Base.logger = Logger.new( STDOUT )
34
+
35
+ pak.table.up!
36
+ pak.table.import!
37
+
38
+ pak.tables[1].up!
39
+ pak.tables[1].import!
40
+
41
+
42
+ pp pak.table.ar_clazz
43
+
44
+
45
+ company = pak.table.ar_clazz
46
+
47
+ puts "Company.count: #{company.count}"
48
+ pp company.first
49
+ pp company.find_by!( symbol: 'MMM' )
50
+ pp company.find_by!( name: '3M Co' )
51
+ pp company.where( sector: 'Industrials' ).count
52
+ pp company.where( sector: 'Industrials' ).all
53
+
54
+
55
+ ### todo: try a join w/ belongs_to ??
56
+
57
+ assert true # if we get here - test success
58
+ end
59
+
60
+ end # class TestCompanies
61
+
data/test/test_countries.rb ADDED
@@ -0,0 +1,40 @@
1
+ # encoding: utf-8
2
+
3
+ ###
4
+ # to run use
5
+ # ruby -I ./lib -I ./test test/test_countries.rb
6
+
7
+
8
+ require 'helper'
9
+
10
+ class TestCountries < MiniTest::Test
11
+
12
+ def test_country_list
13
+ pak = CsvPack::Pack.new( './pack/country-list/datapackage.json' )
14
+
15
+ puts "name: #{pak.name}"
16
+ puts "title: #{pak.title}"
17
+ puts "license: #{pak.license}"
18
+
19
+ pp pak.tables
20
+
21
+ ## pak.table.each do |row|
22
+ ## pp row
23
+ ## end
24
+
25
+ puts pak.table.dump_schema
26
+
27
+ # database setup 'n' config
28
+ ActiveRecord::Base.establish_connection( adapter: 'sqlite3', database: ':memory:' )
29
+ ActiveRecord::Base.logger = Logger.new( STDOUT )
30
+
31
+ pak.table.up!
32
+ pak.table.import!
33
+
34
+ pp pak.table.ar_clazz
35
+
36
+ assert true # if we get here - test success
37
+ end
38
+
39
+ end # class TestCountries
40
+
data/test/test_downloader.rb ADDED
@@ -0,0 +1,32 @@
1
+ # encoding: utf-8
2
+
3
+ ###
4
+ # to run use
5
+ # ruby -I ./lib -I ./test test/test_downloader.rb
6
+
7
+
8
+ require 'helper'
9
+
10
+ class TestDownloader < MiniTest::Test
11
+
12
+ def test_download
13
+
14
+ names = [
15
+ 'country-list',
16
+ 'country-codes',
17
+ 'language-codes',
18
+ 'cpi', ## Annual Consumer Price Index (CPI)
19
+ 'gdp', ## Country, Regional and World GDP (Gross Domestic Product)
20
+ 's-and-p-500-companies', ## S&P 500 Companies with Financial Information
21
+ 'un-locode', ## UN-LOCODE Codelist - note: incl. country-codes.csv
22
+ ]
23
+
24
+ dl = CsvPack::Downloader.new
25
+ names.each do |name|
26
+ dl.fetch( name )
27
+ end
28
+
29
+ assert true # if we get here - test success
30
+ end
31
+
32
+ end # class TestDownloader
data/test/test_import.rb ADDED
@@ -0,0 +1,22 @@
1
+ # encoding: utf-8
2
+
3
+ ###
4
+ # to run use
5
+ # ruby -I ./lib -I ./test test/test_import.rb
6
+
7
+
8
+ require 'helper'
9
+
10
+ class TestImport < MiniTest::Test
11
+
12
+ def test_import
13
+
14
+ CsvPack.import(
15
+ 'cpi', ## Annual Consumer Price Index (CPI)
16
+ 'gdp', ## Country, Regional and World GDP (Gross Domestic Product)
17
+ )
18
+
19
+ assert true # if we get here - test success
20
+ end
21
+
22
+ end # class TestImport
metadata ADDED
@@ -0,0 +1,137 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: csvpack
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Gerald Bauer
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2018-08-07 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: logutils
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: 0.6.1
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: 0.6.1
27
+ - !ruby/object:Gem::Dependency
28
+ name: fetcher
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ">="
32
+ - !ruby/object:Gem::Version
33
+ version: 0.4.5
34
+ type: :runtime
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ">="
39
+ - !ruby/object:Gem::Version
40
+ version: 0.4.5
41
+ - !ruby/object:Gem::Dependency
42
+ name: activerecord
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ">="
46
+ - !ruby/object:Gem::Version
47
+ version: 5.0.0
48
+ type: :runtime
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: 5.0.0
55
+ - !ruby/object:Gem::Dependency
56
+ name: rdoc
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - "~>"
60
+ - !ruby/object:Gem::Version
61
+ version: '4.0'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - "~>"
67
+ - !ruby/object:Gem::Version
68
+ version: '4.0'
69
+ - !ruby/object:Gem::Dependency
70
+ name: hoe
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - "~>"
74
+ - !ruby/object:Gem::Version
75
+ version: '3.16'
76
+ type: :development
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - "~>"
81
+ - !ruby/object:Gem::Version
82
+ version: '3.16'
83
+ description: csvpack - work with tabular data packages using comma-separated values
84
+ (CSV) datafiles in text with datapackage.json; download, read into and query comma-separated
85
+ values (CSV) datafiles with your SQL database (e.g. SQLite, PostgreSQL, ...) of
86
+ choice and much more
87
+ email: ruby-talk@ruby-lang.org
88
+ executables: []
89
+ extensions: []
90
+ extra_rdoc_files:
91
+ - HISTORY.md
92
+ - Manifest.txt
93
+ - README.md
94
+ files:
95
+ - HISTORY.md
96
+ - Manifest.txt
97
+ - README.md
98
+ - Rakefile
99
+ - lib/csvpack.rb
100
+ - lib/csvpack/downloader.rb
101
+ - lib/csvpack/pack.rb
102
+ - lib/csvpack/version.rb
103
+ - test/helper.rb
104
+ - test/test_companies.rb
105
+ - test/test_countries.rb
106
+ - test/test_downloader.rb
107
+ - test/test_import.rb
108
+ homepage: https://github.com/csv11/csvpack
109
+ licenses:
110
+ - Public Domain
111
+ metadata: {}
112
+ post_install_message:
113
+ rdoc_options:
114
+ - "--main"
115
+ - README.md
116
+ require_paths:
117
+ - lib
118
+ required_ruby_version: !ruby/object:Gem::Requirement
119
+ requirements:
120
+ - - ">="
121
+ - !ruby/object:Gem::Version
122
+ version: 2.2.2
123
+ required_rubygems_version: !ruby/object:Gem::Requirement
124
+ requirements:
125
+ - - ">="
126
+ - !ruby/object:Gem::Version
127
+ version: '0'
128
+ requirements: []
129
+ rubyforge_project:
130
+ rubygems_version: 2.5.2
131
+ signing_key:
132
+ specification_version: 4
133
+ summary: csvpack - work with tabular data packages using comma-separated values (CSV)
134
+ datafiles in text with datapackage.json; download, read into and query comma-separated
135
+ values (CSV) datafiles with your SQL database (e.g. SQLite, PostgreSQL, ...) of
136
+ choice and much more
137
+ test_files: []