dataduck 0.2.0 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 0f87bbaf674b1943242d3ea173a5e34fe00e0724
- data.tar.gz: b8fbacadd9323ab917498712c8d4f39f1f5ca907
+ metadata.gz: dcc9a5407d2bae97ab0ecb754f95d3c872b92cfe
+ data.tar.gz: d32783d694d625367fb5f732602a6c2a997e8241
  SHA512:
- metadata.gz: 40bbfce9c990d1542236c59967c31fe3fe5982c84bed12ccaf604c7ce15f2cebc5432b865dfedac5e95607dc37f20e0d681462ee9e7936e30ffdce8361688c96
- data.tar.gz: fdc25e1ddf3a00faeceb13f11f4b7452f4085b3c0e5ca805137cc21174727aabec052dd335c371cd21c78db7ca0af7ef4e5a93f2876df9abfff531ad90bc8612
+ metadata.gz: 184f298a735a3928a78d5b8e85e22e498d4e26b554d89a2b4201afa0e91d8b812872e0c72e4a6046970b6b8489ff451ad4b1fd93d2271314d133484762189470
+ data.tar.gz: d9e4001147d98b51c3a481f894d52129349ba656d1f6b362845f37618e41b2bdf29005389b106df842ffb8ec02d0b84b45c683a561a49bc4674a88ce392fefcc
data/README.md CHANGED
@@ -18,10 +18,12 @@ See [https://github.com/DataDuckETL/DataDuck/tree/master/examples/example](https
 
  ##### Instructions for using DataDuck ETL
 
- Create a new project, then add the following to your Gemfile:
+ Create a new, empty directory. Inside this directory, create a file named Gemfile, and add the following to it:
 
  ```ruby
- gem 'dataduck', :git => 'git://github.com/DataDuckETL/DataDuck.git'
+ source 'https://rubygems.org'
+
+ gem 'dataduck'
  ```
 
  Then execute:
@@ -23,6 +23,7 @@ Gem::Specification.new do |spec|
 
  spec.add_runtime_dependency "sequel", '~> 4.19'
  spec.add_runtime_dependency "pg", '~> 0.16'
+ spec.add_runtime_dependency "mysql", "~> 2.9"
  spec.add_runtime_dependency "aws-sdk", "~> 2.0"
  spec.add_runtime_dependency "sequel-redshift"
  end
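The new `mysql` dependency uses RubyGems' pessimistic operator, `~>`. One way to check what such a constraint admits is RubyGems' own `Gem::Requirement#satisfied_by?`. The sketch below wraps it in a small helper, `allowed?`, which is hypothetical and not part of DataDuck, and evaluates the `'~> 2.9'` and `'~> 0.16'` constraints from the gemspec against a few sample version numbers.

```ruby
require 'rubygems' # Gem::Requirement and Gem::Version ship with RubyGems

# Hypothetical helper: true when +version+ satisfies the gemspec-style +constraint+.
def allowed?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

# "~> 2.9" (the new mysql constraint) means ">= 2.9 and < 3.0".
%w[2.8 2.9 2.9.1 2.10 3.0].each do |v|
  puts "mysql #{v}: #{allowed?('~> 2.9', v)}"
end

# "~> 0.16" (the existing pg constraint) means ">= 0.16 and < 1.0".
puts "pg 0.18: #{allowed?('~> 0.16', '0.18')}"
puts "pg 1.0: #{allowed?('~> 0.16', '1.0')}"
```

So `~> 2.9` accepts any 2.x release from 2.9 onward (2.10 included, since versions compare segment by segment), but not 3.0, while `~> 0.16` stays below 1.0.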
@@ -0,0 +1,7 @@
+ # Documentation
+
+ The documentation in this directory is viewable at [http://dataducketl.com/docs](http://dataducketl.com/docs).
+
+ # Autogenerated
+
+ The documentation directory is autogenerated from the main DataDuck ETL git repo. If you would like to add or correct something in the documentation, let us know or make a pull request to [https://github.com/DataDuckETL/DataDuck/docs](https://github.com/DataDuckETL/DataDuck/docs).
@@ -0,0 +1,6 @@
+ "Overview":
+ "Welcome": README
+ "Getting Started": getting_started
+
+ "Tables":
+ "The Table Class": README
@@ -0,0 +1,24 @@
+ # Overview
+
+ DataDuck ETL is a straightforward, effective extract-transform-load framework for data warehousing. If you want to set
+ up a data warehouse, DataDuck ETL makes it simple to do.
+
+ ## Getting Started
+
+ Getting started with DataDuck ETL takes just a few minutes. For instructions, read the
+ [getting started](/docs/overview/getting_started) page.
+
+ ## Why Use a Data Warehouse
+
+ If your data is already in your main database, and you probably use a web analytics product like Google Analytics, you
+ may be wondering why you'd want a data warehouse at all.
+
+ There are many advantages to using a data warehouse, including:
+
+ - integrating multiple data sources so you can analyze them together
+ - helping to ensure data quality by cleaning up the data and running data quality checks
+ - having a single source of truth that the entire company trusts
+ - connecting business intelligence products for reports and dashboards
+ - using the data warehouse to build models, which may be incorporated back into the product, or used for predictions and company decision making
+ - performance optimizations so your queries run fast
+ - ensuring sensitive data doesn't end up in reports, by not passing it to the data warehouse (encrypted passwords, salts, etc. have no practical analytics value, so they are not ETLed)
@@ -0,0 +1,28 @@
+ # Getting Started
+
+ ## Requirements
+
+ DataDuck ETL currently supports extracting from MySQL and PostgreSQL databases. It supports loading into Amazon
+ Redshift. If you would like to extract from or load into a database that is not yet supported, contact us.
+
+ ## Instructions
+
+ First, create a new, empty directory. Inside this directory, create a file named Gemfile with the following:
+
+ ```ruby
+ source 'https://rubygems.org'
+
+ gem 'dataduck'
+ ```
+
+ Then execute:
+
+ $ bundle install
+
+ Finally, run the quickstart command:
+
+ $ dataduck quickstart
+
+ It will ask for your database credentials and then create the basic setup for your project. Once setup is complete, you can run your project's ETL with `ruby src/main.rb`.
+
+ To run the ETL on a schedule, such as every night, we recommend using the [whenever](https://github.com/javan/whenever) gem to manage a cron job.
@@ -0,0 +1,48 @@
+ # The Table Class
+
+ If you've run the `dataduck quickstart` command, you'll notice that it generated several table files under `src/tables`.
+ Each of these files defines a class that inherits from `DataDuck::Table`, the base table class. Every table must define its `source` and `output`.
+
+ You may also define transformations with the `transforms` method and validations with the `validates` method.
+
+ ## Example Table
+
+ The following is an example table.
+
+ ```ruby
+ class Decks < DataDuck::Table
+   source :my_database, ["id", "name", "user_id", "cards",
+     "num_wins", "num_losses", "created_at", "updated_at",
+     "is_drafted", "num_draft_wins", "num_draft_losses"]
+
+   transforms :calculate_num_totals
+
+   validates :validates_num_total
+
+   output({
+     :id => :integer,
+     :name => :string,
+     :user_id => :integer,
+     :num_wins => :integer,
+     :num_losses => :integer,
+     :num_total => :integer,
+     :num_draft_total => :integer,
+     :created_at => :datetime,
+     :updated_at => :datetime,
+     :is_drafted => :boolean,
+     # Note that num_draft_wins and num_draft_losses
+     # are not included in the output, but are used in
+     # the transformation.
+   })
+
+   def calculate_num_totals(row)
+     row[:num_total] = row[:num_wins] + row[:num_losses]
+     row[:num_draft_total] = row[:num_draft_wins] + row[:num_draft_losses]
+     row
+   end
+
+   def validates_num_total(row)
+     return "Deck id #{ row[:id] } has negative value #{ row[:num_total] } for num_total." if row[:num_total] < 0
+   end
+ end
+ ```
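To see what the two methods in the Decks example do to a single row, the plain-Ruby sketch below copies them and applies them to a hand-built row. It does not use DataDuck or a database: the row values, the `OUTPUT_COLUMNS` list, and the column-filtering step are hypothetical, and the sketch assumes rows are hashes with symbol keys, as the example's `row[:num_wins]` access implies.

```ruby
# Plain-Ruby copies of the two methods from the Decks example (no DataDuck involved).
def calculate_num_totals(row)
  row[:num_total] = row[:num_wins] + row[:num_losses]
  row[:num_draft_total] = row[:num_draft_wins] + row[:num_draft_losses]
  row
end

# Returns an error string for a bad row, or nil when the row passes.
def validates_num_total(row)
  return "Deck id #{ row[:id] } has negative value #{ row[:num_total] } for num_total." if row[:num_total] < 0
end

# Hypothetical list mirroring the keys of the example's output({...}) hash.
OUTPUT_COLUMNS = [:id, :name, :user_id, :num_wins, :num_losses, :num_total,
                  :num_draft_total, :created_at, :updated_at, :is_drafted].freeze

# Hypothetical source row; the values are made up for illustration.
row = {
  id: 7, name: "Aggro", user_id: 3, cards: "[]",
  num_wins: 5, num_losses: 2, num_draft_wins: 1, num_draft_losses: 3,
  created_at: "2015-10-01 12:00:00", updated_at: "2015-10-02 12:00:00",
  is_drafted: true
}

transformed = calculate_num_totals(row)
p validates_num_total(transformed)

# Assumed behavior of output(): keep only declared columns, so cards,
# num_draft_wins and num_draft_losses are dropped.
output_row = transformed.select { |key, _| OUTPUT_COLUMNS.include?(key) }
p output_row

# A row with bad data fails the validation and yields a message.
bad = calculate_num_totals({ id: 8, num_wins: 0, num_losses: -1,
                             num_draft_wins: 0, num_draft_losses: 0 })
puts validates_num_total(bad)
```

Because `return ... if condition` is the last expression in `validates_num_total`, the method evaluates to `nil` whenever the condition is false, which is how a passing row is signaled.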
@@ -16,6 +16,21 @@ module DataDuck
  end
  end
 
+ def self.prompt_choices(choices = [])
+ while true
+ print "Enter a number 0 - #{ choices.length - 1}\n"
+ choices.each_with_index do |choice, idx|
+ choice_name = choice.is_a?(String) ? choice : choice[1]
+ print "#{ idx }: #{ choice_name }\n"
+ end
+ choice = STDIN.gets.strip.to_i
+ if 0 <= choice && choice < choices.length
+ selected = choices[choice]
+ return selected.is_a?(String) ? selected : selected[0]
+ end
+ end
+ end
+
  def self.acceptable_commands
  ['console', 'quickstart']
  end
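The selection rule inside `prompt_choices` can be tried without a terminal by pulling it into a pure function. `resolve_choice` below is a hypothetical helper, not part of DataDuck: it applies the same parsing (`strip.to_i`), range check, and `[value, label]` unpacking to one line of input, returning `nil` where `prompt_choices` would print the menu and ask again.

```ruby
# Hypothetical helper mirroring the body of prompt_choices for one input line.
# Each choice is either a plain String or a [value, label] pair.
def resolve_choice(choices, input)
  idx = input.strip.to_i
  # Out-of-range numbers are rejected; prompt_choices would loop and re-prompt.
  return nil unless 0 <= idx && idx < choices.length

  selected = choices[idx]
  # Strings are returned as-is; pairs return their value, not their label.
  selected.is_a?(String) ? selected : selected[0]
end

db_choices = [[:mysql, "MySQL"], [:postgresql, "PostgreSQL"], [:other, "other"]]

p resolve_choice(db_choices, "1\n")    # prints :postgresql
p resolve_choice(db_choices, "5\n")    # prints nil
p resolve_choice(db_choices, "abc\n")  # prints :mysql, since "abc".to_i is 0
p resolve_choice(["yes", "no"], "1")   # prints "no"
```

The third call shows a consequence of using `String#to_i`: non-numeric input parses as 0, so it silently selects the first option instead of re-prompting. `Integer(input.strip, exception: false)` (Ruby 2.6 and later) returns `nil` for such input and would be one way to reject it.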
@@ -47,7 +62,20 @@ module DataDuck
  puts "Welcome to DataDuck!"
  puts "This quickstart wizard will create your application, assuming the source is a Postgres database and the destination is an Amazon Redshift data warehouse."
 
- puts "Enter the source (Postgres database) hostname:"
+
+ puts "What kind of database would you like to source from?"
+ db_type = prompt_choices([
+ [:mysql, "MySQL"],
+ [:postgresql, "PostgreSQL"],
+ [:other, "other"],
+ ])
+
+ if db_type == :other
+ puts "You've selected 'other'. Unfortunately, those are the only choices supported at the moment. Contact us at DataDuckETL.com to request support for your database."
+ exit
+ end
+
+ puts "Enter the source hostname:"
  source_host = STDIN.gets.strip
 
  puts "Enter the name of the database when connecting to #{ source_host }:"
@@ -62,8 +90,13 @@ module DataDuck
  puts "Enter the password:"
  source_password = STDIN.noecho(&:gets).chomp
 
- db_source = DataDuck::PostgresqlSource.new({
- 'type' => 'postgresql',
+ db_class = {
+ mysql: DataDuck::MysqlSource,
+ postgresql: DataDuck::PostgresqlSource,
+ }[db_type]
+
+ db_source = db_class.new({
+ 'db_type' => db_type.to_s,
  'host' => source_host,
  'database' => source_database,
  'port' => source_port,
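The quickstart picks the source class with a hash lookup, `{...}[db_type]`. `Hash#[]` returns `nil` for a key it does not contain, so the code relies on the earlier `:other` check and `exit` to avoid calling `.new` on `nil`. Here is a minimal sketch of an alternative that uses `Hash#fetch` to fail with a clear error instead; the stub classes, the `SOURCE_CLASSES` constant, and the `source_class_for` helper are hypothetical stand-ins, not DataDuck's code.

```ruby
# Hypothetical stand-ins for DataDuck::MysqlSource and DataDuck::PostgresqlSource.
class MysqlSource; end
class PostgresqlSource; end

SOURCE_CLASSES = {
  mysql: MysqlSource,
  postgresql: PostgresqlSource,
}.freeze

# Returns the source class for a db_type symbol. Unlike SOURCE_CLASSES[db_type],
# which returns nil for unknown keys, fetch runs the block and raises.
def source_class_for(db_type)
  SOURCE_CLASSES.fetch(db_type) do
    raise ArgumentError, "unsupported source type: #{db_type.inspect}"
  end
end

puts source_class_for(:postgresql)   # prints PostgresqlSource
puts SOURCE_CLASSES[:other].inspect  # prints nil
begin
  source_class_for(:other)
rescue ArgumentError => e
  puts e.message                      # prints unsupported source type: :other
end
```

With `fetch`, adding a new menu entry without a matching class surfaces as an `ArgumentError` naming the type, rather than a `NoMethodError` on `nil`.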
@@ -3,7 +3,7 @@ require_relative 'sql_db_source.rb'
  require 'sequel'
 
  module DataDuck
- class PostrgresqlSource < DataDuck::SqlDbSource
+ class PostgresqlSource < DataDuck::SqlDbSource
  def db_type
  'postgres'
  end
@@ -1,6 +1,6 @@
  module DataDuck
  VERSION_MAJOR = 0
- VERSION_MINOR = 2
+ VERSION_MINOR = 3
  VERSION_PATCH = 0
  VERSION = [VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH].join('.')
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: dataduck
  version: !ruby/object:Gem::Version
- version: 0.2.0
+ version: 0.3.0
  platform: ruby
  authors:
  - Jeff Pickhardt
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2015-10-10 00:00:00.000000000 Z
+ date: 2015-10-11 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: bundler
@@ -80,6 +80,20 @@ dependencies:
  - - "~>"
  - !ruby/object:Gem::Version
  version: '0.16'
+ - !ruby/object:Gem::Dependency
+ name: mysql
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '2.9'
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '2.9'
  - !ruby/object:Gem::Dependency
  name: aws-sdk
  requirement: !ruby/object:Gem::Requirement
@@ -127,6 +141,11 @@ files:
  - bin/dataduck
  - bin/setup
  - dataduck.gemspec
+ - docs/README.md
+ - docs/contents.yml
+ - docs/overview/README.md
+ - docs/overview/getting_started.md
+ - docs/tables/README.md
  - examples/example/.gitignore
  - examples/example/.ruby-version
  - examples/example/Gemfile