RubyGems - dataduck - Versions diffs - 0.2.0 → 0.3.0 - Mend

dataduck 0.2.0 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12)

checksums.yaml +4 -4
data/README.md +4 -2
data/dataduck.gemspec +1 -0
data/docs/README.md +7 -0
data/docs/contents.yml +6 -0
data/docs/overview/README.md +24 -0
data/docs/overview/getting_started.md +28 -0
data/docs/tables/README.md +48 -0
data/lib/dataduck/commands.rb +36 -3
data/lib/dataduck/postgresql_source.rb +1 -1
data/lib/dataduck/version.rb +1 -1
metadata +21 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 0f87bbaf674b1943242d3ea173a5e34fe00e0724
-  data.tar.gz: b8fbacadd9323ab917498712c8d4f39f1f5ca907
+  metadata.gz: dcc9a5407d2bae97ab0ecb754f95d3c872b92cfe
+  data.tar.gz: d32783d694d625367fb5f732602a6c2a997e8241
 SHA512:
-  metadata.gz: 40bbfce9c990d1542236c59967c31fe3fe5982c84bed12ccaf604c7ce15f2cebc5432b865dfedac5e95607dc37f20e0d681462ee9e7936e30ffdce8361688c96
-  data.tar.gz: fdc25e1ddf3a00faeceb13f11f4b7452f4085b3c0e5ca805137cc21174727aabec052dd335c371cd21c78db7ca0af7ef4e5a93f2876df9abfff531ad90bc8612
+  metadata.gz: 184f298a735a3928a78d5b8e85e22e498d4e26b554d89a2b4201afa0e91d8b812872e0c72e4a6046970b6b8489ff451ad4b1fd93d2271314d133484762189470
+  data.tar.gz: d9e4001147d98b51c3a481f894d52129349ba656d1f6b362845f37618e41b2bdf29005389b106df842ffb8ec02d0b84b45c683a561a49bc4674a88ce392fefcc
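
checksums.yaml records SHA1 and SHA512 digests for the two members of the .gem archive, metadata.gz and data.tar.gz. A .gem file is a plain tar archive, so after extracting it (for example with `tar -xf dataduck-0.3.0.gem`) the digests can be recomputed and compared against the values above. The sketch below uses Ruby's standard Digest library; the helper names `gem_member_digests` and `member_matches?` are illustrative and not part of RubyGems.

```ruby
require 'digest'

# Recompute the digests stored in checksums.yaml for one extracted gem member
# (metadata.gz or data.tar.gz). Helper names are illustrative only.
def gem_member_digests(path)
  {
    'SHA1' => Digest::SHA1.file(path).hexdigest,
    'SHA512' => Digest::SHA512.file(path).hexdigest,
  }
end

# expected: a hash such as { 'SHA1' => '<hex copied from checksums.yaml>' }.
# Every algorithm listed in expected must match the recomputed digest.
def member_matches?(path, expected)
  actual = gem_member_digests(path)
  expected.all? { |algorithm, hex| actual[algorithm] == hex }
end
```

For an unmodified 0.3.0 archive, `member_matches?("metadata.gz", { 'SHA1' => 'dcc9a5407d2bae97ab0ecb754f95d3c872b92cfe' })` should return true.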

data/README.md CHANGED

@@ -18,10 +18,12 @@ See [https://github.com/DataDuckETL/DataDuck/tree/master/examples/example](https
 ##### Instructions for using DataDuck ETL
-Create a new project, then add the following to your Gemfile:
+Create a new, empty directory. Inside this directory, create a file named Gemfile, and add the following to it:
 ```ruby
-gem 'dataduck', :git => 'git://github.com/DataDuckETL/DataDuck.git'
+source 'https://rubygems.org'
+gem 'dataduck'
 ```
 Then execute:

data/dataduck.gemspec CHANGED

@@ -23,6 +23,7 @@ Gem::Specification.new do |spec|
   spec.add_runtime_dependency "sequel", '~> 4.19'
   spec.add_runtime_dependency "pg", '~> 0.16'
+  spec.add_runtime_dependency "mysql", "~> 2.9"
   spec.add_runtime_dependency "aws-sdk", "~> 2.0"
   spec.add_runtime_dependency "sequel-redshift"
 end
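
The new runtime dependency uses RubyGems' pessimistic version operator: `"~> 2.9"` accepts any mysql release at or above 2.9 but below 3.0, in the same way that `'~> 4.19'` for sequel allows anything below 5.0. A candidate version can be checked against such a constraint with `Gem::Requirement#satisfied_by?`; the helper name below is illustrative.

```ruby
require 'rubygems'

# Constraint copied from the gemspec: "~> 2.9" means >= 2.9 and < 3.0.
MYSQL_REQUIREMENT = Gem::Requirement.new("~> 2.9")

# Illustrative helper: would this mysql gem version satisfy the dataduck gemspec?
def mysql_version_allowed?(version)
  MYSQL_REQUIREMENT.satisfied_by?(Gem::Version.new(version))
end

mysql_version_allowed?("2.9.1") # => true
mysql_version_allowed?("2.10")  # => true (2.10 is still below 3.0)
mysql_version_allowed?("3.0.0") # => false
```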

data/docs/README.md ADDED

@@ -0,0 +1,7 @@
+# Documentation
+The documentation directory is viewable at [http://dataducketl.com/docs](http://dataducketl.com/docs).
+# Autogenerated
+The documentation directory is autogenerated from the main DataDuck ETL git repo. If you would like to add or correct something in the documentation, let us know or make a pull request to [https://github.com/DataDuckETL/DataDuck/docs](https://github.com/DataDuckETL/DataDuck/docs).

data/docs/contents.yml ADDED

@@ -0,0 +1,6 @@
+"Overview":
+  "Welcome": README
+  "Getting Started": getting_started
+"Tables":
+  "The Table Class": README
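
contents.yml maps each documentation section to an ordered set of page titles and page slugs. One way a docs site could turn it into navigation links is sketched below with Ruby's standard YAML library. The URL scheme is an assumption: the section name is lower-cased to get the directory, which matches docs/overview and docs/tables in this release and the `/docs/overview/getting_started` link used on the overview page. `docs_nav` is a hypothetical helper.

```ruby
require 'yaml'

CONTENTS_YML = <<~CONTENTS
  "Overview":
    "Welcome": README
    "Getting Started": getting_started
  "Tables":
    "The Table Class": README
CONTENTS

# Hypothetical helper: flatten contents.yml into [section, title, url] triples.
# Assumes pages live under /docs/<section lower-cased>/<slug>.
def docs_nav(yaml_text)
  YAML.safe_load(yaml_text).flat_map do |section, pages|
    dir = section.downcase
    pages.map { |title, slug| [section, title, "/docs/#{dir}/#{slug}"] }
  end
end
```

Because both sections use the slug `README`, the directory prefix is what keeps the two "README" pages apart.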

data/docs/overview/README.md ADDED

@@ -0,0 +1,24 @@
+# Overview
+DataDuck ETL is a straightforward, effective extract-transform-load framework for data warehousing. If you want to set
+up a data warehouse, DataDuck ETL makes it simple to do.
+## Getting Started
+Getting started with DataDuck ETL takes just a few minutes. For instructions, read the
+[getting started](/docs/overview/getting_started) page.
+## Why Use a Data Warehouse
+If you already have your data in your main database, and probably use a web analytics product like Google Analytics, you
+may be wondering why you'd want a data warehouse anyway.
+There are many advantages to using a data warehouse, including:
+- integrating multiple data sources so you can analyze them together
+- helping to ensure data quality by cleaning up the data and running data quality checks
+- having a single source of truth that the entire company trusts
+- connecting business intelligence products for reports and dashboards
+- using the data warehouse to build models, which may get incorporated back in the product, or used for predictions and company decision making
+- performance optimizations so your queries run fast
+- ensuring sensitive data doesn't end up in reports, by not passing it to the data warehouse (encrypted passwords, salts, etc. have no practical analytics value, so they are not ETLed)

data/docs/overview/getting_started.md ADDED

@@ -0,0 +1,28 @@
+# Getting Started
+## Requirements
+DataDuck ETL currently supports extracting from MySQL and PostgreSQL databases. It supports loading into Amazon
+Redshift. If you would like to extract or load into a database not yet supported, contact us.
+## Instructions
+First, create a new, empty directory. Inside this directory, create a file named Gemfile with the following:
+```ruby
+source 'https://rubygems.org'
+gem 'dataduck'
+```
+Then execute:
+    $ bundle install
+Finally, run the quickstart command:
+    $ dataduck quickstart
+It will ask for your database credentials and then create the basic setup for your project. After setup, run your project's ETL with `ruby src/main.rb`.
+If you would like to run this regularly, such as every night, it's recommended to use the [whenever](https://github.com/javan/whenever) gem to manage a cron job to regularly run the ETL.
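
For the nightly run suggested above, a whenever schedule could look like the following. It assumes the project lives at `/path/to/project` (a placeholder) and that the ETL is started through Bundler with `bundle exec`; after editing `config/schedule.rb`, running `whenever --update-crontab` writes the entry into the current user's crontab.

```ruby
# config/schedule.rb, read by the whenever gem.
# /path/to/project is a placeholder for your DataDuck project directory.
set :output, "/path/to/project/log/etl.log"

# Run the ETL once a day at 2:00 am.
every 1.day, at: '2:00 am' do
  command "cd /path/to/project && bundle exec ruby src/main.rb"
end
```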

data/docs/tables/README.md ADDED

@@ -0,0 +1,48 @@
+# The Table Class
+If you've run the `dataduck quickstart` command, you'll notice that table files were generated under src/tables.
+Each of these table files inherits from `DataDuck::Table`, the base table class. Tables need to have the `source` and `output` defined.
+You may also define transformations with the `transforms` method and validations with the `validates` method.
+## Example Table
+The following is an example table.
+```ruby
+class Decks < DataDuck::Table
+  source :my_database, ["id", "name", "user_id", "cards",
+      "num_wins", "num_losses", "created_at", "updated_at",
+      "is_drafted", "num_draft_wins", "num_draft_losses"]
+  transforms :calculate_num_totals
+  validates :validates_num_total
+  output({
+      :id => :integer,
+      :name => :string,
+      :user_id => :integer,
+      :num_wins => :integer,
+      :num_losses => :integer,
+      :num_total => :integer,
+      :num_draft_total => :integer,
+      :created_at => :datetime,
+      :updated_at => :datetime,
+      :is_drafted => :boolean,
+      # Note that num_draft_wins and num_draft_losses
+      # are not included in the output, but are used in
+      # the transformation.
+  })
+  def calculate_num_totals(row)
+    row[:num_total] = row[:num_wins] + row[:num_losses]
+    row[:num_draft_total] = row[:num_draft_wins] + row[:num_draft_losses]
+    row
+  end
+  def validates_num_total(row)
+    return "Deck id #{ row[:id] } has negative value #{ row[:num_total] } for num_total." if row[:num_total] < 0
+  end
+end
+```
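
The class-level `transforms`, `validates` and `output` declarations suggest a per-row flow: run each transform, collect validation messages, then keep only the declared output columns. The sketch below models that flow with a small hypothetical base class, `SketchTable`; it is an assumption about the behavior, not DataDuck::Table's actual implementation. It reuses the Decks transform and validation bodies, where a validation that returns nil counts as passing.

```ruby
# Hypothetical stand-in for DataDuck::Table, written only to illustrate the
# transform -> validate -> output flow. NOT the gem's implementation.
class SketchTable
  class << self
    def transforms(*names)
      transform_names.concat(names)
    end

    def validates(*names)
      validation_names.concat(names)
    end

    def output(schema)
      @output_schema = schema
    end

    def transform_names
      @transform_names ||= []
    end

    def validation_names
      @validation_names ||= []
    end

    def output_schema
      @output_schema || {}
    end
  end

  # Returns [output_row, errors]. Columns not declared in output are dropped
  # only after transforms and validations have used them.
  def process(row)
    row = self.class.transform_names.reduce(row) { |r, name| send(name, r) }
    errors = self.class.validation_names.map { |name| send(name, row) }.compact
    [row.slice(*self.class.output_schema.keys), errors]
  end
end

class DecksSketch < SketchTable
  transforms :calculate_num_totals
  validates :validates_num_total
  output({ id: :integer, num_total: :integer, num_draft_total: :integer })

  # Same bodies as the Decks example above.
  def calculate_num_totals(row)
    row[:num_total] = row[:num_wins] + row[:num_losses]
    row[:num_draft_total] = row[:num_draft_wins] + row[:num_draft_losses]
    row
  end

  def validates_num_total(row)
    return "Deck id #{ row[:id] } has negative value #{ row[:num_total] } for num_total." if row[:num_total] < 0
  end
end

row = { id: 7, num_wins: 3, num_losses: 2, num_draft_wins: 1, num_draft_losses: 4 }
out, errors = DecksSketch.new.process(row)
# out keeps only :id, :num_total and :num_draft_total; errors is empty
```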

data/lib/dataduck/commands.rb CHANGED

@@ -16,6 +16,21 @@ module DataDuck
       end
     end
+    def self.prompt_choices(choices = [])
+      while true
+        print "Enter a number 0 - #{ choices.length - 1}\n"
+        choices.each_with_index do |choice, idx|
+          choice_name = choice.is_a?(String) ? choice : choice[1]
+          print "#{ idx }: #{ choice_name }\n"
+        end
+        choice = STDIN.gets.strip.to_i
+        if 0 <= choice && choice < choices.length
+          selected = choices[choice]
+          return selected.is_a?(String) ? selected : selected[0]
+        end
+      end
+    end
     def self.acceptable_commands
       ['console', 'quickstart']
     end
@@ -47,7 +62,20 @@ module DataDuck
       puts "Welcome to DataDuck!"
       puts "This quickstart wizard will create your application, assuming the source is a Postgres database and the destination is an Amazon Redshift data warehouse."
-      puts "Enter the source (Postgres database) hostname:"
+      puts "What kind of database would you like to source from?"
+      db_type = prompt_choices([
+          [:mysql, "MySQL"],
+          [:postgresql, "PostgreSQL"],
+          [:other, "other"],
+      ])
+      if db_type == :other
+        puts "You've selected 'other'. Unfortunately, those are the only choices supported at the moment. Contact us at DataDuckETL.com to request support for your database."
+        exit
+      end
+      puts "Enter the source hostname:"
       source_host = STDIN.gets.strip
       puts "Enter the name of the database when connecting to #{ source_host }:"
@@ -62,8 +90,13 @@ module DataDuck
       puts "Enter the password:"
       source_password = STDIN.noecho(&:gets).chomp
-      db_source = DataDuck::PostgresqlSource.new({
-          'type' => 'postgresql',
+      db_class = {
+          mysql: DataDuck::MysqlSource,
+          postgresql: DataDuck::PostgresqlSource,
+      }[db_type]
+      db_source = db_class.new({
+          'db_type' => db_type.to_s,
           'host' => source_host,
           'database' => source_database,
           'port' => source_port,
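
`prompt_choices` prints a numbered menu, reads one line, and returns either the string itself or the first element of a `[value, label]` pair; out-of-range numbers re-print the menu. Because it reads `STDIN` directly it is awkward to exercise in isolation, so the sketch below is a variant with injectable `input:`/`output:` keyword arguments (an assumption of this sketch, not the gem's signature). Two behaviors are worth noting: `String#to_i` turns non-numeric text into 0, which selects the first choice, and end-of-input is handled here by returning nil, whereas the original would call `strip` on nil and raise NoMethodError.

```ruby
require 'stringio'

# Variant of prompt_choices with injectable IO. The input:/output: keywords
# and the nil-on-EOF behavior are additions for this sketch.
def prompt_choices(choices, input: $stdin, output: $stdout)
  loop do
    output.print "Enter a number 0 - #{choices.length - 1}\n"
    choices.each_with_index do |choice, idx|
      name = choice.is_a?(String) ? choice : choice[1]
      output.print "#{idx}: #{name}\n"
    end
    line = input.gets
    return nil if line.nil? # EOF: stop instead of calling strip on nil
    index = line.strip.to_i # non-numeric text converts to 0
    if index >= 0 && index < choices.length
      selected = choices[index]
      return selected.is_a?(String) ? selected : selected[0]
    end
    # Out-of-range numbers fall through and the menu is printed again.
  end
end

DB_CHOICES = [[:mysql, "MySQL"], [:postgresql, "PostgreSQL"], [:other, "other"]]
answer = prompt_choices(DB_CHOICES, input: StringIO.new("7\n1\n"), output: StringIO.new)
# answer is :postgresql ("7" is out of range, so the menu repeats and "1" is read next)
```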

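The quickstart now picks the source class by looking up the chosen symbol in a hash. With `Hash#[]`, an unknown key yields nil and the following `.new` fails with NoMethodError. In the quickstart the `:other` choice exits before that point, so this cannot happen there, but `Hash#fetch` makes the failure explicit if the lookup is ever reused. The classes below are hypothetical stand-ins for `DataDuck::MysqlSource` and `DataDuck::PostgresqlSource`, and `build_source` is an illustrative helper.

```ruby
# Hypothetical stand-ins for DataDuck::MysqlSource and DataDuck::PostgresqlSource.
class FakeSqlSource
  attr_reader :config

  def initialize(config)
    @config = config
  end
end

class FakeMysqlSource < FakeSqlSource; end
class FakePostgresqlSource < FakeSqlSource; end

SOURCE_CLASSES = {
  mysql: FakeMysqlSource,
  postgresql: FakePostgresqlSource,
}.freeze

# Hash#fetch raises for unknown keys, instead of returning nil the way Hash#[] does.
def build_source(db_type, settings)
  klass = SOURCE_CLASSES.fetch(db_type) do
    raise ArgumentError, "Unsupported source type: #{db_type.inspect}"
  end
  klass.new(settings.merge('db_type' => db_type.to_s))
end

source = build_source(:mysql, { 'host' => 'localhost', 'database' => 'app' })
# source is a FakeMysqlSource whose config['db_type'] is "mysql"
```
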
data/lib/dataduck/postgresql_source.rb CHANGED

@@ -3,7 +3,7 @@ require_relative 'sql_db_source.rb'
 require 'sequel'
 module DataDuck
-  class PostrgresqlSource < DataDuck::SqlDbSource
+  class PostgresqlSource < DataDuck::SqlDbSource
     def db_type
       'postgres'
     end
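
The rename fixes a misspelled constant. The removed lines in the commands.rb hunk show the 0.2.0 quickstart calling `DataDuck::PostgresqlSource.new`, while this file defined `PostrgresqlSource`, so unless the correct name was defined elsewhere, that call would have raised a NameError. The stand-in modules below (hypothetical names `SketchV020` and `SketchV030`) reproduce the two situations with `Module#const_get`.

```ruby
# Stand-in modules: 0.2.0 defined only the misspelled class name,
# 0.3.0 defines the name that callers actually use.
module SketchV020
  class SqlDbSource; end
  class PostrgresqlSource < SqlDbSource; end
end

module SketchV030
  class SqlDbSource; end
  class PostgresqlSource < SqlDbSource; end
end

# True if the constant can be resolved from the given namespace.
def resolves?(namespace, name)
  namespace.const_get(name)
  true
rescue NameError
  false
end

resolves?(SketchV020, :PostgresqlSource) # => false, only the misspelled name exists
resolves?(SketchV030, :PostgresqlSource) # => true
```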

data/lib/dataduck/version.rb CHANGED

@@ -1,6 +1,6 @@
 module DataDuck
   VERSION_MAJOR = 0
-  VERSION_MINOR = 2
+  VERSION_MINOR = 3
   VERSION_PATCH = 0
   VERSION = [VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH].join('.')
 end
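
`VERSION` is assembled from the three components, so this release reports "0.3.0". Because only the minor component changed, a Gemfile pin written as `'~> 0.2.0'` would not pick up this release, while `'~> 0.2'` would. The comparison below uses RubyGems' own `Gem::Version` and `Gem::Requirement` classes.

```ruby
require 'rubygems'

# Same constants as data/lib/dataduck/version.rb after this release.
module DataDuck
  VERSION_MAJOR = 0
  VERSION_MINOR = 3
  VERSION_PATCH = 0
  VERSION = [VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH].join('.')
end

current = Gem::Version.new(DataDuck::VERSION)  # 0.3.0
previous = Gem::Version.new("0.2.0")

current > previous                                      # => true
Gem::Requirement.new("~> 0.2.0").satisfied_by?(current) # => false (allows >= 0.2.0, < 0.3)
Gem::Requirement.new("~> 0.2").satisfied_by?(current)   # => true  (allows >= 0.2, < 1.0)
```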

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: dataduck
 version: !ruby/object:Gem::Version
-  version: 0.2.0
+  version: 0.3.0
 platform: ruby
 authors:
 - Jeff Pickhardt
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2015-10-10 00:00:00.000000000 Z
+date: 2015-10-11 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -80,6 +80,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '0.16'
+- !ruby/object:Gem::Dependency
+  name: mysql
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.9'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.9'
 - !ruby/object:Gem::Dependency
   name: aws-sdk
   requirement: !ruby/object:Gem::Requirement
@@ -127,6 +141,11 @@ files:
 - bin/dataduck
 - bin/setup
 - dataduck.gemspec
+- docs/README.md
+- docs/contents.yml
+- docs/overview/README.md
+- docs/overview/getting_started.md
+- docs/tables/README.md
 - examples/example/.gitignore
 - examples/example/.ruby-version
 - examples/example/Gemfile
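
The metadata file is the YAML serialization of the gem's `Gem::Specification`, which is why adding `spec.add_runtime_dependency "mysql", "~> 2.9"` to the gemspec produces the new `Gem::Dependency` entry in this file. The sketch builds a small in-memory spec (the name, summary and file list are placeholders, not dataduck's real values) and inspects the same data through the RubyGems API.

```ruby
require 'rubygems'

# In-memory spec resembling a subset of the dataduck gemspec.
# Name, summary and files are placeholders for illustration.
SKETCH_SPEC = Gem::Specification.new do |s|
  s.name    = "dataduck-sketch"
  s.version = "0.3.0"
  s.summary = "Sketch of the dataduck gemspec"
  s.authors = ["Jeff Pickhardt"]
  s.files   = ["docs/README.md", "docs/contents.yml"]
  s.add_runtime_dependency "pg", "~> 0.16"
  s.add_runtime_dependency "mysql", "~> 2.9"
end

# Runtime dependencies as [name, requirement] pairs, e.g. ["mysql", "~> 2.9"].
deps = SKETCH_SPEC.runtime_dependencies.map { |d| [d.name, d.requirement.to_s] }

# The serialized form starts with the same "--- !ruby/object:Gem::Specification"
# header shown at the top of the metadata diff.
yaml = SKETCH_SPEC.to_yaml
```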