dataduck 1.1.0 → 1.2.0

Sign up to get free protection for your applications and to get access to all the features.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 4c85a45aea48dc00fdef79467a5148e71a32c691
4
- data.tar.gz: 3c6a10310fcde7ca07efee6ed127d5339174b3dd
3
+ metadata.gz: a48e9e513313a27d24f94c23e4f2659ebecf1c1d
4
+ data.tar.gz: a89623ebfe52f3dd24bb239aea5e366aeed0e99f
5
5
  SHA512:
6
- metadata.gz: 14b0a7a521cf17446418dbf7536bbf60cebe32e38e2eae34464b96980d1e6318aa5f21be430cda827098771c63dd607f1b7b3fc21f1b27f2590e9bfe3ea2bd7b
7
- data.tar.gz: fd14a28b61f6e75c4f5af591c39ec72b20abcfaf659de7ef0e63cf80c943f42f02567e61a1c2757e368e6ba322a90c2279f057d070ab68d93d9d0605551499ad
6
+ metadata.gz: 07c9acd3135428eda030cf6fe7270ee0e827448f6227761627c2ef546cc861faef4dd0151a446aa91b2ba5267e1014d45dac5c2b27daae5e58d09d0116b63a0c
7
+ data.tar.gz: 468134eb9bdffbc2996c9a7060449bca8625d5a8597a2dc81ed0ea04c3d1aa26cb7932580f33ea1da440d645534fa54c4a0f4d30720cb75f7ad006fbacc6ac89
@@ -45,9 +45,47 @@ and leave the rest of the process (and the Redshift loading) up to DataDuck.
45
45
  ## The `extract!` method
46
46
 
47
47
  The `extract!` method takes one argument: the destination. It then extracts the data from the source necessary to load
48
- data into the destination. If you are writing your own Table class with some custom third party API, you will probably
48
+ data into the destination. If you are writing your own Table class with some custom third party API, you will probably
49
49
  want to overwrite this method.
50
50
 
51
+ ## Overriding indexes (sortkeys)
52
+
53
+ By sortkey, Redshift means what other databases would generally call indexes. DataDuck ETL will use `id` and `created_at` as sortkeys by default. If you would like to specify your own, simply overwrite the `indexes` method on your table, like this example:
54
+
55
+ ```ruby
56
+ class Decks < DataDuck::Table
57
+ # source info goes here
58
+
59
+ def indexes
60
+ ["id", "user_id"]
61
+ end
62
+
63
+ # output info goes here
64
+ end
65
+ ```
66
+
67
+ ## Overriding distkeys and diststyles
68
+
69
+ For large datasets, Redshift can distribute the data across multiple compute nodes according to your distkey and diststyle. To use these, simply overwrite the `distribution_key` and `distribution_style` methods.
70
+
71
+ ```ruby
72
+ class Decks < DataDuck::Table
73
+ # source info goes here
74
+
75
+ def distribution_key
76
+ "company_id"
77
+ end
78
+
79
+ def distribution_style
80
+ "all"
81
+ end
82
+
83
+ # output info goes here
84
+ end
85
+ ```
86
+
87
+ For more info, read: [Distributing Data](http://docs.aws.amazon.com/redshift/latest/dg/t_Distributing_data.html)
88
+
51
89
  ## Example Table
52
90
 
53
91
  The following is an example table.
@@ -240,7 +240,10 @@ module DataDuck
240
240
  puts "Connection successful. Detected #{ table_names.length } tables."
241
241
  puts "Creating scaffolding..."
242
242
  table_names.each do |table_name|
243
- DataDuck::Commands.quickstart_create_table(table_name, db_source)
243
+ begin
244
+ DataDuck::Commands.quickstart_create_table(table_name, db_source)
245
+ rescue
246
+ end
244
247
  end
245
248
 
246
249
  config_obj = {
@@ -72,9 +72,10 @@ module DataDuck
72
72
  props_string = props_array.join(', ')
73
73
 
74
74
  distribution_clause = table.distribution_key ? "DISTKEY(#{ table.distribution_key })" : ""
75
+ distribution_style_clause = table.distribution_style ? "DISTSTYLE #{ distribution_style }" : ""
75
76
  index_clause = table.indexes.length > 0 ? "INTERLEAVED SORTKEY (#{ table.indexes.join(',') })" : ""
76
77
 
77
- "CREATE TABLE IF NOT EXISTS #{ table_name } (#{ props_string }) #{ distribution_clause } #{ index_clause }"
78
+ "CREATE TABLE IF NOT EXISTS #{ table_name } (#{ props_string }) #{ distribution_clause } #{ distribution_style_clause } #{ index_clause }"
78
79
  end
79
80
 
80
81
  def create_output_tables!(table)
@@ -70,6 +70,10 @@ module DataDuck
70
70
  end
71
71
  end
72
72
 
73
+ def distribution_style
74
+ nil
75
+ end
76
+
73
77
  def etl!(destinations, options = {})
74
78
  if destinations.length != 1
75
79
  raise ArgumentError.new("DataDuck can only etl to one destination at a time for now.")
@@ -1,7 +1,7 @@
1
1
  module DataDuck
2
2
  if !defined?(DataDuck::VERSION)
3
3
  VERSION_MAJOR = 1
4
- VERSION_MINOR = 1
4
+ VERSION_MINOR = 2
5
5
  VERSION_PATCH = 0
6
6
  VERSION = [VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH].join('.')
7
7
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: dataduck
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.1.0
4
+ version: 1.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Jeff Pickhardt
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2016-09-05 00:00:00.000000000 Z
11
+ date: 2017-03-14 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -295,7 +295,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
295
295
  version: '0'
296
296
  requirements: []
297
297
  rubyforge_project:
298
- rubygems_version: 2.4.6
298
+ rubygems_version: 2.6.8
299
299
  signing_key:
300
300
  specification_version: 4
301
301
  summary: A straightforward, effective ETL framework.