redshifter 0.4.0 → 0.5.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 09710d63e1dafe18da8216b0d4de3662ca660e62
- data.tar.gz: de3081992b84e55b39ae4c84b4d81fb18db40a4f
+ metadata.gz: 904cec0aad5170e27712addd72009852c1b3a6d5
+ data.tar.gz: 6f5d606f0862a1522288987636928c1be4d0e6e1
  SHA512:
- metadata.gz: fabe694521c2268d6b55cf634ca2ef62808695641a307eb60f12672e9e05049ccf0cdf3bf6a6ea1176e4e0a1b1d6d67a2e001465233ff025798c55d71553788f
- data.tar.gz: 658c6ccf1c6dbcf9a21b14f658d4707f2af9f192d400050bc6d559d70064b787f8cef37b7cb4715b617b81e94376d702be9265fe6b7cff407029c20514c1ad4c
+ metadata.gz: 8cc099b110abcd0d0696219604388879215ebcac444d3f36dd098840a8b8b49cbd91d727383493a857c216633ce6f86970122d298f42b2ab02b2094b091a0c6d
+ data.tar.gz: 33a85ccf7d1b28bacb02d36354e0e43406f73edd0ef6d9f49600a9791e05c0c0a7caea1e5a798bffabf64f12bd02d71237f4de9e0936a7309f0181c6e8fd2707
data/README.md CHANGED
@@ -16,6 +16,7 @@ Feature Roadmap:
  - 0.2.4 - New config format; update and replace rake tasks available
  - 0.3.0 - Public version
  - 0.4.0 - Make S3 region configurable
+ - 0.5.0 - Added functionality for source_table_filter to selectively export rows
 
  ## Installation
 
@@ -79,6 +80,8 @@ Redshifter.config.tables = {
  source_table_name: 'books',
  # [required] Prefixing your redshift table with its source is recommended
  redshift_table_name: 'app_name_books',
+ # [optional] Provide a conditional to specify which rows get exported to Redshift
+ source_table_filter: 'title IS NOT NULL',
  # [required] Columns with Redshift datatypes to create; may differ from source DB
  redshift_columns: {
  'id' => 'INTEGER',
@@ -120,7 +123,7 @@ Redshifter.config.tables = {
  ```
  $ rake redshifter:replace[books_with_export_at]
  ```
-
+
  ### Schedule a Redshifter::Job::UpdateRedshiftTableJob resque job per each table you want to export updates for
 
  Then schedule this meta job to run in `resque_schedule.yml` to run once at 10:00pm
@@ -134,6 +137,36 @@ etl_books_to_redshift:
  description: 'Export the books table to Redshift'
  ```
 
+ ### Monitoring Rake tasks with New Relic (optional)
+ New Relic offers Rake task instrumentation as of version `3.13.0` of their `newrelic_rpm` agent. Redshifter does not directly use or depend on `newrelic_rpm`. You must use New Relic's rake instrumentation and explicitly identify the tasks in your app that you want to monitor. See [New Relic docs](https://docs.newrelic.com/docs/agents/ruby-agent/background-jobs/rake-instrumentation).
+
+ In addition to setting `attributes.include` and `rake.tasks` as described in their docs, it also seems to be necessary to start the agent manually and synchronously in your Rakefile to ensure that fast-running tasks are reported.
+
+ In summary, your `newrelic.yml` should include keys and values like this to monitor redshifter rake tasks:
+ ```yaml
+ common: &default_settings
+   #...
+   attributes:
+     include: job.rake.* # allows rake args reporting
+   rake:
+     # rake task monitoring must be white listed here AND not blacklisted via
+     # autostart.blacklisted_* config values
+     tasks: ['redshifter:update', 'redshifter:replace']
+   #...
+ ```
+
+ and your `Rakefile` should end up looking something like this:
+ ```ruby
+ # Rakefile
+ # ...
+ require 'redshifter/tasks'
+
+ # Force the agent to start so that the rake.tasks listed in newrelic.yml are instrumented
+ NewRelic::Agent.manual_start(sync_startup: true) if Rails.env.production?
+ # ...
+ ```
+
+
  ## Development
 
  After checking out the repo, run `bin/setup` to install dependencies. Then, run `bin/console` for an interactive prompt that will allow you to experiment.
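Because New Relic only instruments tasks listed under `rake.tasks`, it can help to confirm that the parsed `newrelic.yml` actually contains the redshifter tasks before deploying. The following is a minimal sketch using Ruby's standard `yaml` library, assuming the `common: &default_settings` layout from the README above; the inline YAML string and the `REQUIRED_TASKS` list are illustrative, and in practice you would read your own app's `newrelic.yml`. It only checks the whitelist and does not inspect the `autostart.blacklisted_*` settings mentioned in the README comment.

```ruby
require 'yaml'

# Illustrative newrelic.yml contents (assumption), mirroring the README example.
NEWRELIC_YML = <<~YAML
  common: &default_settings
    attributes:
      include: job.rake.*
    rake:
      tasks: ['redshifter:update', 'redshifter:replace']
  production:
    <<: *default_settings
YAML

# Redshifter tasks we expect New Relic to instrument (assumption: both of them).
REQUIRED_TASKS = %w[redshifter:update redshifter:replace].freeze

# aliases: true lets Psych resolve the &default_settings anchor and the << merge.
config = YAML.safe_load(NEWRELIC_YML, aliases: true)
listed = config.dig('production', 'rake', 'tasks') || []
missing = REQUIRED_TASKS - listed

# prints: all redshifter tasks whitelisted
puts missing.empty? ? 'all redshifter tasks whitelisted' : "missing: #{missing.join(', ')}"
```

Checking the environment section (here `production`) rather than `common` confirms that the merge key actually carried the task list into the environment the agent runs in.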
@@ -14,9 +14,10 @@ module Redshifter
  @redshift_sort_keys = config[:redshift_sort_keys]
  @redshift_sort_style = config[:redshift_sort_style]
  @redshift_primary_key = config[:redshift_primary_key]
+ @source_table_filter = config[:source_table_filter] || 1
  end
 
- attr_reader :source_table_name, :redshift_table_name
+ attr_reader :source_table_name, :redshift_table_name, :source_table_filter
 
  def redshift_column_names
  redshift_columns.keys
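With this change, a table config that omits `source_table_filter` falls back to the integer `1`, so the generated batch query contains `WHERE (1) AND ...`. MySQL accepts `1` as a true condition, but PostgreSQL rejects a non-boolean `WHERE` argument (`argument of WHERE must be type boolean, not type integer`), so this default may only work on some source databases. Likewise, an empty-string filter is truthy in Ruby and would produce `WHERE ()`, which is a syntax error. The following is a minimal sketch of a more portable fallback, assuming the same plain config hash the initializer receives; `TableFilterSketch` and `DEFAULT_FILTER` are hypothetical names, not part of Redshifter.

```ruby
# Hypothetical sketch, not Redshifter's code: a fallback filter that is a
# valid boolean expression in both MySQL and PostgreSQL.
module TableFilterSketch
  # '1=1' is true for every row; a bare integer such as 1 is accepted as a
  # WHERE condition by MySQL but rejected by PostgreSQL.
  DEFAULT_FILTER = '1=1'.freeze

  # Returns the configured filter, or DEFAULT_FILTER when the key is missing,
  # nil, or blank (a blank filter would otherwise produce "WHERE ()").
  def self.source_table_filter(config)
    filter = config[:source_table_filter]
    return DEFAULT_FILTER if filter.nil? || filter.to_s.strip.empty?

    filter.to_s
  end
end

puts TableFilterSketch.source_table_filter({})                                       # prints: 1=1
puts TableFilterSketch.source_table_filter(source_table_filter: '  ')                # prints: 1=1
puts TableFilterSketch.source_table_filter(source_table_filter: 'title IS NOT NULL') # prints: title IS NOT NULL
```

`1=1` is used because it is a plain comparison that virtually every SQL dialect evaluates as a boolean, so the same fallback works whichever database the Rails app uses.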
@@ -74,7 +74,7 @@ module Redshifter
  end
 
  def select_batch_sql(columns:, batch_size:, start_id:)
- "select #{columns.join(', ')} from #{table.source_table_name} where updated_at >= '#{since}' AND id >= #{start_id} ORDER BY id ASC limit #{batch_size}"
+ "SELECT #{columns.join(', ')} FROM #{table.source_table_name} WHERE (#{table.source_table_filter}) AND updated_at >= '#{since}' AND id >= #{start_id} ORDER BY id ASC limit #{batch_size}"
  end
  end
  end
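The new query wraps the configured filter in parentheses before combining it with the incremental `updated_at` and `id` conditions. This matters for filters that contain `OR`: because `AND` binds more tightly than `OR` in SQL, an unparenthesized `a OR b AND updated_at >= ...` would export every row matching `a` regardless of the incremental conditions. The sketch below rebuilds the same string in a standalone hypothetical method, `select_batch_sql_sketch`, which takes `table_name`, `filter` and `since` as explicit arguments instead of reading them from the table and job objects, so the composed SQL can be inspected. Like the original, it interpolates the filter verbatim, so the filter must come from trusted configuration and never from user input.

```ruby
# Hypothetical standalone version of the string built by select_batch_sql in
# 0.5.0; table_name, filter and since are explicit arguments here.
def select_batch_sql_sketch(columns:, table_name:, filter:, since:, batch_size:, start_id:)
  # The filter is wrapped in parentheses so that an OR inside it cannot
  # escape the AND conditions that follow (AND binds tighter than OR).
  "SELECT #{columns.join(', ')} FROM #{table_name} " \
    "WHERE (#{filter}) AND updated_at >= '#{since}' " \
    "AND id >= #{start_id} ORDER BY id ASC limit #{batch_size}"
end

sql = select_batch_sql_sketch(
  columns: %w[id title],
  table_name: 'books',
  filter: 'title IS NOT NULL OR author_id = 3',
  since: '2016-03-30 00:00:00',
  batch_size: 1000,
  start_id: 1
)

# prints:
# SELECT id, title FROM books WHERE (title IS NOT NULL OR author_id = 3) AND updated_at >= '2016-03-30 00:00:00' AND id >= 1 ORDER BY id ASC limit 1000
puts sql
```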
@@ -1,3 +1,3 @@
  module Redshifter
- VERSION = "0.4.0"
+ VERSION = "0.5.0"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: redshifter
  version: !ruby/object:Gem::Version
- version: 0.4.0
+ version: 0.5.0
  platform: ruby
  authors:
  - Justin Richard
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2016-02-04 00:00:00.000000000 Z
+ date: 2016-03-30 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: dynosaur
@@ -173,7 +173,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 2.2.2
+ rubygems_version: 2.4.5.1
  signing_key:
  specification_version: 4
  summary: ETL processing jobs for exporting Rails model tables to Redshift