redshift_extractor 0.1.0 → 0.2.0
- checksums.yaml +4 -4
- data/README.md +60 -15
- data/lib/redshift_extractor/copy.rb +3 -1
- data/lib/redshift_extractor/extractor.rb +9 -3
- data/lib/redshift_extractor/unload.rb +2 -1
- data/lib/redshift_extractor/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d6b1a776fcff84a096fd9e510d1774fcd006191d
+  data.tar.gz: 0a69e8eb4ad78653b47a0151c9e6bd1d3389dbeb
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 32c29672a4b5ed08a44d56ccdd6e4447b0cddf48f1a5b534b0102ac6235cd4c3e927ae8bd9acfaf262f856bbef1a59131902122350d0c77329d65d1a92685bb1
+  data.tar.gz: d2ef768e4d19c90b0cb71617e019ba5365e05a65b2ad5f7f8e8d1acf49d4315cbe27d46518b1d7d25ed13c605d4f2b6028ca5df7f341a3079d87138ac3c392a4
data/README.md
CHANGED
@@ -1,40 +1,85 @@
 # RedshiftExtractor
 
-
+redshift_extractor moves data from one Amazon Redshift cluster to another. Here is how it works:
 
-
+- Source database
 
-
+1. [UNLOAD](http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html) - runs a SELECT query and exports the results to CSV files in S3.
 
-
+- Destination database
+
+2. Drop - Drops a database table (the table in the destination database where the data will be stored).
+
+3. Create - Creates a database table.
+
+4. [COPY](http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html) - Loads data from S3 into a Redshift database.
+
+One database connection is established with the source database to UNLOAD the data to S3. After the data is UNLOADed, a second database connection is established with the destination database to drop/create the database table that will store the data. The final step is to COPY the data from the S3 files to the destination table.
+
+## Running the Code
+
+The RedshiftExtractor::Extractor class is instantiated with a long hash of arguments.
 
 ```ruby
-
+args = {
+  database_config_source: "database_config_source",
+  database_config_destination: "database_config_destination",
+  unload_s3_destination: "unload_s3_destination",
+  unload_select_sql: "unload_select_sql",
+  table_name: "table_name",
+  create_sql: "create_sql",
+  copy_data_source: "copy_data_source",
+  aws_access_key_id: "aws_access_key_id",
+  aws_secret_access_key: "aws_secret_access_key"
+}
+
+extractor = RedshiftExtractor::Extractor.new(args)
+extractor.run
 ```
 
-
+Here is a description of the parameters:
 
-
+- database_config_source: A hash that's acceptable for the [Ruby Postgres gem](https://bitbucket.org/ged/ruby-pg/wiki/Home). Here's an example:
+
+```ruby
+{
+  dbname: "db_name",
+  user: "username",
+  password: "password",
+  host: "host",
+  sslmode: 'require',
+  port: 5439
+}
+```
+
+- unload_s3_destination: An S3 path, something like `"s3://bucket_name/something_else/"`
 
-
+- unload_select_sql: A SQL SELECT query that will be run on the source table
 
-
+- table_name: The table that will be dropped, recreated, and populated with data from the COPY command
 
-
+- create_sql: The SQL that creates the table_name table (this SQL is run to recreate the table in the step above)
 
-
+- copy_data_source: This is typically `"#{unload_s3_destination}manifest"`. The UNLOAD command automatically creates a manifest file that can be used by the COPY command to load the data.
 
-
+- aws_keys: The keys you get from AWS.
 
-
+## Installation
 
-
+Add this line to your application's Gemfile:
+
+```ruby
+gem 'redshift_extractor'
+```
+
+And then execute:
+
+    $ bundle
 
 ## Contributing
 
 Bug reports and pull requests are welcome on GitHub at https://github.com/MrPowers/redshift_extractor.
 
-
 ## License
 
 The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
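The README's parameter list can be illustrated with a small sketch that assembles the argument hash and derives `copy_data_source` from the UNLOAD prefix, as the README suggests. The helper `build_extractor_args`, the table name, the SQL strings, and the placeholder credentials are all hypothetical and not part of the gem; the sketch only builds the hash and does not connect to Redshift.

```ruby
# Hypothetical helper (not part of the gem): assembles the argument hash
# described in the README and derives copy_data_source from the S3 prefix.
def build_extractor_args(source:, destination:, s3_prefix:, select_sql:,
                         table_name:, create_sql:, key_id:, secret_key:)
  {
    database_config_source: source,
    database_config_destination: destination,
    unload_s3_destination: s3_prefix,
    unload_select_sql: select_sql,
    table_name: table_name,
    create_sql: create_sql,
    # The README describes the manifest as living at "<prefix>manifest".
    copy_data_source: "#{s3_prefix}manifest",
    aws_access_key_id: key_id,
    aws_secret_access_key: secret_key
  }
end

# Connection hash in the shape shown in the README (placeholder values).
db = { dbname: "db_name", user: "username", password: "password",
       host: "host", sslmode: "require", port: 5439 }

args = build_extractor_args(
  source: db,
  destination: db.merge(host: "other_host"),
  s3_prefix: "s3://bucket_name/something_else/",
  select_sql: "select * from events",
  table_name: "events",
  create_sql: "create table events (id integer)",
  key_id: ENV.fetch("AWS_ACCESS_KEY_ID", "placeholder"),
  secret_key: ENV.fetch("AWS_SECRET_ACCESS_KEY", "placeholder")
)

puts args[:copy_data_source]
```

With the gem installed, a hash like this would presumably be passed to `RedshiftExtractor::Extractor.new(args)` followed by `run`, as in the README snippet.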
data/lib/redshift_extractor/copy.rb
CHANGED
@@ -10,7 +10,9 @@ module RedshiftExtractor; class Copy
   end
 
   def copy_sql
-    "copy #{table_name} from '#{data_source}'
+    "copy #{table_name} from '#{data_source}'"\
+    " credentials '#{credentials}'"\
+    " manifest dateformat 'auto' timeformat 'auto' blanksasnull emptyasnull escape gzip removequotes delimiter '|';"
   end
 
   def credentials
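The new `copy_sql` relies on Ruby's adjacent string literal concatenation: literals joined with a trailing backslash form one string with no embedded newlines. A minimal standalone sketch of that idiom follows. The method signature, the shortened option list, and the credentials string are assumptions for illustration, since the gem's `credentials` method body is not shown in this diff.

```ruby
# Standalone sketch of the string-building idiom used in copy_sql.
# Keyword arguments stand in for the gem's instance state (an assumption).
def copy_sql(table_name:, data_source:, credentials:)
  "copy #{table_name} from '#{data_source}'"\
  " credentials '#{credentials}'"\
  " manifest gzip delimiter '|';" # option list shortened for the example
end

sql = copy_sql(
  table_name: "events",
  data_source: "s3://bucket_name/something_else/manifest",
  # Assumed credentials string; the real value comes from Copy#credentials.
  credentials: "aws_access_key_id=KEY;aws_secret_access_key=SECRET"
)

puts sql
```

Each continued literal starts with a space, so the pieces join into a single well-formed statement regardless of how the source lines are indented.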
data/lib/redshift_extractor/extractor.rb
CHANGED
@@ -30,8 +30,11 @@ module RedshiftExtractor; class Extractor
     source_connection.exec(unloader.unload_sql)
   end
 
+  def dropper
+    Drop.new(table_name: config.table_name)
+  end
+
   def drop
-    dropper = Drop.new(table_name: config.table_name)
     destination_connection.exec(dropper.drop_sql)
   end
 
@@ -39,13 +42,16 @@ module RedshiftExtractor; class Extractor
     destination_connection.exec(config.create_sql)
   end
 
-  def
-
+  def copier
+    Copy.new(
       aws_access_key_id: config.aws_access_key_id,
       aws_secret_access_key: config.aws_secret_access_key,
       data_source: config.copy_data_source,
       table_name: config.table_name
     )
+  end
+
+  def copy
     destination_connection.exec(copier.copy_sql)
   end
 
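The extractor change moves the `Drop` and `Copy` construction out of local variables and into `dropper` and `copier` methods. One plausible benefit, sketched below, is that the generated SQL can be inspected without touching a database. Everything in the sketch is a stand-in: `SketchExtractor`, `RecordingConnection`, and the `drop_sql` text are hypothetical, because the gem's real `Drop` class and connection handling are not shown in this diff.

```ruby
# Stand-in for the gem's Drop class; the SQL text is assumed, not taken from the gem.
Drop = Struct.new(:table_name, keyword_init: true) do
  def drop_sql
    "drop table if exists #{table_name};"
  end
end

# Hypothetical connection double that records statements instead of running them.
class RecordingConnection
  attr_reader :statements

  def initialize
    @statements = []
  end

  def exec(sql)
    @statements << sql
  end
end

# Hypothetical, reduced version of the extractor showing only the drop step.
class SketchExtractor
  def initialize(table_name:, destination_connection:)
    @table_name = table_name
    @destination_connection = destination_connection
  end

  # Builder method, mirroring the 0.2.0 shape: a new Drop on every call.
  def dropper
    Drop.new(table_name: @table_name)
  end

  def drop
    @destination_connection.exec(dropper.drop_sql)
  end
end

connection = RecordingConnection.new
extractor = SketchExtractor.new(table_name: "events", destination_connection: connection)
extractor.drop

puts connection.statements.inspect
```

As in the diff, `dropper` is not memoized, so each call builds a fresh object; that is harmless here because the object only holds the table name.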
data/lib/redshift_extractor/unload.rb
CHANGED
@@ -10,7 +10,8 @@ module RedshiftExtractor; class Unload
   end
 
   def unload_sql
-    "UNLOAD('#{escaped_extract_sql}') to '#{s3_destination}'"\
+    "UNLOAD('#{escaped_extract_sql}') to '#{s3_destination}'"\
+    " CREDENTIALS '#{credentials}' MANIFEST GZIP ADDQUOTES ESCAPE;"
   end
 
   def escaped_extract_sql
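Because UNLOAD wraps the SELECT query in single quotes, any single quote inside the query has to be escaped before it is interpolated. The body of `escaped_extract_sql` is not shown in this diff, so the sketch below is only one possible approach: it doubles single quotes, and the helper name `escape_single_quotes`, the keyword arguments, and the credentials string are assumptions.

```ruby
# Hypothetical escaping helper; the gem's escaped_extract_sql may work differently.
def escape_single_quotes(sql)
  sql.gsub("'", "''")
end

# Standalone version of the statement builder shown in the diff.
def unload_sql(select_sql:, s3_destination:, credentials:)
  "UNLOAD('#{escape_single_quotes(select_sql)}') to '#{s3_destination}'"\
  " CREDENTIALS '#{credentials}' MANIFEST GZIP ADDQUOTES ESCAPE;"
end

sql = unload_sql(
  select_sql: "select * from venue where venuestate = 'NV'",
  s3_destination: "s3://bucket_name/something_else/",
  # Assumed credentials string, for illustration only.
  credentials: "aws_access_key_id=KEY;aws_secret_access_key=SECRET"
)

puts sql
```

The UNLOAD options `GZIP`, `ADDQUOTES`, and `ESCAPE` mirror the `gzip`, `removequotes`, and `escape` options in the new `copy_sql`, so the files written by one step are read back with matching settings by the other.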
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: redshift_extractor
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.2.0
 platform: ruby
 authors:
 - MrPowers
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2015-10-
+date: 2015-10-31 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler