redshift_extractor 0.1.0 → 0.2.0
- checksums.yaml +4 -4
- data/README.md +60 -15
- data/lib/redshift_extractor/copy.rb +3 -1
- data/lib/redshift_extractor/extractor.rb +9 -3
- data/lib/redshift_extractor/unload.rb +2 -1
- data/lib/redshift_extractor/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d6b1a776fcff84a096fd9e510d1774fcd006191d
+  data.tar.gz: 0a69e8eb4ad78653b47a0151c9e6bd1d3389dbeb
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 32c29672a4b5ed08a44d56ccdd6e4447b0cddf48f1a5b534b0102ac6235cd4c3e927ae8bd9acfaf262f856bbef1a59131902122350d0c77329d65d1a92685bb1
+  data.tar.gz: d2ef768e4d19c90b0cb71617e019ba5365e05a65b2ad5f7f8e8d1acf49d4315cbe27d46518b1d7d25ed13c605d4f2b6028ca5df7f341a3079d87138ac3c392a4
data/README.md
CHANGED
@@ -1,40 +1,85 @@
 # RedshiftExtractor
 
-
+redshift_extractor moves data from one Amazon Redshift cluster to another. Here is how it works:
 
-
+- Source database
 
-
+1. [UNLOAD](http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html) - runs a SELECT query and exports the results to CSV files in S3.
 
-
+- Destination database
+
+2. Drop - Drops a database table (the table in the destination database where the data will be stored).
+
+3. Create - Creates a database table.
+
+4. [COPY](http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html) - Loads data from S3 into a Redshift database.
+
+One database connection is established with the source database to UNLOAD the data to S3. After the data is UNLOADed, a second database connection is established with the destination database to drop/create the database table that will store the data. The final step is to COPY the data from the S3 files to the destination table.
+
+## Running the Code
+
+The RedshiftExtractor::Extractor class is instantiated with a long hash of arguments.
 
 ```ruby
-
+args = {
+  database_config_source: "database_config_source",
+  database_config_destination: "database_config_destination",
+  unload_s3_destination: "unload_s3_destination",
+  unload_select_sql: "unload_select_sql",
+  table_name: "table_name",
+  create_sql: "create_sql",
+  copy_data_source: "copy_data_source",
+  aws_access_key_id: "aws_access_key_id",
+  aws_secret_access_key: "aws_secret_access_key"
+}
+
+extractor = RedshiftExtractor::Extractor.new(args)
+extractor.run
 ```
 
-
+Here is a description of the parameters:
 
-
+- database_config_source: A hash that's acceptable for the [Ruby Postgres gem](https://bitbucket.org/ged/ruby-pg/wiki/Home). Here's an example:
+
+```ruby
+{
+  dbname: "db_name",
+  user: "username",
+  password: "password",
+  host: "host",
+  sslmode: 'require',
+  port: 5439
+}
+```
+
+- unload_s3_destination: An S3 path, something like `"s3://bucket_name/something_else/"`
 
-
+- unload_select_sql: A SQL SELECT query that will be run on the source table
 
-
+- table_name: The table that will be dropped, recreated, and populated with data from the COPY command
 
-
+- create_sql: The SQL that creates the table_name table (this SQL is run to recreate the table in the step above)
 
-
+- copy_data_source: This is typically `"#{unload_s3_destination}manifest"`. The UNLOAD command automatically creates a manifest file that can be used by the COPY command to load the data.
 
-
+- aws_keys: The keys you get from AWS.
 
-
+## Installation
 
-
+Add this line to your application's Gemfile:
+
+```ruby
+gem 'redshift_extractor'
+```
+
+And then execute:
+
+$ bundle
 
 ## Contributing
 
 Bug reports and pull requests are welcome on GitHub at https://github.com/MrPowers/redshift_extractor.
 
-
 ## License
 
 The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
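The four-step flow described in the new README (UNLOAD on the source, then drop, create, and COPY on the destination) can be sketched without a live cluster. The snippet below is only an illustration of the ordering: `RecordingConnection` and `PipelineSketch` are hypothetical names that do not exist in the gem, the SQL strings are placeholders, and the `drop table if exists` statement is an assumption, since the gem's `Drop#drop_sql` is not shown in this diff.

```ruby
# Hypothetical stand-in for a database connection. It only records the SQL it
# is asked to run, so the order of statements can be inspected offline.
class RecordingConnection
  attr_reader :statements

  def initialize
    @statements = []
  end

  def exec(sql)
    @statements << sql
    nil
  end
end

# Illustrative pipeline, not the gem's Extractor class.
class PipelineSketch
  def initialize(source:, destination:, config:)
    @source = source
    @destination = destination
    @config = config
  end

  def run
    # 1. UNLOAD runs on the source connection and writes files to S3.
    @source.exec(@config.fetch(:unload_sql))
    # 2. Drop the destination table (assumed SQL, see the note above).
    @destination.exec("drop table if exists #{@config.fetch(:table_name)};")
    # 3. Recreate the destination table.
    @destination.exec(@config.fetch(:create_sql))
    # 4. COPY loads the S3 files into the recreated table.
    @destination.exec(@config.fetch(:copy_sql))
  end
end

source = RecordingConnection.new
destination = RecordingConnection.new

PipelineSketch.new(
  source: source,
  destination: destination,
  config: {
    table_name: "events",
    unload_sql: "-- UNLOAD statement goes here",
    create_sql: "create table events (id integer);",
    copy_sql: "-- COPY statement goes here"
  }
).run

puts source.statements
puts destination.statements
```

Keeping the two connections separate mirrors the README's description: the source connection only ever sees the UNLOAD, while the destination connection sees the drop, create, and COPY in that order.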
data/lib/redshift_extractor/copy.rb
CHANGED
@@ -10,7 +10,9 @@ module RedshiftExtractor; class Copy
   end
 
   def copy_sql
-    "copy #{table_name} from '#{data_source}'
+    "copy #{table_name} from '#{data_source}'"\
+    " credentials '#{credentials}'"\
+    " manifest dateformat 'auto' timeformat 'auto' blanksasnull emptyasnull escape gzip removequotes delimiter '|';"
   end
 
   def credentials
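The new `copy_sql` relies on Ruby joining adjacent string literals, with a trailing backslash continuing the expression onto the next line, so the statement ends up as a single line of SQL. A minimal standalone sketch of that technique follows. The free-standing method signatures and the `credentials_string` helper are assumptions made for illustration; the body of the gem's own `credentials` method is not visible in this diff. The `aws_access_key_id=...;aws_secret_access_key=...` layout is the key-based credentials string that Redshift accepts.

```ruby
# Hypothetical helper: builds a key-based Redshift credentials string.
def credentials_string(access_key_id:, secret_access_key:)
  "aws_access_key_id=#{access_key_id};aws_secret_access_key=#{secret_access_key}"
end

# Mirrors the shape of the new copy_sql: adjacent string literals joined by
# trailing backslashes, giving one line of SQL with no embedded newlines.
def copy_sql(table_name:, data_source:, credentials:)
  "copy #{table_name} from '#{data_source}'"\
  " credentials '#{credentials}'"\
  " manifest dateformat 'auto' timeformat 'auto' blanksasnull emptyasnull escape gzip removequotes delimiter '|';"
end

sql = copy_sql(
  table_name: "events",
  data_source: "s3://bucket_name/something_else/manifest",
  credentials: credentials_string(access_key_id: "KEY", secret_access_key: "SECRET")
)

puts sql
```

Each continued literal starts with a space, which is what keeps the clauses from running together once the pieces are concatenated.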
data/lib/redshift_extractor/extractor.rb
CHANGED
@@ -30,8 +30,11 @@ module RedshiftExtractor; class Extractor
     source_connection.exec(unloader.unload_sql)
   end
 
+  def dropper
+    Drop.new(table_name: config.table_name)
+  end
+
   def drop
-    dropper = Drop.new(table_name: config.table_name)
     destination_connection.exec(dropper.drop_sql)
   end
 
@@ -39,13 +42,16 @@ module RedshiftExtractor; class Extractor
     destination_connection.exec(config.create_sql)
   end
 
-  def
-
+  def copier
+    Copy.new(
       aws_access_key_id: config.aws_access_key_id,
       aws_secret_access_key: config.aws_secret_access_key,
       data_source: config.copy_data_source,
       table_name: config.table_name
     )
+  end
+
+  def copy
     destination_connection.exec(copier.copy_sql)
   end
 
data/lib/redshift_extractor/unload.rb
CHANGED
@@ -10,7 +10,8 @@ module RedshiftExtractor; class Unload
   end
 
   def unload_sql
-    "UNLOAD('#{escaped_extract_sql}') to '#{s3_destination}'
+    "UNLOAD('#{escaped_extract_sql}') to '#{s3_destination}'"\
+    " CREDENTIALS '#{credentials}' MANIFEST GZIP ADDQUOTES ESCAPE;"
   end
 
   def escaped_extract_sql
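Because UNLOAD takes the SELECT as a quoted string, any single quotes inside the query have to be escaped before interpolation. The sketch below shows one way that could look. `escape_for_unload` is a hypothetical helper that doubles single quotes, which Redshift accepts inside the UNLOAD query; the body of the gem's `escaped_extract_sql` is not shown in this diff and may differ. It also shows how `copy_data_source` is typically derived from the UNLOAD destination, as the README describes.

```ruby
# Hypothetical helper: doubles single quotes so the SELECT can sit inside
# UNLOAD('...'). The gem's escaped_extract_sql may use another approach.
def escape_for_unload(select_sql)
  select_sql.gsub("'", "''")
end

# Mirrors the shape of the new unload_sql in the diff.
def unload_sql(select_sql:, s3_destination:, credentials:)
  "UNLOAD('#{escape_for_unload(select_sql)}') to '#{s3_destination}'"\
  " CREDENTIALS '#{credentials}' MANIFEST GZIP ADDQUOTES ESCAPE;"
end

s3_destination = "s3://bucket_name/something_else/"

sql = unload_sql(
  select_sql: "select * from events where kind = 'click'",
  s3_destination: s3_destination,
  credentials: "aws_access_key_id=KEY;aws_secret_access_key=SECRET"
)

# The manifest written by UNLOAD is what the COPY step reads.
copy_data_source = "#{s3_destination}manifest"

puts sql
puts copy_data_source
```

The options on the two statements line up: MANIFEST, GZIP, and ESCAPE appear on both sides, ADDQUOTES on UNLOAD is undone by `removequotes` on COPY, and the `'|'` delimiter given to COPY matches UNLOAD's default pipe delimiter.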
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: redshift_extractor
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.2.0
 platform: ruby
 authors:
 - MrPowers
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2015-10-
+date: 2015-10-31 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler