embulk-output-redshift 0.2.4 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: 58358c6f921e03a15a58929b8241a40cc97f5ff1
-   data.tar.gz: a7b29d7fdd1247da37dac5687c132eef280e6b30
+   metadata.gz: f739710245a2663409cf49beb31c8e18cb148684
+   data.tar.gz: fe5268568d22eed5d8fc40e9117684ba5989a715
  SHA512:
-   metadata.gz: e2f87d3cc3f9a1d5aa1c94b8a9b00fabc8deef097a2c9e6b8da7d3ddf81a3a5907094ef22ec675f49d4567d274a848ce6161424f9ba5edbbd885d98a9a6940ec
-   data.tar.gz: 37db160a1993e53804837b379530ffb746f9c77e535e4d5f6f723a060e0135be1854e8c85dde3ea7ba23351791cf9ac4a1d704c8631fbdf2674fd5bd2ebf203b
+   metadata.gz: 9d8808f711394ed62b840faa26d1472f871e7c7e71b069b9279a5342b1580b9d496c8d1103984f97e942a715d16f91bd402a2baed03c3dc68625d0727761064b
+   data.tar.gz: a65ba3a389a4f3e80cc179c18d5ed0d8e0e62a550776dd73b0888ae4b2832ccd96f9b3e680d202723ab667d4104aaa62af25dc7e9e51699634dc767d0f578701
data/README.md CHANGED
@@ -1,47 +1,97 @@
- # Redshift output plugins for Embulk
-
- Redshift output plugins for Embulk loads records to Redshift.
-
- ## Overview
-
- * **Plugin type**: output
- * **Load all or nothing**: depnds on the mode:
-   * **insert**: no
-   * **replace**: yes
- * **Resume supported**: no
-
- ## Configuration
-
- - **host**: database host name (string, required)
- - **port**: database port number (integer, default: 5439)
- - **user**: database login user name (string, required)
- - **password**: database login password (string, default: "")
- - **database**: destination database name (string, required)
- - **schema**: destination schema name (string, default: "public")
- - **table**: destination table name (string, required)
- - **mode**: "replace" or "insert" (string, required)
- - **batch_size**: size of a single batch insert (integer, default: 16777216)
- - **options**: extra connection properties (hash, default: {})
-
- ### Example
-
- ```yaml
- out:
-   type: redshift
-   host: myinstance.us-west-2.redshift.amazonaws.com
-   user: pg
-   password: ""
-   database: my_database
-   table: my_table
-   access_key_id: ABCXYZ123ABCXYZ123
-   secret_access_key: AbCxYz123aBcXyZ123
-   s3_bucket: my-redshift-transfer-bucket
-   iam_user_name: my-s3-read-only
-   mode: insert
- ```
-
- ### Build
-
- ```
- $ ./gradlew gem
- ```
+ # Redshift output plugins for Embulk
+
+ Redshift output plugins for Embulk load records to Redshift.
+
+ ## Overview
+
+ * **Plugin type**: output
+ * **Load all or nothing**: depends on the mode. See below.
+ * **Resume supported**: depends on the mode. See below.
+
+ ## Configuration
+
+ - **host**: database host name (string, required)
+ - **port**: database port number (integer, default: 5439)
+ - **user**: database login user name (string, required)
+ - **password**: database login password (string, default: "")
+ - **database**: destination database name (string, required)
+ - **schema**: destination schema name (string, default: "public")
+ - **table**: destination table name (string, required)
+ - **options**: extra connection properties (hash, default: {})
+ - **mode**: "replace" or "insert" (string, required)
+ - **batch_size**: size of a single batch insert (integer, default: 16777216)
+ - **default_timezone**: If the input column type (embulk type) is timestamp and the destination column type is `string` or `nstring`, this plugin needs to format the timestamp into a string. This default_timezone option is used to control the timezone. You can overwrite the timezone for each column using the column_options option. (string, default: `UTC`)
+ - **column_options**: advanced: key-value pairs where the key is a column name and the value is options for that column.
+   - **type**: type of the column when this plugin creates new tables (e.g. `VARCHAR(255)`, `INTEGER NOT NULL UNIQUE`). This is used when the plugin creates intermediate tables (insert, truncate_insert and merge modes), when it creates the target table (insert_direct and replace modes), and when it creates a nonexistent target table automatically. (string, default: depends on the input column type. `BIGINT` if the input column type is long, `BOOLEAN` if boolean, `DOUBLE PRECISION` if double, `CLOB` if string, `TIMESTAMP` if timestamp)
+   - **value_type**: This plugin converts the input column type (embulk type) into a database type to build an INSERT statement. This value_type option controls the type of the value in the INSERT statement. (string, default: depends on the input column type. Available values are: `byte`, `short`, `int`, `long`, `double`, `float`, `boolean`, `string`, `nstring`, `date`, `time`, `timestamp`, `decimal`, `null`, `pass`)
+   - **timestamp_format**: If the input column type (embulk type) is timestamp and value_type is `string` or `nstring`, this plugin needs to format the timestamp value into a string. This timestamp_format option is used to control the format of the timestamp. (string, default: `%Y-%m-%d %H:%M:%S.%6N`)
+   - **timezone**: If the input column type (embulk type) is timestamp and value_type is `string` or `nstring`, this plugin needs to format the timestamp value into a string. And if the input column type is timestamp and value_type is `date`, this plugin needs to consider the timezone. In those cases, this timezone option is used to control the timezone. (string, the value of the default_timezone option is used by default)
+
+ ### Modes
+
+ * **insert**:
+   * Behavior: This mode writes rows to some intermediate tables first. If all those tasks run correctly, it runs an `INSERT INTO <target_table> SELECT * FROM <intermediate_table_1> UNION ALL SELECT * FROM <intermediate_table_2> UNION ALL ...` query.
+   * Transactional: Yes. This mode either writes all rows successfully or fails having written zero rows.
+   * Resumable: Yes.
+ * **insert_direct**:
+   * Behavior: This mode inserts rows into the target table directly.
+   * Transactional: No. If it fails, the target table could have some rows inserted.
+   * Resumable: No.
+ * **truncate_insert**:
+   * Behavior: Same as `insert` mode except that it truncates the target table right before the last `INSERT ...` query.
+   * Transactional: Yes.
+   * Resumable: Yes.
+ * **merge**:
+   * Behavior: This mode writes rows to some intermediate tables first. If all those tasks run correctly, it runs an `INSERT INTO <target_table> SELECT * FROM <intermediate_table_1> UNION ALL SELECT * FROM <intermediate_table_2> UNION ALL ... ON DUPLICATE KEY UPDATE ...` query.
+   * Transactional: Yes.
+   * Resumable: Yes.
+ * **replace**:
+   * Behavior: Same as `insert` mode except that it truncates the target table right before the last `INSERT ...` query.
+   * Transactional: Yes.
+   * Resumable: No.
+
+ ### Example
+
+ ```yaml
+ out:
+   type: redshift
+   host: myinstance.us-west-2.redshift.amazonaws.com
+   user: pg
+   password: ""
+   database: my_database
+   table: my_table
+   access_key_id: ABCXYZ123ABCXYZ123
+   secret_access_key: AbCxYz123aBcXyZ123
+   s3_bucket: my-redshift-transfer-bucket
+   iam_user_name: my-s3-read-only
+   mode: insert
+ ```
+
+ Advanced configuration:
+
+ ```yaml
+ out:
+   type: redshift
+   host: myinstance.us-west-2.redshift.amazonaws.com
+   user: pg
+   password: ""
+   database: my_database
+   table: my_table
+   access_key_id: ABCXYZ123ABCXYZ123
+   secret_access_key: AbCxYz123aBcXyZ123
+   s3_bucket: my-redshift-transfer-bucket
+   iam_user_name: my-s3-read-only
+   options: {loglevel: 2}
+   mode: insert_direct
+   column_options:
+     my_col_1: {type: 'VARCHAR(255)'}
+     my_col_3: {type: 'INT NOT NULL'}
+     my_col_4: {value_type: string, timestamp_format: '%Y-%m-%d %H:%M:%S %z', timezone: '-0700'}
+     my_col_5: {type: 'DECIMAL(18,9)', value_type: pass}
+ ```
+
+ ### Build
+
+ ```
+ $ ./gradlew gem
+ ```
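The README above documents the new `merge` mode, but both bundled examples use `insert` or `insert_direct`. As a minimal sketch (not part of the package), a merge-mode job would only change the `mode` value; every key below comes from the configuration documented above, and the host, credential, and bucket values are placeholders:

```yaml
out:
  type: redshift
  host: myinstance.us-west-2.redshift.amazonaws.com
  user: pg
  password: ""
  database: my_database
  table: my_table
  access_key_id: ABCXYZ123ABCXYZ123
  secret_access_key: AbCxYz123aBcXyZ123
  s3_bucket: my-redshift-transfer-bucket
  iam_user_name: my-s3-read-only
  mode: merge
```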
data/build.gradle CHANGED
@@ -1,9 +1,9 @@
- dependencies {
-     compile project(':embulk-output-jdbc')
-     compile project(':embulk-output-postgresql')
-
-     compile "com.amazonaws:aws-java-sdk-s3:1.9.17"
-     compile "com.amazonaws:aws-java-sdk-sts:1.9.17"
-
-     testCompile project(':embulk-output-jdbc').sourceSets.test.output
- }
+ dependencies {
+     compile project(':embulk-output-jdbc')
+     compile project(':embulk-output-postgresql')
+
+     compile "com.amazonaws:aws-java-sdk-s3:1.9.17"
+     compile "com.amazonaws:aws-java-sdk-sts:1.9.17"
+
+     testCompile project(':embulk-output-jdbc').sourceSets.test.output
+ }
@@ -1,3 +1,3 @@
- Embulk::JavaPlugin.register_output(
-   :redshift, "org.embulk.output.RedshiftOutputPlugin",
-   File.expand_path('../../../../classpath', __FILE__))
+ Embulk::JavaPlugin.register_output(
+   :redshift, "org.embulk.output.RedshiftOutputPlugin",
+   File.expand_path('../../../../classpath', __FILE__))
@@ -1,10 +1,14 @@
  package org.embulk.output;

+ import java.util.List;
  import java.util.Properties;
  import java.io.IOException;
  import java.sql.SQLException;
  import org.slf4j.Logger;
+ import com.google.common.base.Optional;
+ import com.google.common.collect.ImmutableSet;
  import com.amazonaws.auth.AWSCredentials;
+ import com.amazonaws.auth.AWSCredentialsProvider;
  import com.amazonaws.auth.BasicAWSCredentials;
  import org.embulk.spi.Exec;
  import org.embulk.config.Config;
@@ -61,6 +65,15 @@ public class RedshiftOutputPlugin
          return RedshiftPluginTask.class;
      }

+     @Override
+     protected Features getFeatures(PluginTask task)
+     {
+         return new Features()
+             .setMaxTableNameLength(30)
+             .setSupportedModes(ImmutableSet.of(Mode.INSERT, Mode.INSERT_DIRECT, Mode.MERGE, Mode.TRUNCATE_INSERT, Mode.REPLACE))
+             .setIgnoreMergeKeys(false);
+     }
+
      @Override
      protected RedshiftOutputConnector getConnector(PluginTask task, boolean retryableMetadataOperation)
      {
@@ -70,8 +83,6 @@ public class RedshiftOutputPlugin
              t.getHost(), t.getPort(), t.getDatabase());

          Properties props = new Properties();
-         props.setProperty("user", t.getUser());
-         props.setProperty("password", t.getPassword());
          props.setProperty("loginTimeout", "300"); // seconds
          props.setProperty("socketTimeout", "1800"); // seconds

@@ -98,19 +109,39 @@ public class RedshiftOutputPlugin

          props.putAll(t.getOptions());

+         props.setProperty("user", t.getUser());
+         logger.info("Connecting to {} options {}", url, props);
+         props.setProperty("password", t.getPassword());
+
          return new RedshiftOutputConnector(url, props, t.getSchema());
      }

+     private static AWSCredentialsProvider getAWSCredentialsProvider(RedshiftPluginTask task)
+     {
+         final AWSCredentials creds = new BasicAWSCredentials(
+                 task.getAccessKeyId(), task.getSecretAccessKey());
+         return new AWSCredentialsProvider() {
+             @Override
+             public AWSCredentials getCredentials()
+             {
+                 return creds;
+             }
+
+             @Override
+             public void refresh()
+             {
+             }
+         };
+     }
+
      @Override
-     protected BatchInsert newBatchInsert(PluginTask task) throws IOException, SQLException
+     protected BatchInsert newBatchInsert(PluginTask task, Optional<List<String>> mergeKeys) throws IOException, SQLException
      {
-         if (task.getMode().isMerge()) {
-             throw new UnsupportedOperationException("mode 'merge' is not implemented for this type");
+         if (mergeKeys.isPresent()) {
+             throw new UnsupportedOperationException("Redshift output plugin doesn't support 'merge_direct' mode. Use 'merge' mode instead.");
          }
          RedshiftPluginTask t = (RedshiftPluginTask) task;
-         AWSCredentials creds = new BasicAWSCredentials(
-                 t.getAccessKeyId(), t.getSecretAccessKey());
          return new RedshiftCopyBatchInsert(getConnector(task, true),
-                 creds, t.getS3Bucket(), t.getIamUserName());
+                 getAWSCredentialsProvider(t), t.getS3Bucket(), t.getIamUserName());
      }
  }
@@ -1,216 +1,214 @@
- package org.embulk.output.redshift;
-
- import java.util.zip.GZIPOutputStream;
- import java.util.concurrent.Callable;
- import java.util.UUID;
- import java.io.File;
- import java.io.IOException;
- import java.io.FileOutputStream;
- import java.io.OutputStreamWriter;
- import java.io.Closeable;
- import java.io.Writer;
- import java.io.BufferedWriter;
- import java.sql.Connection;
- import java.sql.SQLException;
- import com.amazonaws.auth.AWSCredentials;
- import com.amazonaws.auth.BasicSessionCredentials;
- import com.amazonaws.auth.policy.Policy;
- import com.amazonaws.auth.policy.Resource;
- import com.amazonaws.auth.policy.Statement;
- import com.amazonaws.auth.policy.Statement.Effect;
- import com.amazonaws.auth.policy.actions.S3Actions;
- import com.amazonaws.services.s3.AmazonS3Client;
- import com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient;
- import com.amazonaws.services.securitytoken.model.GetFederationTokenRequest;
- import com.amazonaws.services.securitytoken.model.GetFederationTokenResult;
- import com.amazonaws.services.securitytoken.model.Credentials;
- import org.slf4j.Logger;
- import org.embulk.spi.Exec;
- import org.embulk.output.jdbc.JdbcSchema;
- import org.embulk.output.postgresql.AbstractPostgreSQLCopyBatchInsert;
-
- public class RedshiftCopyBatchInsert
-         extends AbstractPostgreSQLCopyBatchInsert
- {
-     private final Logger logger = Exec.getLogger(RedshiftCopyBatchInsert.class);
-     private final RedshiftOutputConnector connector;
-     private final AWSCredentials awsCredentials;
-     private final String s3BucketName;
-     private final String iamReaderUserName;
-     private final AmazonS3Client s3;
-     private final AWSSecurityTokenServiceClient sts;
-
-     private RedshiftOutputConnection connection = null;
-     private String copySqlBeforeFrom = null;
-     private long totalRows;
-     private int fileCount;
-
-     public static final String COPY_AFTER_FROM = "GZIP DELIMITER '\\t' NULL '\\N' ESCAPE TRUNCATECOLUMNS ACCEPTINVCHARS STATUPDATE OFF COMPUPDATE OFF";
-
-     public RedshiftCopyBatchInsert(RedshiftOutputConnector connector,
-             AWSCredentials awsCredentials, String s3BucketName,
-             String iamReaderUserName) throws IOException, SQLException
-     {
-         super();
-         this.connector = connector;
-         this.awsCredentials = awsCredentials;
-         this.s3BucketName = s3BucketName;
-         this.iamReaderUserName = iamReaderUserName;
-         this.s3 = new AmazonS3Client(awsCredentials); // TODO options
-         this.sts = new AWSSecurityTokenServiceClient(awsCredentials); // options
-     }
-
-     @Override
-     public void prepare(String loadTable, JdbcSchema insertSchema) throws SQLException
-     {
-         this.connection = connector.connect(true);
-         this.copySqlBeforeFrom = connection.buildCopySQLBeforeFrom(loadTable, insertSchema);
-         logger.info("Copy SQL: "+copySqlBeforeFrom+" ? "+COPY_AFTER_FROM);
-     }
-
-     @Override
-     protected BufferedWriter openWriter(File newFile) throws IOException
-     {
-         // Redshift supports gzip
-         return new BufferedWriter(
-                 new OutputStreamWriter(
-                     new GZIPOutputStream(new FileOutputStream(newFile)),
-                     FILE_CHARSET)
-                 );
-     }
-
-     @Override
-     public void flush() throws IOException, SQLException
-     {
-         File file = closeCurrentFile(); // flush buffered data in writer
-
-         // TODO multi-threading
-         new UploadAndCopyTask(file, batchRows, UUID.randomUUID().toString()).call();
-         new DeleteFileFinalizer(file).close();
-
-         fileCount++;
-         totalRows += batchRows;
-         batchRows = 0;
-
-         openNewFile();
-         file.delete();
-     }
-
-     @Override
-     public void finish() throws IOException, SQLException
-     {
-         super.finish();
-         logger.info("Loaded {} files.", fileCount);
-     }
-
-     @Override
-     public void close() throws IOException, SQLException
-     {
-         s3.shutdown();
-         closeCurrentFile().delete();
-         if (connection != null) {
-             connection.close();
-             connection = null;
-         }
-     }
-
-     private BasicSessionCredentials generateReaderSessionCredentials(String s3KeyName)
-     {
-         Policy policy = new Policy()
-             .withStatements(
-                     new Statement(Effect.Allow)
-                         .withActions(S3Actions.ListObjects)
-                         .withResources(new Resource("arn:aws:s3:::"+s3BucketName)),
-                     new Statement(Effect.Allow)
-                         .withActions(S3Actions.GetObject)
-                         .withResources(new Resource("arn:aws:s3:::"+s3BucketName+"/"+s3KeyName)) // TODO encode file name using percent encoding
-                     );
-         GetFederationTokenRequest req = new GetFederationTokenRequest();
-         req.setDurationSeconds(86400); // 3600 - 129600
-         req.setName(iamReaderUserName);
-         req.setPolicy(policy.toJson());
-
-         GetFederationTokenResult res = sts.getFederationToken(req);
-         Credentials c = res.getCredentials();
-
-         return new BasicSessionCredentials(
-                 c.getAccessKeyId(),
-                 c.getSecretAccessKey(),
-                 c.getSessionToken());
-     }
-
-     private class UploadAndCopyTask implements Callable<Void>
-     {
-         private final File file;
-         private final int batchRows;
-         private final String s3KeyName;
-
-         public UploadAndCopyTask(File file, int batchRows, String s3KeyName)
-         {
-             this.file = file;
-             this.batchRows = batchRows;
-             this.s3KeyName = s3KeyName;
-         }
-
-         public Void call() throws SQLException {
-             logger.info(String.format("Uploading file id %s to S3 (%,d bytes %,d rows)",
-                         s3KeyName, file.length(), batchRows));
-             s3.putObject(s3BucketName, s3KeyName, file);
-
-             RedshiftOutputConnection con = connector.connect(true);
-             try {
-                 logger.info("Running COPY from file {}", s3KeyName);
-
-                 // create temporary credential right before COPY operation because
-                 // it has timeout.
-                 // TODO skip this step if iamReaderUserName is not set
-                 BasicSessionCredentials creds = generateReaderSessionCredentials(s3KeyName);
-
-                 long startTime = System.currentTimeMillis();
-                 con.runCopy(buildCopySQL(creds));
-                 double seconds = (System.currentTimeMillis() - startTime) / 1000.0;
-
-                 logger.info(String.format("Loaded file %s (%.2f seconds for COPY)", s3KeyName, seconds));
-
-             } finally {
-                 con.close();
-             }
-
-             return null;
-         }
-
-         private String buildCopySQL(BasicSessionCredentials creds)
-         {
-             StringBuilder sb = new StringBuilder();
-             sb.append(copySqlBeforeFrom);
-             sb.append(" FROM 's3://");
-             sb.append(s3BucketName);
-             sb.append("/");
-             sb.append(s3KeyName);
-             sb.append("' CREDENTIALS '");
-             sb.append("aws_access_key_id=");
-             sb.append(creds.getAWSAccessKeyId());
-             sb.append(";aws_secret_access_key=");
-             sb.append(creds.getAWSSecretKey());
-             sb.append(";token=");
-             sb.append(creds.getSessionToken());
-             sb.append("' ");
-             sb.append(COPY_AFTER_FROM);
-             return sb.toString();
-         }
-     }
-
-     private static class DeleteFileFinalizer implements Closeable
-     {
-         private File file;
-
-         public DeleteFileFinalizer(File file) {
-             this.file = file;
-         }
-
-         @Override
-         public void close() throws IOException {
-             file.delete();
-         }
-     }
- }
+ package org.embulk.output.redshift;
+
+ import java.util.zip.GZIPOutputStream;
+ import java.util.concurrent.Callable;
+ import java.util.UUID;
+ import java.io.File;
+ import java.io.IOException;
+ import java.io.FileOutputStream;
+ import java.io.OutputStreamWriter;
+ import java.io.Closeable;
+ import java.io.Writer;
+ import java.io.BufferedWriter;
+ import java.sql.Connection;
+ import java.sql.SQLException;
+ import com.amazonaws.auth.AWSCredentialsProvider;
+ import com.amazonaws.auth.BasicSessionCredentials;
+ import com.amazonaws.auth.policy.Policy;
+ import com.amazonaws.auth.policy.Resource;
+ import com.amazonaws.auth.policy.Statement;
+ import com.amazonaws.auth.policy.Statement.Effect;
+ import com.amazonaws.auth.policy.actions.S3Actions;
+ import com.amazonaws.services.s3.AmazonS3Client;
+ import com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient;
+ import com.amazonaws.services.securitytoken.model.GetFederationTokenRequest;
+ import com.amazonaws.services.securitytoken.model.GetFederationTokenResult;
+ import com.amazonaws.services.securitytoken.model.Credentials;
+ import org.slf4j.Logger;
+ import org.embulk.spi.Exec;
+ import org.embulk.output.jdbc.JdbcSchema;
+ import org.embulk.output.postgresql.AbstractPostgreSQLCopyBatchInsert;
+
+ public class RedshiftCopyBatchInsert
+         extends AbstractPostgreSQLCopyBatchInsert
+ {
+     private final Logger logger = Exec.getLogger(RedshiftCopyBatchInsert.class);
+     private final RedshiftOutputConnector connector;
+     private final String s3BucketName;
+     private final String iamReaderUserName;
+     private final AmazonS3Client s3;
+     private final AWSSecurityTokenServiceClient sts;
+
+     private RedshiftOutputConnection connection = null;
+     private String copySqlBeforeFrom = null;
+     private long totalRows;
+     private int fileCount;
+
+     public static final String COPY_AFTER_FROM = "GZIP DELIMITER '\\t' NULL '\\N' ESCAPE TRUNCATECOLUMNS ACCEPTINVCHARS STATUPDATE OFF COMPUPDATE OFF";
+
+     public RedshiftCopyBatchInsert(RedshiftOutputConnector connector,
+             AWSCredentialsProvider credentialsProvider, String s3BucketName,
+             String iamReaderUserName) throws IOException, SQLException
+     {
+         super();
+         this.connector = connector;
+         this.s3BucketName = s3BucketName;
+         this.iamReaderUserName = iamReaderUserName;
+         this.s3 = new AmazonS3Client(credentialsProvider); // TODO options
+         this.sts = new AWSSecurityTokenServiceClient(credentialsProvider); // options
+     }
+
+     @Override
+     public void prepare(String loadTable, JdbcSchema insertSchema) throws SQLException
+     {
+         this.connection = connector.connect(true);
+         this.copySqlBeforeFrom = connection.buildCopySQLBeforeFrom(loadTable, insertSchema);
+         logger.info("Copy SQL: "+copySqlBeforeFrom+" ? "+COPY_AFTER_FROM);
+     }
+
+     @Override
+     protected BufferedWriter openWriter(File newFile) throws IOException
+     {
+         // Redshift supports gzip
+         return new BufferedWriter(
+                 new OutputStreamWriter(
+                     new GZIPOutputStream(new FileOutputStream(newFile)),
+                     FILE_CHARSET)
+                 );
+     }
+
+     @Override
+     public void flush() throws IOException, SQLException
+     {
+         File file = closeCurrentFile(); // flush buffered data in writer
+
+         // TODO multi-threading
+         new UploadAndCopyTask(file, batchRows, UUID.randomUUID().toString()).call();
+         new DeleteFileFinalizer(file).close();
+
+         fileCount++;
+         totalRows += batchRows;
+         batchRows = 0;
+
+         openNewFile();
+         file.delete();
+     }
+
+     @Override
+     public void finish() throws IOException, SQLException
+     {
+         super.finish();
+         logger.info("Loaded {} files.", fileCount);
+     }
+
+     @Override
+     public void close() throws IOException, SQLException
+     {
+         s3.shutdown();
+         closeCurrentFile().delete();
+         if (connection != null) {
+             connection.close();
+             connection = null;
+         }
+     }
+
+     private BasicSessionCredentials generateReaderSessionCredentials(String s3KeyName)
+     {
+         Policy policy = new Policy()
+             .withStatements(
+                     new Statement(Effect.Allow)
+                         .withActions(S3Actions.ListObjects)
+                         .withResources(new Resource("arn:aws:s3:::"+s3BucketName)),
+                     new Statement(Effect.Allow)
+                         .withActions(S3Actions.GetObject)
+                         .withResources(new Resource("arn:aws:s3:::"+s3BucketName+"/"+s3KeyName)) // TODO encode file name using percent encoding
+                     );
+         GetFederationTokenRequest req = new GetFederationTokenRequest();
+         req.setDurationSeconds(86400); // 3600 - 129600
+         req.setName(iamReaderUserName);
+         req.setPolicy(policy.toJson());
+
+         GetFederationTokenResult res = sts.getFederationToken(req);
+         Credentials c = res.getCredentials();
+
+         return new BasicSessionCredentials(
+                 c.getAccessKeyId(),
+                 c.getSecretAccessKey(),
+                 c.getSessionToken());
+     }
+
+     private class UploadAndCopyTask implements Callable<Void>
+     {
+         private final File file;
+         private final int batchRows;
+         private final String s3KeyName;
+
+         public UploadAndCopyTask(File file, int batchRows, String s3KeyName)
+         {
+             this.file = file;
+             this.batchRows = batchRows;
+             this.s3KeyName = s3KeyName;
+         }
+
+         public Void call() throws SQLException {
+             logger.info(String.format("Uploading file id %s to S3 (%,d bytes %,d rows)",
+                         s3KeyName, file.length(), batchRows));
+             s3.putObject(s3BucketName, s3KeyName, file);
+
+             RedshiftOutputConnection con = connector.connect(true);
+             try {
+                 logger.info("Running COPY from file {}", s3KeyName);
+
+                 // create temporary credential right before COPY operation because
+                 // it has timeout.
+                 // TODO skip this step if iamReaderUserName is not set
+                 BasicSessionCredentials creds = generateReaderSessionCredentials(s3KeyName);
+
+                 long startTime = System.currentTimeMillis();
+                 con.runCopy(buildCopySQL(creds));
+                 double seconds = (System.currentTimeMillis() - startTime) / 1000.0;
+
+                 logger.info(String.format("Loaded file %s (%.2f seconds for COPY)", s3KeyName, seconds));
+
+             } finally {
+                 con.close();
+             }
+
+             return null;
+         }
+
+         private String buildCopySQL(BasicSessionCredentials creds)
+         {
+             StringBuilder sb = new StringBuilder();
+             sb.append(copySqlBeforeFrom);
+             sb.append(" FROM 's3://");
+             sb.append(s3BucketName);
+             sb.append("/");
+             sb.append(s3KeyName);
+             sb.append("' CREDENTIALS '");
+             sb.append("aws_access_key_id=");
+             sb.append(creds.getAWSAccessKeyId());
+             sb.append(";aws_secret_access_key=");
+             sb.append(creds.getAWSSecretKey());
+             sb.append(";token=");
+             sb.append(creds.getSessionToken());
+             sb.append("' ");
+             sb.append(COPY_AFTER_FROM);
+             return sb.toString();
+         }
+     }
+
+     private static class DeleteFileFinalizer implements Closeable
+     {
+         private File file;
+
+         public DeleteFileFinalizer(File file) {
+             this.file = file;
+         }
+
+         @Override
+         public void close() throws IOException {
+             file.delete();
+         }
+     }
+ }
@@ -1,122 +1,122 @@
- package org.embulk.output.redshift;
-
- import java.sql.Connection;
- import java.sql.SQLException;
- import java.sql.Statement;
- import org.slf4j.Logger;
- import org.embulk.spi.Exec;
- import org.embulk.output.jdbc.JdbcOutputConnection;
- import org.embulk.output.jdbc.JdbcColumn;
- import org.embulk.output.jdbc.JdbcSchema;
-
- public class RedshiftOutputConnection
-         extends JdbcOutputConnection
- {
-     private final Logger logger = Exec.getLogger(RedshiftOutputConnection.class);
-
-     public RedshiftOutputConnection(Connection connection, String schemaName, boolean autoCommit)
-             throws SQLException
-     {
-         super(connection, schemaName);
-         connection.setAutoCommit(autoCommit);
-     }
-
-     // Redshift does not support DROP TABLE IF EXISTS.
-     // Here runs DROP TABLE and ignores errors.
-     @Override
-     public void dropTableIfExists(String tableName) throws SQLException
-     {
-         Statement stmt = connection.createStatement();
-         try {
-             String sql = String.format("DROP TABLE IF EXISTS %s", quoteIdentifierString(tableName));
-             executeUpdate(stmt, sql);
-             commitIfNecessary(connection);
-         } catch (SQLException ex) {
-             // ignore errors.
-             // TODO here should ignore only 'table "XXX" does not exist' errors.
-             SQLException ignored = safeRollback(connection, ex);
-         } finally {
-             stmt.close();
-         }
-     }
-
-     // Redshift does not support DROP TABLE IF EXISTS.
-     // Dropping part runs DROP TABLE and ignores errors.
-     @Override
-     public void replaceTable(String fromTable, JdbcSchema schema, String toTable) throws SQLException
-     {
-         Statement stmt = connection.createStatement();
-         try {
-             try {
-                 StringBuilder sb = new StringBuilder();
-                 sb.append("DROP TABLE ");
-                 quoteIdentifierString(sb, toTable);
-                 String sql = sb.toString();
-                 executeUpdate(stmt, sql);
-             } catch (SQLException ex) {
-                 // ignore errors.
-                 // TODO here should ignore only 'table "XXX" does not exist' errors.
-                 // rollback or comimt is required to recover failed transaction
-                 SQLException ignored = safeRollback(connection, ex);
-             }
-
-             {
-                 StringBuilder sb = new StringBuilder();
-                 sb.append("ALTER TABLE ");
-                 quoteIdentifierString(sb, fromTable);
-                 sb.append(" RENAME TO ");
-                 quoteIdentifierString(sb, toTable);
-                 String sql = sb.toString();
-                 executeUpdate(stmt, sql);
-             }
-
-             commitIfNecessary(connection);
-         } catch (SQLException ex) {
-             throw safeRollback(connection, ex);
-         } finally {
-             stmt.close();
-         }
-     }
-
-     @Override
-     protected String convertTypeName(String typeName)
-     {
-         // Redshift does not support TEXT type.
-         switch(typeName) {
-         case "CLOB":
-             return "VARCHAR(65535)";
-         case "TEXT":
-             return "VARCHAR(65535)";
-         case "BLOB":
-             return "BYTEA";
-         default:
-             return typeName;
-         }
-     }
-
-     public String buildCopySQLBeforeFrom(String tableName, JdbcSchema tableSchema)
-     {
-         StringBuilder sb = new StringBuilder();
-
-         sb.append("COPY ");
-         quoteIdentifierString(sb, tableName);
-         sb.append(" (");
-         for(int i=0; i < tableSchema.getCount(); i++) {
-             if(i != 0) { sb.append(", "); }
-             quoteIdentifierString(sb, tableSchema.getColumnName(i));
-         }
-         sb.append(")");
-
-         return sb.toString();
-     }
-
-     public void runCopy(String sql) throws SQLException
-     {
-         Statement stmt = connection.createStatement();
-         try {
-             stmt.executeUpdate(sql);
-         } finally {
-             stmt.close();
-         }
-     }
- }
+ package org.embulk.output.redshift;
+
+ import java.sql.Connection;
+ import java.sql.SQLException;
+ import java.sql.Statement;
+ import org.slf4j.Logger;
+ import org.embulk.spi.Exec;
+ import org.embulk.output.jdbc.JdbcOutputConnection;
+ import org.embulk.output.jdbc.JdbcColumn;
+ import org.embulk.output.jdbc.JdbcSchema;
+
+ public class RedshiftOutputConnection
+         extends JdbcOutputConnection
+ {
+     private final Logger logger = Exec.getLogger(RedshiftOutputConnection.class);
+
+     public RedshiftOutputConnection(Connection connection, String schemaName, boolean autoCommit)
+             throws SQLException
+     {
+         super(connection, schemaName);
+         connection.setAutoCommit(autoCommit);
+     }
+
+     // Redshift does not support DROP TABLE IF EXISTS.
+     // Here runs DROP TABLE and ignores errors.
+     @Override
+     public void dropTableIfExists(String tableName) throws SQLException
+     {
+         Statement stmt = connection.createStatement();
+         try {
+             String sql = String.format("DROP TABLE IF EXISTS %s", quoteIdentifierString(tableName));
+             executeUpdate(stmt, sql);
+             commitIfNecessary(connection);
+         } catch (SQLException ex) {
+             // ignore errors.
+             // TODO here should ignore only 'table "XXX" does not exist' errors.
+             SQLException ignored = safeRollback(connection, ex);
+         } finally {
+             stmt.close();
+         }
+     }
+
+     // Redshift does not support DROP TABLE IF EXISTS.
+     // Dropping part runs DROP TABLE and ignores errors.
+     @Override
+     public void replaceTable(String fromTable, JdbcSchema schema, String toTable) throws SQLException
+     {
+         Statement stmt = connection.createStatement();
+         try {
+             try {
+                 StringBuilder sb = new StringBuilder();
+                 sb.append("DROP TABLE ");
+                 quoteIdentifierString(sb, toTable);
+                 String sql = sb.toString();
+                 executeUpdate(stmt, sql);
+             } catch (SQLException ex) {
+                 // ignore errors.
+                 // TODO here should ignore only 'table "XXX" does not exist' errors.
+                 // rollback or comimt is required to recover failed transaction
+                 SQLException ignored = safeRollback(connection, ex);
+             }
+
+             {
+                 StringBuilder sb = new StringBuilder();
+                 sb.append("ALTER TABLE ");
+                 quoteIdentifierString(sb, fromTable);
+                 sb.append(" RENAME TO ");
+                 quoteIdentifierString(sb, toTable);
+                 String sql = sb.toString();
+                 executeUpdate(stmt, sql);
+             }
+
+             commitIfNecessary(connection);
+         } catch (SQLException ex) {
+             throw safeRollback(connection, ex);
+         } finally {
+             stmt.close();
+         }
+     }
+
+     @Override
+     protected String buildColumnTypeName(JdbcColumn c)
+     {
+         // Redshift does not support TEXT type.
+         switch(c.getSimpleTypeName()) {
+         case "CLOB":
+             return "VARCHAR(65535)";
+         case "TEXT":
+             return "VARCHAR(65535)";
+         case "BLOB":
+             return "BYTEA";
+         default:
+             return super.buildColumnTypeName(c);
+         }
+     }
+
+     public String buildCopySQLBeforeFrom(String tableName, JdbcSchema tableSchema)
+     {
+         StringBuilder sb = new StringBuilder();
+
+         sb.append("COPY ");
+         quoteIdentifierString(sb, tableName);
+         sb.append(" (");
+         for(int i=0; i < tableSchema.getCount(); i++) {
+             if(i != 0) { sb.append(", "); }
+             quoteIdentifierString(sb, tableSchema.getColumnName(i));
+         }
+         sb.append(")");
+
+         return sb.toString();
+     }
+
+     public void runCopy(String sql) throws SQLException
+     {
+         Statement stmt = connection.createStatement();
+         try {
+             stmt.executeUpdate(sql);
+         } finally {
+             stmt.close();
+         }
+     }
+ }
@@ -1,40 +1,40 @@
- package org.embulk.output.redshift;
-
- import java.util.Properties;
- import java.sql.Driver;
- import java.sql.Connection;
- import java.sql.SQLException;
- import org.embulk.output.jdbc.JdbcOutputConnector;
- import org.embulk.output.jdbc.JdbcOutputConnection;
-
- public class RedshiftOutputConnector
-         implements JdbcOutputConnector
- {
-     private static final Driver driver = new org.postgresql.Driver();
-
-     private final String url;
-     private final Properties properties;
-     private final String schemaName;
-
-     public RedshiftOutputConnector(String url, Properties properties, String schemaName)
-     {
-         this.url = url;
-         this.properties = properties;
-         this.schemaName = schemaName;
-     }
-
-     @Override
-     public RedshiftOutputConnection connect(boolean autoCommit) throws SQLException
-     {
-         Connection c = driver.connect(url, properties);
-         try {
-             RedshiftOutputConnection con = new RedshiftOutputConnection(c, schemaName, autoCommit);
-             c = null;
-             return con;
-         } finally {
-             if (c != null) {
-                 c.close();
-             }
-         }
-     }
- }
+ package org.embulk.output.redshift;
+
+ import java.util.Properties;
+ import java.sql.Driver;
+ import java.sql.Connection;
+ import java.sql.SQLException;
+ import org.embulk.output.jdbc.JdbcOutputConnector;
+ import org.embulk.output.jdbc.JdbcOutputConnection;
+
+ public class RedshiftOutputConnector
+         implements JdbcOutputConnector
+ {
+     private static final Driver driver = new org.postgresql.Driver();
+
+     private final String url;
+     private final Properties properties;
+     private final String schemaName;
+
+     public RedshiftOutputConnector(String url, Properties properties, String schemaName)
+     {
+         this.url = url;
+         this.properties = properties;
+         this.schemaName = schemaName;
+     }
+
+     @Override
+     public RedshiftOutputConnection connect(boolean autoCommit) throws SQLException
+     {
+         Connection c = driver.connect(url, properties);
+         try {
+             RedshiftOutputConnection con = new RedshiftOutputConnection(c, schemaName, autoCommit);
+             c = null;
+             return con;
+         } finally {
+             if (c != null) {
+                 c.close();
+             }
+         }
+     }
+ }
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: embulk-output-redshift
  version: !ruby/object:Gem::Version
-   version: 0.2.4
+   version: 0.3.0
  platform: ruby
  authors:
  - Sadayuki Furuhashi
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2015-05-12 00:00:00.000000000 Z
+ date: 2015-05-19 00:00:00.000000000 Z
  dependencies: []
  description: Inserts or updates records to a table.
  email:
@@ -30,9 +30,9 @@ files:
  - classpath/aws-java-sdk-sts-1.9.17.jar
  - classpath/commons-codec-1.6.jar
  - classpath/commons-logging-1.1.3.jar
- - classpath/embulk-output-jdbc-0.2.4.jar
- - classpath/embulk-output-postgresql-0.2.4.jar
- - classpath/embulk-output-redshift-0.2.4.jar
+ - classpath/embulk-output-jdbc-0.3.0.jar
+ - classpath/embulk-output-postgresql-0.3.0.jar
+ - classpath/embulk-output-redshift-0.3.0.jar
  - classpath/httpclient-4.3.4.jar
  - classpath/httpcore-4.3.2.jar
  - classpath/jna-4.1.0.jar
Binary file