embulk-output-snowflake 0.1.1

@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+ metadata.gz: 383ccdda739d086fd4ec83452382f75139b859b7
+ data.tar.gz: e115339e03ef287b89638c0411ffbafc24e4c2f9
+ SHA512:
+ metadata.gz: da715c55401a0bf7b7009293196db2a115330e79ac935aa92ea3aa296d513c98d119886c2c743384c4bde23f823339543792bd16afcf3a4141aaa50e728c62df
+ data.tar.gz: 9de41d872c721e23de9e501fe89cbb6fea440cd8eaf0f9483b7ba31b1096e7ef4439a67a2ed9dcfa573c61dcf259df5929935567dcf82f6d0dcd6027c981b81b
README.md ADDED
@@ -0,0 +1,101 @@
+ # Snowflake output plugin for Embulk
+
+ Snowflake output plugin for Embulk loads records into a Snowflake database using a JDBC driver.
+
+ ## Overview
+
+ * **Plugin type**: output
+ * **Load all or nothing**: depends on the mode; see below.
+ * **Resume supported**: depends on the mode; see below.
+
+ ## Configuration
+
+ - **driver_path**: path to the jar file of the Snowflake JDBC driver (e.g. 'snowflake-jdbc-3.8.0.jar') (string, optional)
+ - **url**: URL of the JDBC connection (e.g. 'jdbc:snowflake://host.eu-central-1.snowflakecomputing.com/?db=development') (string, required)
+ - **user**: database login user name (string, optional)
+ - **password**: database login password (string, optional)
+ - **schema**: destination schema name (string, default: use default schema)
+ - **table**: destination table name (string, required)
+ - **create_table_constraint**: table constraint added to the `CREATE TABLE` statement, like `CREATE TABLE <table_name> (<column1> <type1>, <column2> <type2>, ..., <create_table_constraint>) <create_table_option>`.
+ - **create_table_option**: table option added to the `CREATE TABLE` statement, like `CREATE TABLE <table_name> (<column1> <type1>, <column2> <type2>, ..., <create_table_constraint>) <create_table_option>`.
+ - **transaction_isolation**: transaction isolation level for each connection ("read_uncommitted", "read_committed", "repeatable_read" or "serializable"). If not specified, the database default value is used.
+ - **options**: extra JDBC properties (hash, default: {})
+ - **retry_limit**: max retry count for database operations (integer, default: 12). If the intermediate table to be created already exists (e.g. created by another process), this plugin retries with another table name to avoid a collision.
+ - **retry_wait**: initial retry wait time in milliseconds (integer, default: 1000 (1 second))
+ - **max_retry_wait**: upper limit of the retry wait, which is doubled at every retry (integer, default: 1800000 (30 minutes))
+ - **mode**: "insert", "insert_direct", "truncate_insert", or "replace". See below (string, required)
+ - **batch_size**: size of a single batch insert (integer, default: 16777216)
+ - **max_table_name_length**: maximum length of a table name in this RDBMS (integer, default: 256)
+ - **default_timezone**: If the input column type (embulk type) is timestamp, this plugin needs to format the timestamp into a SQL string. This default_timezone option controls the timezone used. You can override the timezone for each column using the column_options option. (string, default: `UTC`)
+ - **column_options**: advanced: key-value pairs where the key is a column name and the value is options for that column.
+   - **type**: type of the column when this plugin creates new tables (e.g. `VARCHAR(255)`, `INTEGER NOT NULL UNIQUE`). This is used when this plugin creates intermediate tables (insert and truncate_insert modes), when it creates the target table (replace mode), and when it creates a nonexistent target table automatically. (string, default: depends on the input column type: `BIGINT` if the input column type is long, `BOOLEAN` if boolean, `DOUBLE PRECISION` if double, `CLOB` if string, `TIMESTAMP` if timestamp)
+   - **value_type**: This plugin converts the input column type (embulk type) into a database type to build an INSERT statement. This value_type option controls the type of the value in the INSERT statement. (string, default: depends on the SQL type of the column. Available options are: `byte`, `short`, `int`, `long`, `double`, `float`, `boolean`, `string`, `nstring`, `date`, `time`, `timestamp`, `decimal`, `json`, `null`, `pass`)
+   - **timestamp_format**: If the input column type (embulk type) is timestamp and value_type is `string` or `nstring`, this plugin needs to format the timestamp value into a string. This timestamp_format option controls the format of the timestamp. (string, default: `%Y-%m-%d %H:%M:%S.%6N`)
+   - **timezone**: If the input column type (embulk type) is timestamp, this plugin needs to format the timestamp value into a SQL string. In this case, this timezone option controls the timezone used. (string, default: the value of the default_timezone option)
+ - **before_load**: if set, this SQL is executed before loading any records. In truncate_insert mode, the SQL is executed after truncating. replace mode doesn't support this option.
+ - **after_load**: if set, this SQL is executed after loading all records.
+
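The retry options above combine into an exponential backoff: the wait starts at `retry_wait`, doubles after each failed attempt, and is capped at `max_retry_wait`. A minimal sketch of that schedule in Java (the class and method names here are illustrative, not part of the plugin):

```java
public class RetryWaitSketch {
    // Wait (in ms) before the given 0-based retry attempt:
    // retry_wait doubled once per prior attempt, capped at max_retry_wait.
    static long waitMillis(long retryWait, int attempt, long maxRetryWait) {
        long wait = retryWait;
        for (int i = 0; i < attempt; i++) {
            wait = Math.min(wait * 2, maxRetryWait);
        }
        return wait;
    }

    public static void main(String[] args) {
        // With the defaults (retry_wait: 1000, max_retry_wait: 1800000),
        // the waits are 1s, 2s, 4s, ... until the 30-minute cap is reached.
        for (int attempt = 0; attempt < 12; attempt++) {
            System.out.println("attempt " + attempt + ": "
                    + waitMillis(1000, attempt, 1800000) + " ms");
        }
    }
}
```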
+ ## Modes
+
+ * **insert**:
+   * Behavior: This mode writes rows to intermediate tables first. If all those tasks run correctly, it runs an `INSERT INTO <target_table> SELECT * FROM <intermediate_table_1> UNION ALL SELECT * FROM <intermediate_table_2> UNION ALL ...` query. If the target table doesn't exist, it is created automatically.
+   * Transactional: Yes. This mode either writes all rows successfully or fails having written zero rows.
+   * Resumable: Yes.
+ * **insert_direct**:
+   * Behavior: This mode inserts rows into the target table directly. If the target table doesn't exist, it is created automatically.
+   * Transactional: No. If it fails, the target table may have some rows inserted.
+   * Resumable: No.
+ * **truncate_insert**:
+   * Behavior: Same as `insert` mode, except that it truncates the target table right before the last `INSERT ...` query.
+   * Transactional: Yes.
+   * Resumable: Yes.
+ * **replace**:
+   * Behavior: This mode writes rows to an intermediate table first. If all those tasks run correctly, it drops the target table and renames the intermediate table to the target table name.
+   * Transactional: No. If it fails, the target table may be dropped.
+   * Resumable: No.
+ * **merge**:
+   * Behavior: This mode writes rows to intermediate tables first. If all those tasks run correctly, it merges the intermediate tables into the target table. Namely, if the primary keys of a record in the intermediate tables already exist in the target table, the target record is updated by the intermediate record; otherwise the intermediate record is inserted. If the target table doesn't exist, it is created automatically.
+   * Transactional: Yes.
+   * Resumable: Yes.
+ * **merge_direct**:
+   * Behavior: This mode merges rows into the target table directly. Namely, if the primary keys of an input record already exist in the target table, the target record is updated by the input record; otherwise the input record is inserted. If the target table doesn't exist, it is created automatically.
+   * Transactional: No.
+   * Resumable: No.
+
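As a concrete illustration of `insert` mode with two tasks, the final query has roughly the shape below. The intermediate table names are made up for this sketch; the real names are generated by the plugin.

```sql
INSERT INTO my_table
SELECT * FROM my_table_0000000001_embulk
UNION ALL
SELECT * FROM my_table_0000000002_embulk;
```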
+ ## Example
+
+ ```yaml
+ out:
+   type: snowflake
+   driver_path: /opt/snowflake-jdbc-3.8.0.jar
+   url: jdbc:snowflake://host.eu-central-1.snowflakecomputing.com/?db=development&warehouse=dwh&role=bi&schema=dwh
+   user: myuser
+   password: "mypassword"
+   table: my_table
+   mode: insert
+ ```
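Assuming Embulk is installed and the configuration above is saved as `config.yml` (a hypothetical file name), the load runs like any other Embulk job:

```
$ embulk run config.yml
```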
+
+ Advanced configuration:
+
+ ```yaml
+ out:
+   type: snowflake
+   driver_path: /opt/snowflake-jdbc-3.8.0.jar
+   url: jdbc:snowflake://host.eu-central-1.snowflakecomputing.com/?db=development&warehouse=dwh&role=bi&schema=dwh
+   user: myuser
+   password: "mypassword"
+   table: my_table
+   options: {loglevel: 2}
+   mode: insert_direct
+   column_options:
+     my_col_1: {type: 'VARCHAR(255)'}
+     my_col_3: {type: 'INT NOT NULL'}
+     my_col_4: {value_type: string, timestamp_format: '%Y-%m-%d %H:%M:%S %z', timezone: '-0700'}
+     my_col_5: {type: 'DECIMAL(18,9)', value_type: pass}
+ ```
+
+ ## Build
+
+ ```
+ $ ./gradlew gem
+ ```
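The `gem` task writes the built gem under `pkg/` (the usual output path for Embulk plugin builds, assumed here), after which it can be installed into Embulk's gem environment:

```
$ embulk gem install pkg/embulk-output-snowflake-0.1.1.gem
```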
build.gradle ADDED
@@ -0,0 +1,5 @@
+ dependencies {
+     compile project(':embulk-output-jdbc')
+
+     testCompile project(':embulk-output-jdbc').sourceSets.test.output
+ }
lib/embulk/output/snowflake.rb ADDED
@@ -0,0 +1,3 @@
+ Embulk::JavaPlugin.register_output(
+   :snowflake, "org.embulk.output.SnowflakeOutputPlugin",
+   File.expand_path('../../../../classpath', __FILE__))
src/main/java/org/embulk/output/SnowflakeOutputPlugin.java ADDED
@@ -0,0 +1,165 @@
+ package org.embulk.output;
+
+ import java.sql.*;
+ import java.util.Properties;
+ import java.util.List;
+ import java.io.IOException;
+ import java.util.Locale;
+
+ import com.google.common.base.Optional;
+ import com.google.common.collect.ImmutableSet;
+ import com.google.common.collect.ImmutableList;
+
+ import org.embulk.config.Config;
+ import org.embulk.config.ConfigDefault;
+ import org.embulk.output.jdbc.*;
+ import org.embulk.output.snowflake.SnowflakeOutputConnector;
+ import org.embulk.spi.Schema;
+ import org.embulk.output.snowflake.SnowflakeOutputConnection;
+
+ public class SnowflakeOutputPlugin
+         extends AbstractJdbcOutputPlugin
+ {
+     public interface GenericPluginTask extends PluginTask
+     {
+         @Config("driver_path")
+         @ConfigDefault("null")
+         public Optional<String> getDriverPath();
+
+         @Config("url")
+         public String getUrl();
+
+         @Config("user")
+         @ConfigDefault("null")
+         public Optional<String> getUser();
+
+         @Config("password")
+         @ConfigDefault("null")
+         public Optional<String> getPassword();
+
+         @Config("schema")
+         @ConfigDefault("null")
+         public Optional<String> getSchema();
+
+         @Config("max_table_name_length")
+         @ConfigDefault("30")
+         public int getMaxTableNameLength();
+     }
+
+     @Override
+     protected Class<? extends PluginTask> getTaskClass()
+     {
+         return GenericPluginTask.class;
+     }
+
+     @Override
+     protected Features getFeatures(PluginTask task)
+     {
+         GenericPluginTask t = (GenericPluginTask) task;
+         return new Features()
+             .setMaxTableNameLength(t.getMaxTableNameLength())
+             .setSupportedModes(ImmutableSet.of(Mode.INSERT, Mode.INSERT_DIRECT, Mode.TRUNCATE_INSERT, Mode.REPLACE));
+     }
+
+     @Override
+     protected SnowflakeOutputConnector getConnector(PluginTask task, boolean retryableMetadataOperation)
+     {
+         GenericPluginTask t = (GenericPluginTask) task;
+
+         if (t.getDriverPath().isPresent()) {
+             addDriverJarToClasspath(t.getDriverPath().get());
+         }
+
+         Properties props = new Properties();
+         props.putAll(t.getOptions());
+
+         if (t.getUser().isPresent()) {
+             props.setProperty("user", t.getUser().get());
+         }
+         if (t.getPassword().isPresent()) {
+             props.setProperty("password", t.getPassword().get());
+         }
+         logConnectionProperties(t.getUrl(), props);
+
+         return new SnowflakeOutputConnector(t.getUrl(), props,
+                 t.getSchema().orNull(), t.getTransactionIsolation());
+     }
+
+     @Override
+     protected BatchInsert newBatchInsert(PluginTask task, Optional<MergeConfig> mergeConfig) throws IOException, SQLException
+     {
+         return new StandardBatchInsert(getConnector(task, true), mergeConfig);
+     }
+
+     @Override
+     protected void doBegin(JdbcOutputConnection con,
+             PluginTask task, final Schema schema, int taskCount) throws SQLException
+     {
+         SnowflakeOutputConnection snowflakeCon = (SnowflakeOutputConnection) con;
+         super.doBegin(snowflakeCon, task, schema, taskCount);
+     }
+
+     @Override
+     public Optional<JdbcSchema> newJdbcSchemaFromTableIfExists(JdbcOutputConnection connection,
+             TableIdentifier table) throws SQLException
+     {
+         if (!connection.tableExists(table)) {
+             // DatabaseMetaData.getPrimaryKeys fails if the table does not exist
+             return Optional.absent();
+         }
+
+         DatabaseMetaData dbm = connection.getMetaData();
+         String escape = dbm.getSearchStringEscape();
+         String catalog = dbm.getConnection().getCatalog();
+
+         ResultSet rs = dbm.getPrimaryKeys(catalog, table.getSchemaName(), table.getTableName());
+         ImmutableSet.Builder<String> primaryKeysBuilder = ImmutableSet.builder();
+         try {
+             while (rs.next()) {
+                 primaryKeysBuilder.add(rs.getString("COLUMN_NAME"));
+             }
+         } finally {
+             rs.close();
+         }
+         ImmutableSet<String> primaryKeys = primaryKeysBuilder.build();
+
+         ImmutableList.Builder<JdbcColumn> builder = ImmutableList.builder();
+         rs = dbm.getColumns(
+                 catalog,
+                 JdbcUtils.escapeSearchString(table.getSchemaName(), escape),
+                 JdbcUtils.escapeSearchString(table.getTableName(), escape),
+                 null);
+         try {
+             while (rs.next()) {
+                 String columnName = rs.getString("COLUMN_NAME");
+                 String simpleTypeName = rs.getString("TYPE_NAME").toUpperCase(Locale.ENGLISH);
+                 boolean isUniqueKey = primaryKeys.contains(columnName);
+                 int sqlType = rs.getInt("DATA_TYPE");
+                 int colSize = rs.getInt("COLUMN_SIZE");
+                 int decDigit = rs.getInt("DECIMAL_DIGITS");
+                 if (rs.wasNull()) {
+                     decDigit = -1;
+                 }
+                 int charOctetLength = rs.getInt("CHAR_OCTET_LENGTH");
+                 boolean isNotNull = "NO".equals(rs.getString("IS_NULLABLE"));
+                 //rs.getString("COLUMN_DEF") // or null // TODO
+                 builder.add(JdbcColumn.newGenericTypeColumn(
+                         columnName, sqlType, simpleTypeName, colSize, decDigit, charOctetLength, isNotNull, isUniqueKey));
+                 // We can't get the declared column name using the JDBC API.
+                 // Subclasses need to overwrite it.
+             }
+         } finally {
+             rs.close();
+         }
+         List<JdbcColumn> columns = builder.build();
+         if (columns.isEmpty()) {
+             return Optional.absent();
+         } else {
+             return Optional.of(new JdbcSchema(columns));
+         }
+     }
+ }
src/main/java/org/embulk/output/snowflake/SnowflakeOutputConnection.java ADDED
@@ -0,0 +1,51 @@
+ package org.embulk.output.snowflake;
+
+ import java.sql.Connection;
+ import java.sql.ResultSet;
+ import java.sql.SQLException;
+ import java.sql.Statement;
+
+ import org.embulk.output.jdbc.JdbcOutputConnection;
+ import org.embulk.output.jdbc.JdbcUtils;
+ import org.embulk.output.jdbc.TableIdentifier;
+
+ public class SnowflakeOutputConnection
+         extends JdbcOutputConnection
+ {
+     public SnowflakeOutputConnection(Connection connection, String schemaName)
+             throws SQLException
+     {
+         super(connection, schemaName);
+     }
+
+     @Override
+     public boolean tableExists(TableIdentifier table) throws SQLException
+     {
+         String schemaName = JdbcUtils.escapeSearchString(table.getSchemaName(), connection.getMetaData().getSearchStringEscape());
+         String database = connection.getCatalog();
+         try (ResultSet rs = connection.getMetaData().getTables(database, schemaName, table.getTableName(), null)) {
+             return rs.next();
+         }
+     }
+
+     @Override
+     public boolean tableExists(String tableName) throws SQLException
+     {
+         return tableExists(new TableIdentifier(connection.getCatalog(), schemaName, tableName));
+     }
+
+     @Override
+     protected void setSearchPath(String schema) throws SQLException
+     {
+         Statement stmt = connection.createStatement();
+         try {
+             String sql = "USE SCHEMA " + quoteIdentifierString(schema);
+             executeUpdate(stmt, sql);
+             commitIfNecessary(connection);
+         } finally {
+             stmt.close();
+         }
+     }
+ }
src/main/java/org/embulk/output/snowflake/SnowflakeOutputConnector.java ADDED
@@ -0,0 +1,48 @@
+ package org.embulk.output.snowflake;
+
+ import com.google.common.base.Optional;
+ import org.embulk.output.jdbc.AbstractJdbcOutputConnector;
+ import org.embulk.output.jdbc.JdbcOutputConnection;
+ import org.embulk.output.jdbc.TransactionIsolation;
+
+ import java.sql.Connection;
+ import java.sql.DriverManager;
+ import java.sql.SQLException;
+ import java.util.Properties;
+
+ public class SnowflakeOutputConnector
+         extends AbstractJdbcOutputConnector
+ {
+     private final String url;
+     private final Properties properties;
+     private final String schemaName;
+
+     public SnowflakeOutputConnector(String url, Properties properties, String schemaName,
+             Optional<TransactionIsolation> transactionIsolation)
+     {
+         super(transactionIsolation);
+         try {
+             // Load the Snowflake JDBC driver. Current drivers register the class under
+             // net.snowflake.client.jdbc (com.snowflake.client.jdbc is the legacy name).
+             Class.forName("net.snowflake.client.jdbc.SnowflakeDriver");
+         } catch (Exception ex) {
+             throw new RuntimeException(ex);
+         }
+         this.url = url;
+         this.properties = properties;
+         this.schemaName = schemaName;
+     }
+
+     @Override
+     protected JdbcOutputConnection connect() throws SQLException
+     {
+         Connection c = DriverManager.getConnection(url, properties);
+         try {
+             SnowflakeOutputConnection con = new SnowflakeOutputConnection(c, schemaName);
+             c = null;
+             return con;
+         } finally {
+             if (c != null) {
+                 c.close();
+             }
+         }
+     }
+ }
metadata ADDED
@@ -0,0 +1,52 @@
+ --- !ruby/object:Gem::Specification
+ name: embulk-output-snowflake
+ version: !ruby/object:Gem::Version
+   version: 0.1.1
+ platform: ruby
+ authors:
+ - Piotr Wilkowski
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2019-05-29 00:00:00.000000000 Z
+ dependencies: []
+ description: Inserts or updates records to a table.
+ email:
+ - piwil@wp.pl
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - README.md
+ - build.gradle
+ - classpath/embulk-output-jdbc-0.1.1.jar
+ - classpath/embulk-output-snowflake-0.1.1.jar
+ - lib/embulk/output/snowflake.rb
+ - src/main/java/org/embulk/output/SnowflakeOutputPlugin.java
+ - src/main/java/org/embulk/output/snowflake/SnowflakeOutputConnection.java
+ - src/main/java/org/embulk/output/snowflake/SnowflakeOutputConnector.java
+ homepage: ''
+ licenses:
+ - Apache 2.0
+ metadata: {}
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 2.4.8
+ signing_key:
+ specification_version: 4
+ summary: Snowflake output plugin for Embulk
+ test_files: []