embulk-output-elasticsearch_1.x 0.1.8

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: f7d5473cbc2fdad178a9d50b9b9b5e4c416290dc
+   data.tar.gz: edabbc01a70b8350e5963d668602fe5e5e923fbb
+ SHA512:
+   metadata.gz: 4b455b0eda32674f9e6c64d5af0c1a5b191b824b59d82b9180d5c3d1c0ad220b409c959fa8b125b757d5aa6d4b0fec185626f31b21a2726e8372b34ca772d5d4
+   data.tar.gz: f077a539cc227e714110ffe60e2e19ebe53ab9b527c907a005483d30c07ccc549323f115921ad6c8b30ff3281a36373ec3f6603aa0ffe423e54a5a894c145194
data/.gitignore ADDED
@@ -0,0 +1,7 @@
+ *~
+ *.iml
+ .idea
+ build/
+ /classpath/
+ /.gradle
+ /pkg/
data/CHANGELOG.md ADDED
@@ -0,0 +1,51 @@
+ ## 0.3.1 - 2016-06-21
+
+ * [maintenance] Update Elasticsearch client to 2.3.3 [#25](https://github.com/muga/embulk-output-elasticsearch/pull/25)
+
+ ## 0.3.0 - 2016-02-22
+
+ * [maintenance] Upgrade to Embulk v0.8 [#21](https://github.com/muga/embulk-output-elasticsearch/pull/21)
+
+ ## 0.2.1 - 2016-02-05
+
+ * [maintenance] Fix bug: force jobs to fail if nodes go down during execution [#19](https://github.com/muga/embulk-output-elasticsearch/pull/19)
+
+ ## 0.2.0 - 2016-01-26
+
+ * [new feature] Support Elasticsearch 2.x [#12](https://github.com/muga/embulk-output-elasticsearch/pull/12)
+ * [new feature] Added replace mode [#15](https://github.com/muga/embulk-output-elasticsearch/pull/15)
+ * [maintenance] Fixed the id parameter's behavior [#14](https://github.com/muga/embulk-output-elasticsearch/pull/14)
+ * [maintenance] Added unit tests [#17](https://github.com/muga/embulk-output-elasticsearch/pull/17)
+ * [maintenance] Upgraded Embulk to v0.7.7
+
+ ## 0.1.8 - 2015-08-19
+
+ * [maintenance] Upgraded Embulk to v0.7.0
+ * [maintenance] Upgraded Elasticsearch to v1.5.2
+
+ ## 0.1.7 - 2015-05-09
+
+ * [maintenance] Fixed handling of null values [#10](https://github.com/muga/embulk-output-elasticsearch/pull/10)
+
+ ## 0.1.6 - 2015-04-14
+
+ * [new feature] Added bulk_size parameter [#8](https://github.com/muga/embulk-output-elasticsearch/pull/8)
+
+ ## 0.1.5 - 2015-03-26
+
+ * [new feature] Added cluster_name parameter [#7](https://github.com/muga/embulk-output-elasticsearch/pull/7)
+
+ ## 0.1.4 - 2015-03-19
+
+ * [maintenance] Renamed parameters: index_name to index, doc_id_column to id [#5](https://github.com/muga/embulk-output-elasticsearch/pull/5)
+ * [maintenance] Fixed typo in a parameter name [#6](https://github.com/muga/embulk-output-elasticsearch/pull/6)
+
+ ## 0.1.3 - 2015-02-25
+
+ * [new feature] Supported timestamp column [#4](https://github.com/muga/embulk-output-elasticsearch/pull/4)
+
+ ## 0.1.2 - 2015-02-24
+
+ ## 0.1.1 - 2015-02-16
+
+ ## 0.1.0 - 2015-02-16
data/README.md ADDED
@@ -0,0 +1,117 @@
+ # Elasticsearch output plugin for Embulk
+
+ **Notice:** This plugin doesn't support [Amazon (AWS) Elasticsearch Service](https://aws.amazon.com/elasticsearch-service/).
+ The plugin uses the [Transport Client](https://www.elastic.co/guide/en/elasticsearch/client/java-api/2.0/transport-client.html), which AWS Elasticsearch doesn't support.
+ > The service supports HTTP on port 80, but does not support TCP transport.
+ - *[Amazon Elasticsearch Service Limits](http://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/aes-limits.html)*
+
+ ## Overview
+
+ * **Plugin type**: output
+ * **Rollback supported**: no
+ * **Resume supported**: no
+ * **Cleanup supported**: no
+
+ ## Configuration
+
+ - **mode**: "insert" or "replace"; see the Modes section below (string, optional, default is "insert")
+ - **nodes**: list of nodes; each node is a pair of host and port (list, required)
+ - **cluster_name**: name of the cluster (string, default is "elasticsearch")
+ - **index**: index name (string, required)
+ - **index_type**: index type (string, required)
+ - **id**: name of the column whose value is used as the document id (string, default is null)
+ - **bulk_actions**: flush a bulk request after this many actions have been buffered (int, default is 1000)
+ - **bulk_size**: flush a bulk request after the buffered actions reach this size in bytes (long, default is 5242880)
+ - **concurrent_requests**: number of concurrent bulk requests (int, default is 5)
+
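+ As a fuller illustration of these parameters, here is a configuration sketch (the host, index, and column names are placeholders chosen for illustration, not values shipped with this plugin):
+
+ ```yaml
+ out:
+   type: elasticsearch
+   mode: insert
+   nodes:
+   - {host: localhost, port: 9300}
+   cluster_name: elasticsearch
+   index: embulk_example      # index name (required)
+   index_type: embulk         # index type (required)
+   id: id                     # use the "id" column as the document id
+   bulk_actions: 1000         # flush after 1000 buffered actions
+   bulk_size: 5242880         # ...or after roughly 5 MB of buffered actions
+   concurrent_requests: 5
+ ```
+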
+ ### Modes
+
+ #### insert:
+
+ The default.
+ This mode writes data into an existing index.
+
+ #### replace:
+
+ 1. Create a new temporary index
+ 2. Insert data into the new index
+ 3. Switch the alias to the new index; if the alias doesn't exist, the plugin creates it
+ 4. Delete the existing (old) index if it exists
+
+ An index must not already exist with the same name as the alias.
+
+ ```yaml
+ out:
+   type: elasticsearch
+   mode: replace
+   nodes:
+   - {host: localhost, port: 9300}
+   index: <alias name> # plugin generates index name like <index>_%Y%m%d-%H%M%S
+   index_type: <index type>
+ ```
+
+ ## Example
+
+ ```yaml
+ out:
+   type: elasticsearch
+   mode: insert
+   nodes:
+   - {host: localhost, port: 9300}
+   index: <index name>
+   index_type: <index type>
+ ```
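+
+ To try the example end to end, install the gem and point `embulk run` at a config file containing the block above. This is a generic sketch of the usual Embulk workflow; `config.yml` is just a placeholder filename.
+
+ ```
+ $ embulk gem install embulk-output-elasticsearch_1.x
+ $ embulk run config.yml
+ ```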
+
+ ## Build
+
+ ```
+ $ ./gradlew gem # -t to watch change of files and rebuild continuously
+ ```
+
+ ## Test
+
+ ```
+ $ ./gradlew test # -t to watch change of files and rebuild continuously
+ ```
+
+ To run the unit tests, configure the following environment variables.
+
+ If these variables are not set, most test cases are skipped.
+
+ ```
+ ES_HOST
+ ES_PORT (optional, default: 9300)
+ ES_INDEX
+ ES_INDEX_TYPE
+ ```
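+
+ For example, the variables can be supplied inline when invoking the test task (a sketch; the host and index values here are placeholders):
+
+ ```
+ $ ES_HOST=localhost ES_PORT=9300 ES_INDEX=embulk ES_INDEX_TYPE=embulk ./gradlew test
+ ```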
+
+ If you're using Mac OS X El Capitan and GUI applications (such as an IDE), set them as follows:
+
+ ```
+ $ vi ~/Library/LaunchAgents/environment.plist
+ <?xml version="1.0" encoding="UTF-8"?>
+ <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+ <plist version="1.0">
+   <dict>
+     <key>Label</key>
+     <string>my.startup</string>
+     <key>ProgramArguments</key>
+     <array>
+       <string>sh</string>
+       <string>-c</string>
+       <string>
+       launchctl setenv ES_HOST example.com
+       launchctl setenv ES_PORT 9300
+       launchctl setenv ES_INDEX embulk
+       launchctl setenv ES_INDEX_TYPE embulk
+       </string>
+     </array>
+     <key>RunAtLoad</key>
+     <true/>
+   </dict>
+ </plist>
+
+ $ launchctl load ~/Library/LaunchAgents/environment.plist
+ $ launchctl getenv ES_INDEX # check that the value is set
+ ```
+
+ Then start your applications.
data/build.gradle ADDED
@@ -0,0 +1,70 @@
+ plugins {
+     id "com.jfrog.bintray" version "1.1"
+     id "com.github.jruby-gradle.base" version "0.1.5"
+     id "java"
+ }
+ import com.github.jrubygradle.JRubyExec
+ repositories {
+     mavenCentral()
+     jcenter()
+     mavenLocal()
+ }
+ configurations {
+     provided
+ }
+
+ version = "0.1.8"
+
+ compileJava.options.encoding = 'UTF-8' // source encoding
+ sourceCompatibility = 1.7
+ targetCompatibility = 1.7
+
+ dependencies {
+     compile "org.embulk:embulk-core:0.8.9"
+     provided "org.embulk:embulk-core:0.8.9"
+     compile 'org.elasticsearch:elasticsearch:1.7.2'
+     testCompile "junit:junit:4.+"
+     testCompile "org.mockito:mockito-core:1.+"
+ }
+
+ task classpath(type: Copy, dependsOn: ["jar"]) {
+     doFirst { file("classpath").deleteDir() }
+     from (configurations.runtime - configurations.provided + files(jar.archivePath))
+     into "classpath"
+ }
+ clean { delete 'classpath' }
+
+ //task copyDependencies(type:Copy) {
+ //    new File("$buildDir/libs/dependencies").mkdirs()
+ //    into "$buildDir/libs/dependencies" from configurations.runtime
+ //}
+
+ task gem(type: JRubyExec, dependsOn: ["build", "gemspec", "classpath"]) {
+     jrubyArgs "-rrubygems/gem_runner", "-eGem::GemRunner.new.run(ARGV)", "build"
+     script "build/gemspec"
+     doLast { ant.move(file: "${project.name}-${project.version}.gem", todir: "pkg") }
+ }
+
+ task gemspec << { file("build/gemspec").write($/
+ Gem::Specification.new do |spec|
+   spec.name = "${project.name}"
+   spec.version = "${project.version}"
+   spec.authors = ["Muga Nishizawa", "Shinji Ikeda"]
+   spec.summary = %[Elasticsearch 1.x output plugin for Embulk]
+   spec.description = %[Elasticsearch 1.x output plugin is an Embulk plugin that loads records to Elasticsearch read by any input plugins. Search the input plugins by "embulk-input" keyword.]
+   spec.email = ["muga.nishizawa@gmail.com", "gm.ikeda@gmail.com"]
+   spec.licenses = ["Apache 2.0"]
+   spec.homepage = "https://github.com/shinjiikeda/embulk-output-elasticsearch"
+
+   spec.files = `git ls-files`.split("\n") + Dir["classpath/*.jar"]
+   spec.test_files = spec.files.grep(%r"^(test|spec)/")
+   spec.require_paths = ["lib"]
+   spec.executables = spec.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
+   spec.has_rdoc = false
+
+   spec.add_development_dependency "bundler", [">= 1.0"]
+   spec.add_development_dependency "rake", [">= 10.0"]
+   spec.add_development_dependency "test-unit", ["~> 3.0.2"]
+ end
+ /$)
+ }
Binary files ADDED (contents not shown)
data/gradle/wrapper/gradle-wrapper.properties ADDED
@@ -0,0 +1,6 @@
+ #Tue Aug 11 00:26:20 PDT 2015
+ distributionBase=GRADLE_USER_HOME
+ distributionPath=wrapper/dists
+ zipStoreBase=GRADLE_USER_HOME
+ zipStorePath=wrapper/dists
+ distributionUrl=https\://services.gradle.org/distributions/gradle-2.6-bin.zip
data/gradlew ADDED
@@ -0,0 +1,164 @@
+ #!/usr/bin/env bash
+
+ ##############################################################################
+ ##
+ ## Gradle start up script for UN*X
+ ##
+ ##############################################################################
+
+ # Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script.
+ DEFAULT_JVM_OPTS=""
+
+ APP_NAME="Gradle"
+ APP_BASE_NAME=`basename "$0"`
+
+ # Use the maximum available, or set MAX_FD != -1 to use that value.
+ MAX_FD="maximum"
+
+ warn ( ) {
+     echo "$*"
+ }
+
+ die ( ) {
+     echo
+     echo "$*"
+     echo
+     exit 1
+ }
+
+ # OS specific support (must be 'true' or 'false').
+ cygwin=false
+ msys=false
+ darwin=false
+ case "`uname`" in
+   CYGWIN* )
+     cygwin=true
+     ;;
+   Darwin* )
+     darwin=true
+     ;;
+   MINGW* )
+     msys=true
+     ;;
+ esac
+
+ # For Cygwin, ensure paths are in UNIX format before anything is touched.
+ if $cygwin ; then
+     [ -n "$JAVA_HOME" ] && JAVA_HOME=`cygpath --unix "$JAVA_HOME"`
+ fi
+
+ # Attempt to set APP_HOME
+ # Resolve links: $0 may be a link
+ PRG="$0"
+ # Need this for relative symlinks.
+ while [ -h "$PRG" ] ; do
+     ls=`ls -ld "$PRG"`
+     link=`expr "$ls" : '.*-> \(.*\)$'`
+     if expr "$link" : '/.*' > /dev/null; then
+         PRG="$link"
+     else
+         PRG=`dirname "$PRG"`"/$link"
+     fi
+ done
+ SAVED="`pwd`"
+ cd "`dirname \"$PRG\"`/" >&-
+ APP_HOME="`pwd -P`"
+ cd "$SAVED" >&-
+
+ CLASSPATH=$APP_HOME/gradle/wrapper/gradle-wrapper.jar
+
+ # Determine the Java command to use to start the JVM.
+ if [ -n "$JAVA_HOME" ] ; then
+     if [ -x "$JAVA_HOME/jre/sh/java" ] ; then
+         # IBM's JDK on AIX uses strange locations for the executables
+         JAVACMD="$JAVA_HOME/jre/sh/java"
+     else
+         JAVACMD="$JAVA_HOME/bin/java"
+     fi
+     if [ ! -x "$JAVACMD" ] ; then
+         die "ERROR: JAVA_HOME is set to an invalid directory: $JAVA_HOME
+
+ Please set the JAVA_HOME variable in your environment to match the
+ location of your Java installation."
+     fi
+ else
+     JAVACMD="java"
+     which java >/dev/null 2>&1 || die "ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.
+
+ Please set the JAVA_HOME variable in your environment to match the
+ location of your Java installation."
+ fi
+
+ # Increase the maximum file descriptors if we can.
+ if [ "$cygwin" = "false" -a "$darwin" = "false" ] ; then
+     MAX_FD_LIMIT=`ulimit -H -n`
+     if [ $? -eq 0 ] ; then
+         if [ "$MAX_FD" = "maximum" -o "$MAX_FD" = "max" ] ; then
+             MAX_FD="$MAX_FD_LIMIT"
+         fi
+         ulimit -n $MAX_FD
+         if [ $? -ne 0 ] ; then
+             warn "Could not set maximum file descriptor limit: $MAX_FD"
+         fi
+     else
+         warn "Could not query maximum file descriptor limit: $MAX_FD_LIMIT"
+     fi
+ fi
+
+ # For Darwin, add options to specify how the application appears in the dock
+ if $darwin; then
+     GRADLE_OPTS="$GRADLE_OPTS \"-Xdock:name=$APP_NAME\" \"-Xdock:icon=$APP_HOME/media/gradle.icns\""
+ fi
+
+ # For Cygwin, switch paths to Windows format before running java
+ if $cygwin ; then
+     APP_HOME=`cygpath --path --mixed "$APP_HOME"`
+     CLASSPATH=`cygpath --path --mixed "$CLASSPATH"`
+
+     # We build the pattern for arguments to be converted via cygpath
+     ROOTDIRSRAW=`find -L / -maxdepth 1 -mindepth 1 -type d 2>/dev/null`
+     SEP=""
+     for dir in $ROOTDIRSRAW ; do
+         ROOTDIRS="$ROOTDIRS$SEP$dir"
+         SEP="|"
+     done
+     OURCYGPATTERN="(^($ROOTDIRS))"
+     # Add a user-defined pattern to the cygpath arguments
+     if [ "$GRADLE_CYGPATTERN" != "" ] ; then
+         OURCYGPATTERN="$OURCYGPATTERN|($GRADLE_CYGPATTERN)"
+     fi
+     # Now convert the arguments - kludge to limit ourselves to /bin/sh
+     i=0
+     for arg in "$@" ; do
+         CHECK=`echo "$arg"|egrep -c "$OURCYGPATTERN" -`
+         CHECK2=`echo "$arg"|egrep -c "^-"` ### Determine if an option
+
+         if [ $CHECK -ne 0 ] && [ $CHECK2 -eq 0 ] ; then ### Added a condition
+             eval `echo args$i`=`cygpath --path --ignore --mixed "$arg"`
+         else
+             eval `echo args$i`="\"$arg\""
+         fi
+         i=$((i+1))
+     done
+     case $i in
+         (0) set -- ;;
+         (1) set -- "$args0" ;;
+         (2) set -- "$args0" "$args1" ;;
+         (3) set -- "$args0" "$args1" "$args2" ;;
+         (4) set -- "$args0" "$args1" "$args2" "$args3" ;;
+         (5) set -- "$args0" "$args1" "$args2" "$args3" "$args4" ;;
+         (6) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" ;;
+         (7) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" ;;
+         (8) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" ;;
+         (9) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" "$args8" ;;
+     esac
+ fi
+
+ # Split up the JVM_OPTS And GRADLE_OPTS values into an array, following the shell quoting and substitution rules
+ function splitJvmOpts() {
+     JVM_OPTS=("$@")
+ }
+ eval splitJvmOpts $DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS
+ JVM_OPTS[${#JVM_OPTS[*]}]="-Dorg.gradle.appname=$APP_BASE_NAME"
+
+ exec "$JAVACMD" "${JVM_OPTS[@]}" -classpath "$CLASSPATH" org.gradle.wrapper.GradleWrapperMain "$@"
data/gradlew.bat ADDED
@@ -0,0 +1,90 @@
+ @if "%DEBUG%" == "" @echo off
+ @rem ##########################################################################
+ @rem
+ @rem Gradle startup script for Windows
+ @rem
+ @rem ##########################################################################
+
+ @rem Set local scope for the variables with windows NT shell
+ if "%OS%"=="Windows_NT" setlocal
+
+ @rem Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script.
+ set DEFAULT_JVM_OPTS=
+
+ set DIRNAME=%~dp0
+ if "%DIRNAME%" == "" set DIRNAME=.
+ set APP_BASE_NAME=%~n0
+ set APP_HOME=%DIRNAME%
+
+ @rem Find java.exe
+ if defined JAVA_HOME goto findJavaFromJavaHome
+
+ set JAVA_EXE=java.exe
+ %JAVA_EXE% -version >NUL 2>&1
+ if "%ERRORLEVEL%" == "0" goto init
+
+ echo.
+ echo ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.
+ echo.
+ echo Please set the JAVA_HOME variable in your environment to match the
+ echo location of your Java installation.
+
+ goto fail
+
+ :findJavaFromJavaHome
+ set JAVA_HOME=%JAVA_HOME:"=%
+ set JAVA_EXE=%JAVA_HOME%/bin/java.exe
+
+ if exist "%JAVA_EXE%" goto init
+
+ echo.
+ echo ERROR: JAVA_HOME is set to an invalid directory: %JAVA_HOME%
+ echo.
+ echo Please set the JAVA_HOME variable in your environment to match the
+ echo location of your Java installation.
+
+ goto fail
+
+ :init
+ @rem Get command-line arguments, handling Windowz variants
+
+ if not "%OS%" == "Windows_NT" goto win9xME_args
+ if "%@eval[2+2]" == "4" goto 4NT_args
+
+ :win9xME_args
+ @rem Slurp the command line arguments.
+ set CMD_LINE_ARGS=
+ set _SKIP=2
+
+ :win9xME_args_slurp
+ if "x%~1" == "x" goto execute
+
+ set CMD_LINE_ARGS=%*
+ goto execute
+
+ :4NT_args
+ @rem Get arguments from the 4NT Shell from JP Software
+ set CMD_LINE_ARGS=%$
+
+ :execute
+ @rem Setup the command line
+
+ set CLASSPATH=%APP_HOME%\gradle\wrapper\gradle-wrapper.jar
+
+ @rem Execute Gradle
+ "%JAVA_EXE%" %DEFAULT_JVM_OPTS% %JAVA_OPTS% %GRADLE_OPTS% "-Dorg.gradle.appname=%APP_BASE_NAME%" -classpath "%CLASSPATH%" org.gradle.wrapper.GradleWrapperMain %CMD_LINE_ARGS%
+
+ :end
+ @rem End local scope for the variables with windows NT shell
+ if "%ERRORLEVEL%"=="0" goto mainEnd
+
+ :fail
+ rem Set variable GRADLE_EXIT_CONSOLE if you need the _script_ return code instead of
+ rem the _cmd.exe /c_ return code!
+ if not "" == "%GRADLE_EXIT_CONSOLE%" exit 1
+ exit /b 1
+
+ :mainEnd
+ if "%OS%"=="Windows_NT" endlocal
+
+ :omega
data/lib/embulk/output/elasticsearch.rb ADDED
@@ -0,0 +1,3 @@
+ Embulk::JavaPlugin.register_output(
+   :elasticsearch, "org.embulk.output.elasticsearch.ElasticsearchOutputPlugin",
+   File.expand_path('../../../../classpath', __FILE__))
data/settings.gradle ADDED
@@ -0,0 +1 @@
+ rootProject.name = 'embulk-output-elasticsearch_1.x'
data/src/main/java/org/embulk/output/elasticsearch/ElasticsearchOutputPlugin.java ADDED
@@ -0,0 +1,431 @@
+ package org.embulk.output.elasticsearch;
+
+ import com.google.common.base.Optional;
+ import com.google.common.base.Throwables;
+ import com.google.common.collect.ImmutableList;
+ import com.google.inject.Inject;
+ import org.elasticsearch.action.bulk.BulkItemResponse;
+ import org.elasticsearch.action.bulk.BulkProcessor;
+ import org.elasticsearch.action.bulk.BulkRequest;
+ import org.elasticsearch.action.bulk.BulkResponse;
+ import org.elasticsearch.action.index.IndexRequest;
+ import org.elasticsearch.client.Client;
+ import org.elasticsearch.client.Requests;
+ import org.elasticsearch.client.transport.TransportClient;
+ import org.elasticsearch.common.unit.ByteSizeValue;
+ import org.elasticsearch.common.unit.ByteSizeUnit;
+ import org.elasticsearch.common.settings.ImmutableSettings;
+ import org.elasticsearch.common.settings.Settings;
+ import org.elasticsearch.common.transport.InetSocketTransportAddress;
+ import org.elasticsearch.common.xcontent.XContentBuilder;
+ import org.elasticsearch.common.xcontent.XContentFactory;
+ import org.elasticsearch.node.Node;
+ import org.elasticsearch.node.NodeBuilder;
+ import org.embulk.config.TaskReport;
+ import org.embulk.config.Config;
+ import org.embulk.config.ConfigDefault;
+ import org.embulk.config.ConfigDiff;
+ import org.embulk.config.ConfigSource;
+ import org.embulk.config.Task;
+ import org.embulk.config.TaskSource;
+ import org.embulk.spi.Column;
+ import org.embulk.spi.Exec;
+ import org.embulk.spi.OutputPlugin;
+ import org.embulk.spi.Page;
+ import org.embulk.spi.PageReader;
+ import org.embulk.spi.Schema;
+ import org.embulk.spi.ColumnVisitor;
+ import org.embulk.spi.TransactionalPageOutput;
+ import org.embulk.spi.type.Types;
+ import org.slf4j.Logger;
+
+ import java.io.IOException;
+ import java.util.Date;
+ import java.util.List;
+ import java.util.concurrent.TimeUnit;
+
+ import static com.google.common.base.Preconditions.checkState;
+
+ public class ElasticsearchOutputPlugin
+         implements OutputPlugin
+ {
+     public interface NodeAddressTask
+             extends Task
+     {
+         @Config("host")
+         public String getHost();
+
+         @Config("port")
+         @ConfigDefault("9300")
+         public int getPort();
+     }
+
+     public interface PluginTask
+             extends Task
+     {
+         @Config("nodes")
+         public List<NodeAddressTask> getNodes();
+
+         @Config("cluster_name")
+         @ConfigDefault("\"elasticsearch\"")
+         public String getClusterName();
+
+         @Config("index")
+         public String getIndex();
+
+         @Config("index_type")
+         public String getType();
+
+         @Config("id")
+         @ConfigDefault("null")
+         public Optional<String> getId();
+
+         @Config("bulk_actions")
+         @ConfigDefault("1000")
+         public int getBulkActions();
+
+         @Config("bulk_size")
+         @ConfigDefault("5242880")
+         public long getBulkSize();
+
+         @Config("concurrent_requests")
+         @ConfigDefault("5")
+         public int getConcurrentRequests();
+     }
+
+     private final Logger log;
+
+     @Inject
+     public ElasticsearchOutputPlugin()
+     {
+         log = Exec.getLogger(getClass());
+     }
+
+     @Override
+     public ConfigDiff transaction(ConfigSource config, Schema schema,
+                                   int processorCount, Control control)
+     {
+         final PluginTask task = config.loadConfig(PluginTask.class);
+
+         // confirm that a client can be initialized
+         try (Client client = createClient(task)) {
+         }
+
+         try {
+             control.run(task.dump());
+         } catch (Exception e) {
+             throw Throwables.propagate(e);
+         }
+
+         ConfigDiff nextConfig = Exec.newConfigDiff();
+         return nextConfig;
+     }
+
+     @Override
+     public ConfigDiff resume(TaskSource taskSource,
+                              Schema schema, int processorCount,
+                              OutputPlugin.Control control)
+     {
+         // TODO
+         return Exec.newConfigDiff();
+     }
+
+     @Override
+     public void cleanup(TaskSource taskSource,
+                         Schema schema, int processorCount,
+                         List<TaskReport> successTaskReports)
+     { }
+
+     private Client createClient(final PluginTask task)
+     {
+         // @see http://www.elasticsearch.org/guide/en/elasticsearch/client/java-api/current/client.html
+         Settings settings = ImmutableSettings.settingsBuilder()
+                 .classLoader(Settings.class.getClassLoader())
+                 .put("cluster.name", task.getClusterName())
+                 .build();
+         TransportClient client = new TransportClient(settings);
+         List<NodeAddressTask> nodes = task.getNodes();
+         for (NodeAddressTask node : nodes) {
+             client.addTransportAddress(new InetSocketTransportAddress(node.getHost(), node.getPort()));
+         }
+         return client;
+     }
+
+     private BulkProcessor newBulkProcessor(final PluginTask task, final Client client)
+     {
+         return BulkProcessor.builder(client, new BulkProcessor.Listener() {
+             @Override
+             public void beforeBulk(long executionId, BulkRequest request)
+             {
+                 log.info("Execute {} bulk actions", request.numberOfActions());
+             }
+
+             @Override
+             public void afterBulk(long executionId, BulkRequest request, BulkResponse response)
+             {
+                 if (response.hasFailures()) {
+                     long items = 0;
+                     if (log.isDebugEnabled()) {
+                         for (BulkItemResponse item : response.getItems()) {
+                             if (item.isFailed()) {
+                                 items += 1;
+                                 log.debug("   Error for {}/{}/{} for {} operation: {}",
+                                         item.getIndex(), item.getType(), item.getId(),
+                                         item.getOpType(), item.getFailureMessage());
+                             }
+                         }
+                     }
+                     log.warn("{} bulk actions failed: {}", items, response.buildFailureMessage());
+                 } else {
+                     log.info("{} bulk actions succeeded", request.numberOfActions());
+                 }
+             }
+
+             @Override
+             public void afterBulk(long executionId, BulkRequest request, Throwable failure)
+             {
+                 log.warn("Got the error during bulk processing", failure);
+             }
+         }).setBulkActions(task.getBulkActions())
+                 .setBulkSize(new ByteSizeValue(task.getBulkSize()))
+                 .setConcurrentRequests(task.getConcurrentRequests())
+                 .build();
+     }
+
+     @Override
+     public TransactionalPageOutput open(TaskSource taskSource, Schema schema,
+                                         int processorIndex)
+     {
+         final PluginTask task = taskSource.loadTask(PluginTask.class);
+
+         Client client = createClient(task);
+         BulkProcessor bulkProcessor = newBulkProcessor(task, client);
+         ElasticsearchPageOutput pageOutput = new ElasticsearchPageOutput(task, client, bulkProcessor);
+         pageOutput.open(schema);
+         return pageOutput;
+     }
+
+     public static class ElasticsearchPageOutput implements TransactionalPageOutput
+     {
+         private Logger log;
+
+         private Client client;
+         private BulkProcessor bulkProcessor;
+
+         private PageReader pageReader;
+         private Column idColumn;
+
+         private final String index;
+         private final String type;
+         private final String id;
+
+         public ElasticsearchPageOutput(PluginTask task, Client client, BulkProcessor bulkProcessor)
+         {
+             this.log = Exec.getLogger(getClass());
+
+             this.client = client;
+             this.bulkProcessor = bulkProcessor;
+
+             this.index = task.getIndex();
+             this.type = task.getType();
+             this.id = task.getId().orNull();
+         }
+
+         void open(final Schema schema)
+         {
+             pageReader = new PageReader(schema);
+             idColumn = (id == null) ? null : schema.lookupColumn(id);
+         }
+
+         @Override
+         public void add(Page page)
+         {
+             pageReader.setPage(page);
+
+             while (pageReader.nextRecord()) {
+                 try {
+                     final XContentBuilder contextBuilder = XContentFactory.jsonBuilder().startObject(); // TODO reusable??
+                     pageReader.getSchema().visitColumns(new ColumnVisitor() {
+                         @Override
+                         public void booleanColumn(Column column) {
+                             try {
+                                 if (pageReader.isNull(column)) {
+                                     contextBuilder.nullField(column.getName());
+                                 } else {
+                                     contextBuilder.field(column.getName(), pageReader.getBoolean(column));
+                                 }
+                             } catch (IOException e) {
+                                 try {
+                                     contextBuilder.nullField(column.getName());
+                                 } catch (IOException ex) {
+                                     throw Throwables.propagate(ex);
+                                 }
+                             }
+                         }
+
+                         @Override
+                         public void longColumn(Column column) {
+                             try {
+                                 if (pageReader.isNull(column)) {
+                                     contextBuilder.nullField(column.getName());
+                                 } else {
+                                     contextBuilder.field(column.getName(), pageReader.getLong(column));
+                                 }
+                             } catch (IOException e) {
+                                 try {
+                                     contextBuilder.nullField(column.getName());
+                                 } catch (IOException ex) {
+                                     throw Throwables.propagate(ex);
+                                 }
+                             }
+                         }
+
+                         @Override
+                         public void doubleColumn(Column column) {
+                             try {
+                                 if (pageReader.isNull(column)) {
+                                     contextBuilder.nullField(column.getName());
+                                 } else {
+                                     contextBuilder.field(column.getName(), pageReader.getDouble(column));
+                                 }
+                             } catch (IOException e) {
+                                 try {
+                                     contextBuilder.nullField(column.getName());
+                                 } catch (IOException ex) {
+                                     throw Throwables.propagate(ex);
+                                 }
+                             }
+                         }
+
+                         @Override
+                         public void stringColumn(Column column) {
+                             try {
+                                 if (pageReader.isNull(column)) {
+                                     contextBuilder.nullField(column.getName());
+                                 } else {
+                                     contextBuilder.field(column.getName(), pageReader.getString(column));
+                                 }
+                             } catch (IOException e) {
+                                 try {
+                                     contextBuilder.nullField(column.getName());
+                                 } catch (IOException ex) {
+                                     throw Throwables.propagate(ex);
+                                 }
+                             }
+                         }
+
+                         @Override
+                         public void jsonColumn(Column column) {
+                             try {
+                                 if (pageReader.isNull(column)) {
+                                     contextBuilder.nullField(column.getName());
+                                 } else {
+                                     contextBuilder.field(column.getName(), pageReader.getJson(column).toJson());
+                                 }
+                             } catch (IOException e) {
+                                 try {
+                                     contextBuilder.nullField(column.getName());
+                                 } catch (IOException ex) {
+                                     throw Throwables.propagate(ex);
+                                 }
+                             }
+                         }
+
+                         @Override
+                         public void timestampColumn(Column column) {
+                             try {
+                                 if (pageReader.isNull(column)) {
+                                     contextBuilder.nullField(column.getName());
+                                 } else {
+                                     contextBuilder.field(column.getName(), new Date(pageReader.getTimestamp(column).toEpochMilli()));
+                                 }
+                             } catch (IOException e) {
+                                 try {
+                                     contextBuilder.nullField(column.getName());
+                                 } catch (IOException ex) {
+                                     throw Throwables.propagate(ex);
+                                 }
+                             }
+                         }
+                     });
+
+                     contextBuilder.endObject();
+                     bulkProcessor.add(newIndexRequest(getIdValue(idColumn)).source(contextBuilder));
+
+                 } catch (IOException e) {
+                     Throwables.propagate(e); // TODO error handling
+                 }
+             }
+         }
+
+         private String getIdValue(Column inputColumn) {
+             if (inputColumn == null) return null;
+             if (pageReader.isNull(inputColumn)) return null;
+             String idValue = null;
+             if (Types.STRING.equals(inputColumn.getType())) {
+                 idValue = pageReader.getString(inputColumn);
+             } else if (Types.BOOLEAN.equals(inputColumn.getType())) {
+                 idValue = pageReader.getBoolean(inputColumn) + "";
+             } else if (Types.DOUBLE.equals(inputColumn.getType())) {
+                 idValue = pageReader.getDouble(inputColumn) + "";
+             } else if (Types.LONG.equals(inputColumn.getType())) {
+                 idValue = pageReader.getLong(inputColumn) + "";
+             } else if (Types.TIMESTAMP.equals(inputColumn.getType())) {
+                 idValue = pageReader.getTimestamp(inputColumn).toString();
+             } else {
+                 idValue = null;
+             }
+             return idValue;
+         }
+
+         private IndexRequest newIndexRequest(String idValue)
+         {
+             return Requests.indexRequest(index).type(type).id(idValue);
+         }
+
+         @Override
+         public void finish()
+         {
+             try {
+                 bulkProcessor.flush();
+             } finally {
+                 close();
+             }
+         }
+
+         @Override
+         public void close()
+         {
+             if (bulkProcessor != null) {
+                 try {
+                     while (!bulkProcessor.awaitClose(3, TimeUnit.SECONDS)) {
+                         log.debug("wait for closing the bulk processing..");
+                     }
+                 } catch (InterruptedException e) {
+                     Thread.currentThread().interrupt();
+                 }
+                 bulkProcessor = null;
+             }
+
+             if (client != null) {
+                 client.close(); // ElasticsearchException
+                 client = null;
+             }
+         }
+
+         @Override
+         public void abort()
+         {
+             // TODO do nothing
+         }
+
+         @Override
+         public TaskReport commit()
+         {
+             TaskReport report = Exec.newTaskReport();
+             // TODO
+             return report;
+         }
+
+     }
+ }
data/src/test/java/org/embulk/output/elasticsearch/TestElasticsearchOutputPlugin.java ADDED
@@ -0,0 +1,5 @@
+ package org.embulk.output.elasticsearch;
+
+ public class TestElasticsearchOutputPlugin
+ {
+ }
data/src/test/resources/sample_01.csv ADDED
@@ -0,0 +1,5 @@
+ id,account,time,purchase,flg,score,comment
+ 1,32864,2015-01-27 19:23:49,20150127,1,123.45,embulk
+ 2,14824,2015-01-27 19:01:23,20150127,0,234,56,embulk
+ 3,27559,2015-01-28 02:20:02,20150128,1,678.90,embulk
+ 4,11270,2015-01-29 11:54:36,20150129,0,100.00,embulk
metadata ADDED
@@ -0,0 +1,116 @@
+ --- !ruby/object:Gem::Specification
+ name: embulk-output-elasticsearch_1.x
+ version: !ruby/object:Gem::Version
+   version: 0.1.8
+ platform: ruby
+ authors:
+ - Muga Nishizawa
+ - Shinji Ikeda
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2016-06-24 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - '>='
+       - !ruby/object:Gem::Version
+         version: '1.0'
+   name: bundler
+   prerelease: false
+   type: :development
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - '>='
+       - !ruby/object:Gem::Version
+         version: '1.0'
+ - !ruby/object:Gem::Dependency
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - '>='
+       - !ruby/object:Gem::Version
+         version: '10.0'
+   name: rake
+   prerelease: false
+   type: :development
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - '>='
+       - !ruby/object:Gem::Version
+         version: '10.0'
+ - !ruby/object:Gem::Dependency
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: 3.0.2
+   name: test-unit
+   prerelease: false
+   type: :development
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: 3.0.2
+ description: Elasticsearch 1.x output plugin is an Embulk plugin that loads records to Elasticsearch read by any input plugins. Search the input plugins by "embulk-input" keyword.
+ email:
+ - muga.nishizawa@gmail.com
+ - gm.ikeda@gmail.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - .gitignore
+ - CHANGELOG.md
+ - README.md
+ - build.gradle
+ - gradle/wrapper/gradle-wrapper.jar
+ - gradle/wrapper/gradle-wrapper.properties
+ - gradlew
+ - gradlew.bat
+ - lib/embulk/output/elasticsearch.rb
+ - settings.gradle
+ - src/main/java/org/embulk/output/elasticsearch/ElasticsearchOutputPlugin.java
+ - src/test/java/org/embulk/output/elasticsearch/TestElasticsearchOutputPlugin.java
+ - src/test/resources/sample_01.csv
+ - classpath/elasticsearch-1.7.2.jar
+ - classpath/embulk-output-elasticsearch_1.x-0.1.8.jar
+ - classpath/lucene-analyzers-common-4.10.4.jar
+ - classpath/lucene-core-4.10.4.jar
+ - classpath/lucene-grouping-4.10.4.jar
+ - classpath/lucene-highlighter-4.10.4.jar
+ - classpath/lucene-join-4.10.4.jar
+ - classpath/lucene-memory-4.10.4.jar
+ - classpath/lucene-misc-4.10.4.jar
+ - classpath/lucene-queries-4.10.4.jar
+ - classpath/lucene-queryparser-4.10.4.jar
+ - classpath/lucene-sandbox-4.10.4.jar
+ - classpath/lucene-spatial-4.10.4.jar
+ - classpath/lucene-suggest-4.10.4.jar
+ - classpath/spatial4j-0.4.1.jar
+ homepage: https://github.com/shinjiikeda/embulk-output-elasticsearch
+ licenses:
+ - Apache 2.0
+ metadata: {}
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 2.1.9
+ signing_key:
+ specification_version: 4
+ summary: Elasticsearch 1.x output plugin for Embulk
+ test_files: []