embulk 0.2.0 → 0.2.1
- checksums.yaml +8 -8
- data/ChangeLog +9 -0
- data/README.md +14 -4
- data/bin/embulk +6 -10
- data/build.gradle +2 -1
- data/embulk-cli/pom.xml +1 -1
- data/embulk-core/pom.xml +6 -1
- data/embulk-core/src/main/java/org/embulk/EmbulkService.java +2 -2
- data/embulk-core/src/main/java/org/embulk/exec/LocalExecutor.java +3 -1
- data/embulk-core/src/main/java/org/embulk/spi/Buffer.java +3 -0
- data/embulk-core/src/main/java/org/embulk/spi/Extension.java +7 -5
- data/embulk-core/src/main/java/org/embulk/spi/type/AbstractType.java +3 -0
- data/embulk-core/src/main/java/org/embulk/spi/util/LineEncoder.java +7 -2
- data/embulk-standards/pom.xml +1 -1
- data/embulk-standards/src/main/java/org/embulk/standards/CsvParserPlugin.java +0 -5
- data/embulk-standards/src/main/java/org/embulk/standards/CsvTokenizer.java +1 -2
- data/lib/embulk/schema.rb +11 -7
- data/lib/embulk/version.rb +1 -1
- data/pom.xml +10 -2
- metadata +6 -7
- data/examples/config.yml +0 -34
- data/examples/csv/sample.csv.gz +0 -0
checksums.yaml
CHANGED
@@ -1,15 +1,15 @@
|
|
1
1
|
---
|
2
2
|
!binary "U0hBMQ==":
|
3
3
|
metadata.gz: !binary |-
|
4
|
-
|
4
|
+
NGUwOTc0ZDE1MWZlZjJhYjdhNmJmMjQwZjliOWU3MmEyYmM5ZTczNQ==
|
5
5
|
data.tar.gz: !binary |-
|
6
|
-
|
6
|
+
ZjA1YTE5NDlhZGViMTU1NjVmOTBhZDVlZDY5NGZjODI0NGU5OGViZA==
|
7
7
|
SHA512:
|
8
8
|
metadata.gz: !binary |-
|
9
|
-
|
10
|
-
|
11
|
-
|
9
|
+
NDY3NzQ1NTkxNTk5MzAzMGQ2ZmIzYjM0YjMyMTczOGM1YjhmYzFkYTg0YTY3
|
10
|
+
ZDBlNzdiYWIwZmVkMWU5YzA3NTEyYzA2ZGI3YjQyMTQ5ZDI2MWI3ZWEwZTM0
|
11
|
+
YjQ3MTllMTZkYzdlYTM2YjNlZDVjNGIwYjEwNDVhNjlmN2IxYTk=
|
12
12
|
data.tar.gz: !binary |-
|
13
|
-
|
14
|
-
|
15
|
-
|
13
|
+
MGZjMzM1NmVhNzdhZDhjODg3ZWZiNGRmOWQwMTU5MzUwZmEwYTBkMDY1MTgz
|
14
|
+
NDBhOTAwM2Y3NDNjM2VlZTE1YjRkZjA4MWNiZjZjN2QzOTBjYTliMzJlYTgw
|
15
|
+
OGY2ZGZmMDJmMTI4ZWU1YjNmMTMxNTc5NDdjN2NiODkxYzQ4MmI=
|
data/ChangeLog
CHANGED
@@ -1,4 +1,13 @@
|
|
1
1
|
|
2
|
+
2015-01-29 version 0.2.1:
|
3
|
+
|
4
|
+
* Fixed LineEncoder#finish to flush all remaining buffer (reported by @aibou)
|
5
|
+
* Fixed NextConfig to be merged to in: or out: rather than the top-level
|
6
|
+
(reported by enukane) [#41]
|
7
|
+
* ./bin/embulk shows warns to run `rake` if ./classpath doesn't exist
|
8
|
+
* Embulk::PageBuilder#add accepts nil
|
9
|
+
|
10
|
+
|
2
11
|
2015-01-26 version 0.2.0:
|
3
12
|
|
4
13
|
* Changed JRuby InputPlugin API to use #run instead of .run
|
data/README.md
CHANGED
@@ -4,14 +4,24 @@ A plugin-based parallel bulk data loader that makes painful data integration wor
|
|
4
4
|
|
5
5
|
## What's Embulk?
|
6
6
|
|
7
|
-
|
7
|
+
Embulk is a plugin-based parallel bulk data loader that helps **data transfer** between various **storages**, **databases**, **NoSQL** and **cloud services**.
|
8
|
+
|
9
|
+
You can install input and output plugins to integrate many other file formats and storages.
|
10
|
+
|
11
|
+
You also can release plugins to share your efforts of data cleaning, error handling, transaction control, and retrying.
|
12
|
+
Packaging effrots into plugins **brings OSS-style development to the data scripts** which **was tend to be one-time adhoc scripts**.
|
13
|
+
|
14
|
+
[Embuk, an open-source plugin-based parallel bulk data loader](http://www.slideshare.net/frsyuki/embuk-making-data-integration-works-relaxed) at Slideshare
|
15
|
+
|
16
|
+
[![Embulk](https://gist.githubusercontent.com/frsyuki/f322a77ee2766a508ba9/raw/e8539b6b4fda1b3357e8c79d3966aa8148dbdbd3/embulk-overview.png)](http://www.slideshare.net/frsyuki/embuk-making-data-integration-works-relaxed/12)
|
17
|
+
|
8
18
|
|
9
19
|
## Quick Start
|
10
20
|
|
11
21
|
The single-file package is the simplest way to try Embulk. You can download the latest embulk-VERSION.jar from [the releases page](https://bintray.com/embulk/maven/embulk/view#files) and run it with java:
|
12
22
|
|
13
23
|
```
|
14
|
-
wget https://bintray.com/artifact/download/embulk/maven/embulk-0.2.
|
24
|
+
wget https://bintray.com/artifact/download/embulk/maven/embulk-0.2.1.jar -O embulk.jar
|
15
25
|
java -jar embulk.jar --help
|
16
26
|
```
|
17
27
|
|
@@ -27,14 +37,14 @@ java -jar embulk.jar run config.yml
|
|
27
37
|
### Using plugins
|
28
38
|
|
29
39
|
You can use plugins to load data from/to various systems and file formats.
|
30
|
-
An example is [embulk-output-postgres-json]() plugin. It outputs data into PostgreSQL server using "json" column type.
|
40
|
+
An example is [embulk-output-postgres-json](https://github.com/frsyuki/embulk-plugin-postgres-json) plugin. It outputs data into PostgreSQL server using "json" column type.
|
31
41
|
|
32
42
|
```
|
33
43
|
java -jar embulk.jar gem install embulk-output-postgres-json
|
34
44
|
java -jar embulk.jar gem list
|
35
45
|
```
|
36
46
|
|
37
|
-
You can search plugins on RubyGems: [search for "embulk-"](https://rubygems.org/search?utf8=%E2%9C%93&query=embulk-).
|
47
|
+
You can search plugins on RubyGems: [search for "embulk-plugin"](https://rubygems.org/search?utf8=%E2%9C%93&query=embulk-plugin).
|
38
48
|
|
39
49
|
### Using plugin bundle
|
40
50
|
|
data/bin/embulk
CHANGED
@@ -37,20 +37,16 @@ unless java_cmd
|
|
37
37
|
end
|
38
38
|
end
|
39
39
|
|
40
|
-
embulk_home =
|
41
|
-
unless embulk_home
|
42
|
-
embulk_home = File.dirname(File.dirname(__FILE__))
|
43
|
-
end
|
44
|
-
ENV['EMBULK_HOME'] = File.expand_path(embulk_home)
|
45
|
-
|
40
|
+
embulk_home = File.dirname(File.dirname(__FILE__))
|
46
41
|
classpath_dir = File.join(embulk_home, 'classpath')
|
47
42
|
lib_dir = File.join(embulk_home, 'lib')
|
48
43
|
|
49
|
-
jruby_complete = Dir.entries(classpath_dir).find {|jar| jar =~ /jruby-complete-[\d\.]+\.jar/ }
|
44
|
+
jruby_complete = Dir.entries(classpath_dir).find {|jar| jar =~ /jruby-complete-[\d\.]+\.jar/ } rescue nil
|
50
45
|
unless jruby_complete
|
51
|
-
STDERR.puts "Could not find jruby-complete at
|
52
|
-
|
53
|
-
|
46
|
+
STDERR.puts "Could not find jruby-complete at #{embulk_home}/classpath directory."
|
47
|
+
if embulk_home == '.'
|
48
|
+
STDERR.puts "Did you run \`rake\`? You need to build java code and create ./classpath directory first."
|
49
|
+
end
|
54
50
|
raise SystemExit.new(1)
|
55
51
|
end
|
56
52
|
|
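The `rescue nil` modifier added in this hunk matters because `Dir.entries` raises `Errno::ENOENT` when `./classpath` has not been built yet. Without it, the script would abort with a raw exception before it reaches the friendlier "Did you run `rake`?" message. Below is a minimal sketch of that lookup in isolation. The helper name `find_jruby_complete` is hypothetical and not part of Embulk, and it rescues `SystemCallError` explicitly, which is narrower than the `rescue nil` modifier in the diff (that one swallows any `StandardError`).

```ruby
require 'tmpdir'

# Hypothetical helper (not part of Embulk) mirroring the lookup in bin/embulk.
# Returns the file name of the jruby-complete jar, or nil when the directory
# is missing or holds no matching jar.
def find_jruby_complete(classpath_dir)
  Dir.entries(classpath_dir).find { |jar| jar =~ /jruby-complete-[\d\.]+\.jar/ }
rescue SystemCallError
  # Dir.entries raises Errno::ENOENT (a SystemCallError) for a missing directory.
  nil
end

# Usage: a directory that does not exist yields nil instead of an exception,
# so the caller can print its own error message and exit.
missing_dir = File.join(Dir.tmpdir, 'embulk-sketch-no-such-classpath')
jar = find_jruby_complete(missing_dir)
STDERR.puts "Could not find jruby-complete at #{missing_dir} directory." unless jar
```

Returning `nil` for both "directory missing" and "jar missing" lets the existing `unless jruby_complete` branch handle the two cases with one message.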
data/build.gradle
CHANGED
@@ -23,7 +23,7 @@ allprojects {
|
|
23
23
|
apply plugin: 'com.jfrog.bintray'
|
24
24
|
|
25
25
|
group = 'org.embulk'
|
26
|
-
version = '0.2.
|
26
|
+
version = '0.2.1'
|
27
27
|
|
28
28
|
// to upload artifacts to Bintray by gradle-bintray-plugin
|
29
29
|
// $ gradle bintrayUpload
|
@@ -92,6 +92,7 @@ subprojects {
|
|
92
92
|
'org.slf4j:slf4j-api:1.7.9',
|
93
93
|
'org.slf4j:slf4j-log4j12:1.7.9',
|
94
94
|
'org.jruby:jruby-complete:1.7.16.1',
|
95
|
+
'com.google.code.findbugs:annotations:3.0.0',
|
95
96
|
'org.yaml:snakeyaml:1.14',
|
96
97
|
'javax.validation:validation-api:1.1.0.Final',
|
97
98
|
'org.apache.bval:bval-jsr303:0.5',
|
data/embulk-cli/pom.xml
CHANGED
data/embulk-core/pom.xml
CHANGED
@@ -5,7 +5,7 @@
|
|
5
5
|
<parent>
|
6
6
|
<groupId>org.embulk</groupId>
|
7
7
|
<artifactId>embulk-parent</artifactId>
|
8
|
-
<version>0.2.
|
8
|
+
<version>0.2.1-SNAPSHOT</version>
|
9
9
|
</parent>
|
10
10
|
|
11
11
|
<artifactId>embulk-core</artifactId>
|
@@ -117,6 +117,11 @@
|
|
117
117
|
<artifactId>jruby-complete</artifactId>
|
118
118
|
</dependency>
|
119
119
|
|
120
|
+
<dependency>
|
121
|
+
<groupId>com.google.code.findbugs</groupId>
|
122
|
+
<artifactId>annotations</artifactId>
|
123
|
+
</dependency>
|
124
|
+
|
120
125
|
<!-- for guess_charset plugin -->
|
121
126
|
<dependency>
|
122
127
|
<groupId>com.ibm.icu</groupId>
|
data/embulk-core/src/main/java/org/embulk/EmbulkService.java
CHANGED
@@ -23,11 +23,11 @@ public class EmbulkService
|
|
23
23
|
modules.add(new ExtensionServiceLoaderModule(systemConfig));
|
24
24
|
modules.add(new BuiltinPluginSourceModule());
|
25
25
|
modules.add(new JRubyScriptingModule(systemConfig));
|
26
|
-
modules.addAll(getAdditionalModules());
|
26
|
+
modules.addAll(getAdditionalModules(systemConfig));
|
27
27
|
injector = Guice.createInjector(modules.build());
|
28
28
|
}
|
29
29
|
|
30
|
-
protected Iterable<? extends Module> getAdditionalModules()
|
30
|
+
protected Iterable<? extends Module> getAdditionalModules(ConfigSource systemConfig)
|
31
31
|
{
|
32
32
|
return ImmutableList.of();
|
33
33
|
}
|
data/embulk-core/src/main/java/org/embulk/exec/LocalExecutor.java
CHANGED
@@ -109,7 +109,9 @@ public class LocalExecutor
|
|
109
109
|
if (outputNextConfig == null) {
|
110
110
|
outputNextConfig = Exec.newNextConfig();
|
111
111
|
}
|
112
|
-
NextConfig nextConfig =
|
112
|
+
NextConfig nextConfig = Exec.newNextConfig();
|
113
|
+
nextConfig.getNestedOrSetEmpty("in").merge(inputNextConfig);
|
114
|
+
nextConfig.getNestedOrSetEmpty("out").merge(outputNextConfig);
|
113
115
|
return new ExecuteResult(nextConfig);
|
114
116
|
}
|
115
117
|
}
|
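The LocalExecutor change above nests the input and output plugins' next configs under `in` and `out` rather than at the top level, so the result lines up with the `in:` / `out:` sections of a config file. The Ruby sketch below shows only the resulting shape. It uses plain Hashes as a stand-in for the Java `NextConfig` class, and the `build_next_config` helper and the `last_path` key are hypothetical illustrations, not Embulk API.

```ruby
# Plain-Hash stand-in for NextConfig (an assumption; the real class is Java).
# A nil argument is treated as an empty config, matching the null checks that
# precede the merge in LocalExecutor.
def build_next_config(input_next_config, output_next_config)
  next_config = {}
  # Counterpart of getNestedOrSetEmpty("in").merge(inputNextConfig)
  (next_config["in"] ||= {}).merge!(input_next_config || {})
  # Counterpart of getNestedOrSetEmpty("out").merge(outputNextConfig)
  (next_config["out"] ||= {}).merge!(output_next_config || {})
  next_config
end

# "last_path" is a made-up key used only to show where plugin state ends up.
next_config = build_next_config({ "last_path" => "csv/sample_02.csv" }, nil)
puts next_config.inspect
```

Both sections are always present in the result, even when a plugin reports nothing, so a consumer can read the two keys without checking for their existence.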
data/embulk-core/src/main/java/org/embulk/spi/Buffer.java
CHANGED
@@ -2,6 +2,8 @@ package org.embulk.spi;
|
|
2
2
|
|
3
3
|
import java.util.Arrays;
|
4
4
|
|
5
|
+
import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
|
6
|
+
|
5
7
|
public class Buffer
|
6
8
|
{
|
7
9
|
public static final Buffer EMPTY = Buffer.allocate(0);
|
@@ -48,6 +50,7 @@ public class Buffer
|
|
48
50
|
return new Buffer(src, offset, size).limit(size);
|
49
51
|
}
|
50
52
|
|
53
|
+
@SuppressFBWarnings(value = "EI_EXPOSE_REP")
|
51
54
|
public byte[] array()
|
52
55
|
{
|
53
56
|
return array;
|
data/embulk-core/src/main/java/org/embulk/spi/Extension.java
CHANGED
@@ -13,6 +13,7 @@ import org.embulk.config.ConfigSource;
|
|
13
13
|
*
|
14
14
|
* An example extention to add a custom PluginSource will be as following:
|
15
15
|
*
|
16
|
+
* <code>
|
16
17
|
* class MyPluginSourceExtension
|
17
18
|
* implements Extension, Module
|
18
19
|
* {
|
@@ -22,19 +23,20 @@ import org.embulk.config.ConfigSource;
|
|
22
23
|
* // ...
|
23
24
|
* }
|
24
25
|
*
|
25
|
-
* @Override
|
26
|
+
* {@literal @}Override
|
26
27
|
* public void configure(Binder binder)
|
27
28
|
* {
|
28
|
-
* Multibinder
|
29
|
+
* Multibinder<PluginSource> multibinder = Multibinder.newSetBinder(binder, PluginSource.class);
|
29
30
|
* multibinder.addBinding().to(MyPluginSource.class);
|
30
31
|
* }
|
31
32
|
*
|
32
|
-
* @Override
|
33
|
-
* public List
|
33
|
+
* {@literal @}Override
|
34
|
+
* public List<Module> getModules()
|
34
35
|
* {
|
35
|
-
* return ImmutableList
|
36
|
+
* return ImmutableList.<Module>of(this);
|
36
37
|
* }
|
37
38
|
* }
|
39
|
+
* </code>
|
38
40
|
*/
|
39
41
|
public interface Extension
|
40
42
|
{
|
data/embulk-core/src/main/java/org/embulk/spi/type/AbstractType.java
CHANGED
@@ -1,5 +1,7 @@
|
|
1
1
|
package org.embulk.spi.type;
|
2
2
|
|
3
|
+
import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
|
4
|
+
|
3
5
|
public abstract class AbstractType
|
4
6
|
implements Type
|
5
7
|
{
|
@@ -32,6 +34,7 @@ public abstract class AbstractType
|
|
32
34
|
return fixedStorageSize;
|
33
35
|
}
|
34
36
|
|
37
|
+
@SuppressFBWarnings(value = "EQ_UNUSUAL")
|
35
38
|
@Override
|
36
39
|
public boolean equals(Object o)
|
37
40
|
{
|
data/embulk-core/src/main/java/org/embulk/spi/util/LineEncoder.java
CHANGED
@@ -92,8 +92,13 @@ public class LineEncoder
|
|
92
92
|
|
93
93
|
public void finish()
|
94
94
|
{
|
95
|
-
|
96
|
-
|
95
|
+
try {
|
96
|
+
writer.flush(); // flush all remaining buffer in writer because FileOutputOutputStream.close() doesn't flush buffer
|
97
|
+
outputStream.finish();
|
98
|
+
writer.close();
|
99
|
+
} catch (IOException ex) {
|
100
|
+
throw new RuntimeException(ex);
|
101
|
+
}
|
97
102
|
}
|
98
103
|
|
99
104
|
@Override
|
data/embulk-standards/pom.xml
CHANGED
data/embulk-standards/src/main/java/org/embulk/standards/CsvParserPlugin.java
CHANGED
@@ -205,11 +205,6 @@ public class CsvParserPlugin
|
|
205
205
|
private static String nextColumn(Schema schema, CsvTokenizer tokenizer, String nullStringOrNull)
|
206
206
|
{
|
207
207
|
String v = tokenizer.nextColumn();
|
208
|
-
if (v == null) {
|
209
|
-
throw new RuntimeException(String.format("Expected %d columns but line %d has fewer number of columns",
|
210
|
-
schema.getColumnCount(), tokenizer.getCurrentLineNumber()));
|
211
|
-
}
|
212
|
-
|
213
208
|
if (!v.isEmpty()) {
|
214
209
|
if (v.equals(nullStringOrNull)) {
|
215
210
|
return null;
|
data/embulk-standards/src/main/java/org/embulk/standards/CsvTokenizer.java
CHANGED
@@ -5,7 +5,6 @@ import java.util.List;
|
|
5
5
|
import java.util.ArrayList;
|
6
6
|
import java.util.Deque;
|
7
7
|
import java.util.ArrayDeque;
|
8
|
-
import java.util.Iterator;
|
9
8
|
import org.embulk.spi.util.LineDecoder;
|
10
9
|
|
11
10
|
public class CsvTokenizer
|
@@ -71,7 +70,7 @@ public class CsvTokenizer
|
|
71
70
|
quotedValueLines.clear();
|
72
71
|
}
|
73
72
|
recordState = RecordState.END;
|
74
|
-
return
|
73
|
+
return skippedLine;
|
75
74
|
}
|
76
75
|
|
77
76
|
public boolean nextFile()
|
data/lib/embulk/schema.rb
CHANGED
@@ -33,21 +33,25 @@ module Embulk
|
|
33
33
|
record_writer_script = "lambda do |builder,record|\n"
|
34
34
|
record_writer_script << "java_timestamp_class = ::Embulk::Java::Timestamp\n"
|
35
35
|
each do |column|
|
36
|
-
|
36
|
+
idx = column.index
|
37
|
+
column_script = "if record[#{idx}].nil?\n" <<
|
38
|
+
"builder.setNull(#{idx})\n" <<
|
39
|
+
"else\n" <<
|
37
40
|
case column.type
|
38
41
|
when :boolean
|
39
|
-
"builder.setBoolean(#{
|
42
|
+
"builder.setBoolean(#{idx}, record[#{idx}])"
|
40
43
|
when :long
|
41
|
-
"builder.setLong(#{
|
44
|
+
"builder.setLong(#{idx}, record[#{idx}])"
|
42
45
|
when :double
|
43
|
-
"builder.setDouble(#{
|
46
|
+
"builder.setDouble(#{idx}, record[#{idx}])"
|
44
47
|
when :string
|
45
|
-
"builder.setString(#{
|
48
|
+
"builder.setString(#{idx}, record[#{idx}])"
|
46
49
|
when :timestamp
|
47
|
-
"builder.setTimestamp(#{
|
50
|
+
"builder.setTimestamp(#{idx}, java_timestamp_class.fromRubyTime(record[#{idx}]))"
|
48
51
|
else
|
49
52
|
raise "Unknown type #{column.type.inspect}"
|
50
|
-
end
|
53
|
+
end <<
|
54
|
+
"end\n"
|
51
55
|
record_writer_script << column_script << "\n"
|
52
56
|
end
|
53
57
|
record_writer_script << "builder.addRecord\n"
|
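The schema.rb change above wraps every generated setter in an `if record[idx].nil?` guard, which is what lets the ChangeLog entry "Embulk::PageBuilder#add accepts nil" hold. The sketch below reproduces the idea in a simplified, standalone form. `RecordingBuilder`, `build_record_writer` and the `[index, type]` column pairs are hypothetical stand-ins for the real page builder and `Embulk::Column`, only two column types are covered, and a newline is placed after each setter for readability.

```ruby
# Stand-in for the page builder: records each call instead of writing a page.
class RecordingBuilder
  attr_reader :calls

  def initialize
    @calls = []
  end

  def setNull(idx)
    @calls << [:setNull, idx]
  end

  def setLong(idx, value)
    @calls << [:setLong, idx, value]
  end

  def setString(idx, value)
    @calls << [:setString, idx, value]
  end

  def addRecord
    @calls << [:addRecord]
  end
end

# columns: array of [index, type] pairs (a simplification of Embulk::Column).
# Generates the source of a lambda as a string, then evaluates it once.
def build_record_writer(columns)
  script = +"lambda do |builder,record|\n"
  columns.each do |idx, type|
    setter =
      case type
      when :long   then "builder.setLong(#{idx}, record[#{idx}])"
      when :string then "builder.setString(#{idx}, record[#{idx}])"
      else raise "Unknown type #{type.inspect}"
      end
    # The nil guard: a nil value becomes setNull instead of reaching the setter.
    script << "if record[#{idx}].nil?\n" \
              "builder.setNull(#{idx})\n" \
              "else\n" \
              "#{setter}\n" \
              "end\n"
  end
  script << "builder.addRecord\n"
  script << "end\n"
  eval(script)
end

writer = build_record_writer([[0, :long], [1, :string]])
builder = RecordingBuilder.new
writer.call(builder, [nil, "embulk"])
puts builder.calls.inspect
```

Generating the guard into the script keeps the per-record path free of type dispatch: the `case` runs once per column when the lambda is built, not once per value.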
data/lib/embulk/version.rb
CHANGED
data/pom.xml
CHANGED
@@ -4,7 +4,7 @@
|
|
4
4
|
|
5
5
|
<groupId>org.embulk</groupId>
|
6
6
|
<artifactId>embulk-parent</artifactId>
|
7
|
-
<version>0.2.
|
7
|
+
<version>0.2.1-SNAPSHOT</version>
|
8
8
|
<packaging>pom</packaging>
|
9
9
|
|
10
10
|
<name>Embulk</name>
|
@@ -222,6 +222,14 @@
|
|
222
222
|
<version>1.9.5</version>
|
223
223
|
<scope>test</scope>
|
224
224
|
</dependency>
|
225
|
+
|
226
|
+
<dependency>
|
227
|
+
<groupId>com.google.code.findbugs</groupId>
|
228
|
+
<artifactId>annotations</artifactId>
|
229
|
+
<version>3.0.0</version>
|
230
|
+
<scope>compile</scope>
|
231
|
+
</dependency>
|
232
|
+
|
225
233
|
</dependencies>
|
226
234
|
</dependencyManagement>
|
227
235
|
|
@@ -410,7 +418,7 @@
|
|
410
418
|
<plugin>
|
411
419
|
<groupId>org.codehaus.mojo</groupId>
|
412
420
|
<artifactId>findbugs-maven-plugin</artifactId>
|
413
|
-
<version>
|
421
|
+
<version>3.0.0</version>
|
414
422
|
<configuration>
|
415
423
|
<skip>${project.check.skip-findbugs}</skip>
|
416
424
|
<jvmArgs>-Xmx${project.build.jvmsize}</jvmArgs>
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: embulk
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.2.
|
4
|
+
version: 0.2.1
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Sadayuki Furuhashi
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2015-01-
|
11
|
+
date: 2015-01-29 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: bundler
|
@@ -113,6 +113,7 @@ files:
|
|
113
113
|
- Rakefile
|
114
114
|
- bin/embulk
|
115
115
|
- build.gradle
|
116
|
+
- classpath/annotations-3.0.0.jar
|
116
117
|
- classpath/aopalliance-1.0.jar
|
117
118
|
- classpath/aws-java-sdk-1.5.2.jar
|
118
119
|
- classpath/bval-core-0.5.jar
|
@@ -121,9 +122,9 @@ files:
|
|
121
122
|
- classpath/commons-codec-1.3.jar
|
122
123
|
- classpath/commons-lang3-3.1.jar
|
123
124
|
- classpath/commons-logging-1.2.jar
|
124
|
-
- classpath/embulk-cli-0.2.
|
125
|
-
- classpath/embulk-core-0.2.
|
126
|
-
- classpath/embulk-standards-0.2.
|
125
|
+
- classpath/embulk-cli-0.2.1-SNAPSHOT.jar
|
126
|
+
- classpath/embulk-core-0.2.1-SNAPSHOT.jar
|
127
|
+
- classpath/embulk-standards-0.2.1-SNAPSHOT.jar
|
127
128
|
- classpath/guava-17.0.jar
|
128
129
|
- classpath/guice-3.0.jar
|
129
130
|
- classpath/guice-multibindings-3.0.jar
|
@@ -305,8 +306,6 @@ files:
|
|
305
306
|
- embulk-standards/src/test/java/org/embulk/standards/TestCsvTokenizer.java
|
306
307
|
- embulk-standards/src/test/java/org/embulk/standards/TestS3FileInputPlugin.java
|
307
308
|
- embulk.gemspec
|
308
|
-
- examples/config.yml
|
309
|
-
- examples/csv/sample.csv.gz
|
310
309
|
- gradle/wrapper/gradle-wrapper.jar
|
311
310
|
- gradle/wrapper/gradle-wrapper.properties
|
312
311
|
- gradlew
|
data/examples/config.yml
DELETED
@@ -1,34 +0,0 @@
|
|
1
|
-
#exec:
|
2
|
-
# transaction_time: 2015-01-17 00:00:00 UTC
|
3
|
-
# transaction_timezone: UTC
|
4
|
-
in:
|
5
|
-
type: file
|
6
|
-
paths:
|
7
|
-
- examples/csv/
|
8
|
-
#parser:
|
9
|
-
# type: csv
|
10
|
-
# newline: LF
|
11
|
-
# columns:
|
12
|
-
# - {name: date_code, type: string}
|
13
|
-
# - {name: customer_code, type: long}
|
14
|
-
# - {name: product_code, type: string}
|
15
|
-
# - {name: employee_code, type: string}
|
16
|
-
# file_decoders:
|
17
|
-
# - type: gzip
|
18
|
-
out:
|
19
|
-
type: file
|
20
|
-
directory: tmp/
|
21
|
-
file_name: output
|
22
|
-
file_ext: csv
|
23
|
-
#encoders:
|
24
|
-
# - type: gzip
|
25
|
-
# level: 9
|
26
|
-
formatter:
|
27
|
-
type: csv
|
28
|
-
timezone: Etc/UTC
|
29
|
-
#in:
|
30
|
-
# type: file
|
31
|
-
# paths:
|
32
|
-
# - benchmark/
|
33
|
-
#out:
|
34
|
-
# type: "null"
|
data/examples/csv/sample.csv.gz
DELETED
Binary file
|