embulk 0.2.0 → 0.2.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +8 -8
- data/ChangeLog +9 -0
- data/README.md +14 -4
- data/bin/embulk +6 -10
- data/build.gradle +2 -1
- data/embulk-cli/pom.xml +1 -1
- data/embulk-core/pom.xml +6 -1
- data/embulk-core/src/main/java/org/embulk/EmbulkService.java +2 -2
- data/embulk-core/src/main/java/org/embulk/exec/LocalExecutor.java +3 -1
- data/embulk-core/src/main/java/org/embulk/spi/Buffer.java +3 -0
- data/embulk-core/src/main/java/org/embulk/spi/Extension.java +7 -5
- data/embulk-core/src/main/java/org/embulk/spi/type/AbstractType.java +3 -0
- data/embulk-core/src/main/java/org/embulk/spi/util/LineEncoder.java +7 -2
- data/embulk-standards/pom.xml +1 -1
- data/embulk-standards/src/main/java/org/embulk/standards/CsvParserPlugin.java +0 -5
- data/embulk-standards/src/main/java/org/embulk/standards/CsvTokenizer.java +1 -2
- data/lib/embulk/schema.rb +11 -7
- data/lib/embulk/version.rb +1 -1
- data/pom.xml +10 -2
- metadata +6 -7
- data/examples/config.yml +0 -34
- data/examples/csv/sample.csv.gz +0 -0
checksums.yaml
CHANGED
@@ -1,15 +1,15 @@
 ---
 !binary "U0hBMQ==":
 metadata.gz: !binary |-
-
+NGUwOTc0ZDE1MWZlZjJhYjdhNmJmMjQwZjliOWU3MmEyYmM5ZTczNQ==
 data.tar.gz: !binary |-
-
+ZjA1YTE5NDlhZGViMTU1NjVmOTBhZDVlZDY5NGZjODI0NGU5OGViZA==
 SHA512:
 metadata.gz: !binary |-
-
-
-
+NDY3NzQ1NTkxNTk5MzAzMGQ2ZmIzYjM0YjMyMTczOGM1YjhmYzFkYTg0YTY3
+ZDBlNzdiYWIwZmVkMWU5YzA3NTEyYzA2ZGI3YjQyMTQ5ZDI2MWI3ZWEwZTM0
+YjQ3MTllMTZkYzdlYTM2YjNlZDVjNGIwYjEwNDVhNjlmN2IxYTk=
 data.tar.gz: !binary |-
-
-
-
+MGZjMzM1NmVhNzdhZDhjODg3ZWZiNGRmOWQwMTU5MzUwZmEwYTBkMDY1MTgz
+NDBhOTAwM2Y3NDNjM2VlZTE1YjRkZjA4MWNiZjZjN2QzOTBjYTliMzJlYTgw
+OGY2ZGZmMDJmMTI4ZWU1YjNmMTMxNTc5NDdjN2NiODkxYzQ4MmI=
data/ChangeLog
CHANGED
@@ -1,4 +1,13 @@
 
+2015-01-29 version 0.2.1:
+
+* Fixed LineEncoder#finish to flush all remaining buffer (reported by @aibou)
+* Fixed NextConfig to be merged to in: or out: rather than the top-level
+(reported by enukane) [#41]
+* ./bin/embulk shows warns to run `rake` if ./classpath doesn't exist
+* Embulk::PageBuilder#add accepts nil
+
+
 2015-01-26 version 0.2.0:
 
 * Changed JRuby InputPlugin API to use #run instead of .run
data/README.md
CHANGED
@@ -4,14 +4,24 @@ A plugin-based parallel bulk data loader that makes painful data integration wor
 
 ## What's Embulk?
 
-
+Embulk is a plugin-based parallel bulk data loader that helps **data transfer** between various **storages**, **databases**, **NoSQL** and **cloud services**.
+
+You can install input and output plugins to integrate many other file formats and storages.
+
+You also can release plugins to share your efforts of data cleaning, error handling, transaction control, and retrying.
+Packaging effrots into plugins **brings OSS-style development to the data scripts** which **was tend to be one-time adhoc scripts**.
+
+[Embuk, an open-source plugin-based parallel bulk data loader](http://www.slideshare.net/frsyuki/embuk-making-data-integration-works-relaxed) at Slideshare
+
+[](http://www.slideshare.net/frsyuki/embuk-making-data-integration-works-relaxed/12)
+
 
 ## Quick Start
 
 The single-file package is the simplest way to try Embulk. You can download the latest embulk-VERSION.jar from [the releases page](https://bintray.com/embulk/maven/embulk/view#files) and run it with java:
 
 ```
-wget https://bintray.com/artifact/download/embulk/maven/embulk-0.2.
+wget https://bintray.com/artifact/download/embulk/maven/embulk-0.2.1.jar -O embulk.jar
 java -jar embulk.jar --help
 ```
 
@@ -27,14 +37,14 @@ java -jar embulk.jar run config.yml
 ### Using plugins
 
 You can use plugins to load data from/to various systems and file formats.
-An example is [embulk-output-postgres-json]() plugin. It outputs data into PostgreSQL server using "json" column type.
+An example is [embulk-output-postgres-json](https://github.com/frsyuki/embulk-plugin-postgres-json) plugin. It outputs data into PostgreSQL server using "json" column type.
 
 ```
 java -jar embulk.jar gem install embulk-output-postgres-json
 java -jar embulk.jar gem list
 ```
 
-You can search plugins on RubyGems: [search for "embulk-"](https://rubygems.org/search?utf8=%E2%9C%93&query=embulk-).
+You can search plugins on RubyGems: [search for "embulk-plugin"](https://rubygems.org/search?utf8=%E2%9C%93&query=embulk-plugin).
 
 ### Using plugin bundle
 
data/bin/embulk
CHANGED
@@ -37,20 +37,16 @@ unless java_cmd
 end
 end
 
-embulk_home =
-unless embulk_home
-embulk_home = File.dirname(File.dirname(__FILE__))
-end
-ENV['EMBULK_HOME'] = File.expand_path(embulk_home)
-
+embulk_home = File.dirname(File.dirname(__FILE__))
 classpath_dir = File.join(embulk_home, 'classpath')
 lib_dir = File.join(embulk_home, 'lib')
 
-jruby_complete = Dir.entries(classpath_dir).find {|jar| jar =~ /jruby-complete-[\d\.]+\.jar/ }
+jruby_complete = Dir.entries(classpath_dir).find {|jar| jar =~ /jruby-complete-[\d\.]+\.jar/ } rescue nil
 unless jruby_complete
-STDERR.puts "Could not find jruby-complete at
-
-
+STDERR.puts "Could not find jruby-complete at #{embulk_home}/classpath directory."
+if embulk_home == '.'
+STDERR.puts "Did you run \`rake\`? You need to build java code and create ./classpath directory first."
+end
 raise SystemExit.new(1)
 end
 
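The new lookup in bin/embulk appends `rescue nil`, so a missing `classpath` directory (for example in a source checkout where `rake` has not been run yet) falls into the same "not found" branch as a missing jar instead of raising. As an illustration only, here is a Java sketch of that lookup; the class and method names are hypothetical, and the regular expression is the one the script uses.

```java
import java.io.File;
import java.util.regex.Pattern;

public class JrubyCompleteLookup {
    // Same pattern as bin/embulk: /jruby-complete-[\d\.]+\.jar/ (unanchored, like Ruby's =~)
    private static final Pattern JRUBY_COMPLETE = Pattern.compile("jruby-complete-[\\d.]+\\.jar");

    // Hypothetical helper: returns the first matching entry, or null when nothing matches.
    // A null entry array stands for a missing directory, mirroring `rescue nil`.
    public static String firstMatch(String[] entries) {
        if (entries == null) {
            return null;
        }
        for (String name : entries) {
            if (JRUBY_COMPLETE.matcher(name).find()) {
                return name;
            }
        }
        return null;
    }

    // File.list() returns null (instead of throwing) when the path is not a readable directory.
    public static String findJrubyComplete(File classpathDir) {
        return firstMatch(classpathDir.list());
    }
}
```

A caller would print the "Could not find jruby-complete" message and exit when `findJrubyComplete(new File(embulkHome, "classpath"))` returns null, as the script does.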
data/build.gradle
CHANGED
@@ -23,7 +23,7 @@ allprojects {
 apply plugin: 'com.jfrog.bintray'
 
 group = 'org.embulk'
-version = '0.2.
+version = '0.2.1'
 
 // to upload artifacts to Bintray by gradle-bintray-plugin
 // $ gradle bintrayUpload
@@ -92,6 +92,7 @@ subprojects {
 'org.slf4j:slf4j-api:1.7.9',
 'org.slf4j:slf4j-log4j12:1.7.9',
 'org.jruby:jruby-complete:1.7.16.1',
+'com.google.code.findbugs:annotations:3.0.0',
 'org.yaml:snakeyaml:1.14',
 'javax.validation:validation-api:1.1.0.Final',
 'org.apache.bval:bval-jsr303:0.5',
data/embulk-cli/pom.xml
CHANGED
data/embulk-core/pom.xml
CHANGED
@@ -5,7 +5,7 @@
 <parent>
 <groupId>org.embulk</groupId>
 <artifactId>embulk-parent</artifactId>
-<version>0.2.
+<version>0.2.1-SNAPSHOT</version>
 </parent>
 
 <artifactId>embulk-core</artifactId>
@@ -117,6 +117,11 @@
 <artifactId>jruby-complete</artifactId>
 </dependency>
 
+<dependency>
+<groupId>com.google.code.findbugs</groupId>
+<artifactId>annotations</artifactId>
+</dependency>
+
 <!-- for guess_charset plugin -->
 <dependency>
 <groupId>com.ibm.icu</groupId>
data/embulk-core/src/main/java/org/embulk/EmbulkService.java
CHANGED
@@ -23,11 +23,11 @@ public class EmbulkService
 modules.add(new ExtensionServiceLoaderModule(systemConfig));
 modules.add(new BuiltinPluginSourceModule());
 modules.add(new JRubyScriptingModule(systemConfig));
-modules.addAll(getAdditionalModules());
+modules.addAll(getAdditionalModules(systemConfig));
 injector = Guice.createInjector(modules.build());
 }
 
-protected Iterable<? extends Module> getAdditionalModules()
+protected Iterable<? extends Module> getAdditionalModules(ConfigSource systemConfig)
 {
 return ImmutableList.of();
 }
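With this change, `getAdditionalModules` receives the same `systemConfig` that the built-in modules get, so a subclass of `EmbulkService` can pick extra modules based on system configuration. The sketch below shows that template-method shape in plain Java without Guice; the class names, the use of strings as "modules", and the `use_custom_source` key are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class AdditionalModulesSketch {
    // Hypothetical base class: "modules" are plain names and the system config is a flat map.
    public static class Service {
        private final List<String> modules = new ArrayList<>();

        public Service(Map<String, String> systemConfig) {
            modules.add("builtin-plugins");
            // The hook now receives the system config, as in the 0.2.1 signature.
            modules.addAll(getAdditionalModules(systemConfig));
        }

        // Default: no additional modules, like ImmutableList.of() in the diff.
        protected List<String> getAdditionalModules(Map<String, String> systemConfig) {
            return Collections.emptyList();
        }

        public List<String> getModules() {
            return modules;
        }
    }

    // Hypothetical subclass that adds a module only when a config flag is set.
    public static class CustomService extends Service {
        public CustomService(Map<String, String> systemConfig) {
            super(systemConfig);
        }

        @Override
        protected List<String> getAdditionalModules(Map<String, String> systemConfig) {
            if ("true".equals(systemConfig.get("use_custom_source"))) {
                return Collections.singletonList("custom-plugin-source");
            }
            return Collections.emptyList();
        }
    }
}
```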
data/embulk-core/src/main/java/org/embulk/exec/LocalExecutor.java
CHANGED
@@ -109,7 +109,9 @@ public class LocalExecutor
 if (outputNextConfig == null) {
 outputNextConfig = Exec.newNextConfig();
 }
-NextConfig nextConfig =
+NextConfig nextConfig = Exec.newNextConfig();
+nextConfig.getNestedOrSetEmpty("in").merge(inputNextConfig);
+nextConfig.getNestedOrSetEmpty("out").merge(outputNextConfig);
 return new ExecuteResult(nextConfig);
 }
 }
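Per the ChangeLog entry for #41, the next config produced by a run now keeps the input plugin's values under `in:` and the output plugin's values under `out:` instead of mixing them at the top level. Below is a minimal sketch of that nesting using plain maps. `NextConfig` itself is not used, `putAll` is a shallow stand-in for `merge` (whose exact semantics may differ), and the `last_path` key in the usage note is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NextConfigMergeSketch {
    // Hypothetical helper modeled on the call in the diff: returns the nested map stored
    // under `key`, creating and storing an empty one when it is absent or not a map.
    @SuppressWarnings("unchecked")
    static Map<String, Object> getNestedOrSetEmpty(Map<String, Object> root, String key) {
        Object nested = root.get(key);
        if (!(nested instanceof Map)) {
            nested = new LinkedHashMap<String, Object>();
            root.put(key, nested);
        }
        return (Map<String, Object>) nested;
    }

    // Input values go under "in", output values under "out"; nothing lands at the top level.
    public static Map<String, Object> buildNextConfig(Map<String, Object> inputNext,
                                                      Map<String, Object> outputNext) {
        Map<String, Object> next = new LinkedHashMap<>();
        getNestedOrSetEmpty(next, "in").putAll(inputNext);
        getNestedOrSetEmpty(next, "out").putAll(outputNext);
        return next;
    }
}
```

For an input next config of {last_path: data/part-01.csv} and an empty output next config, the result is {in: {last_path: data/part-01.csv}, out: {}}.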
data/embulk-core/src/main/java/org/embulk/spi/Buffer.java
CHANGED
@@ -2,6 +2,8 @@ package org.embulk.spi;
 
 import java.util.Arrays;
 
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+
 public class Buffer
 {
 public static final Buffer EMPTY = Buffer.allocate(0);
@@ -48,6 +50,7 @@ public class Buffer
 return new Buffer(src, offset, size).limit(size);
 }
 
+@SuppressFBWarnings(value = "EI_EXPOSE_REP")
 public byte[] array()
 {
 return array;
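The `@SuppressFBWarnings(value = "EI_EXPOSE_REP")` annotation silences the FindBugs report that `array()` returns a reference to the buffer's internal, mutable byte array. The standalone sketch below (hypothetical class name, no FindBugs dependency) shows what that warning is about: writes through the exposing accessor change the backing storage, while writes to a defensive copy do not. Keeping the exposing form presumably avoids a copy per call, but the diff does not state the reason.

```java
public class ExposedArrayDemo {
    private final byte[] array;

    public ExposedArrayDemo(int size) {
        this.array = new byte[size];
    }

    // Returns the backing array itself; this is the pattern FindBugs reports as EI_EXPOSE_REP.
    public byte[] array() {
        return array;
    }

    // Defensive-copy alternative: no warning, but one allocation and copy per call.
    public byte[] copyOfArray() {
        return array.clone();
    }
}
```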
data/embulk-core/src/main/java/org/embulk/spi/Extension.java
CHANGED
@@ -13,6 +13,7 @@ import org.embulk.config.ConfigSource;
 *
 * An example extention to add a custom PluginSource will be as following:
 *
+* <code>
 * class MyPluginSourceExtension
 * implements Extension, Module
 * {
@@ -22,19 +23,20 @@ import org.embulk.config.ConfigSource;
 * // ...
 * }
 *
-* @Override
+* {@literal @}Override
 * public void configure(Binder binder)
 * {
-* Multibinder
+* Multibinder<PluginSource> multibinder = Multibinder.newSetBinder(binder, PluginSource.class);
 * multibinder.addBinding().to(MyPluginSource.class);
 * }
 *
-* @Override
-* public List
+* {@literal @}Override
+* public List<Module> getModules()
 * {
-* return ImmutableList
+* return ImmutableList.<Module>of(this);
 * }
 * }
+* </code>
 */
 public interface Extension
 {
data/embulk-core/src/main/java/org/embulk/spi/type/AbstractType.java
CHANGED
@@ -1,5 +1,7 @@
 package org.embulk.spi.type;
 
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+
 public abstract class AbstractType
 implements Type
 {
@@ -32,6 +34,7 @@ public abstract class AbstractType
 return fixedStorageSize;
 }
 
+@SuppressFBWarnings(value = "EQ_UNUSUAL")
 @Override
 public boolean equals(Object o)
 {
data/embulk-core/src/main/java/org/embulk/spi/util/LineEncoder.java
CHANGED
@@ -92,8 +92,13 @@ public class LineEncoder
 
 public void finish()
 {
-
-
+try {
+writer.flush(); // flush all remaining buffer in writer because FileOutputOutputStream.close() doesn't flush buffer
+outputStream.finish();
+writer.close();
+} catch (IOException ex) {
+throw new RuntimeException(ex);
+}
 }
 
 @Override
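In 0.2.1, `LineEncoder#finish` flushes the `Writer` before finishing the output stream, and the source comment notes that `FileOutputOutputStream.close()` doesn't flush the buffer. The JDK-only sketch below shows why that order matters: characters still sitting in a `BufferedWriter` never reach the stream if the stream is finished first. `FinishableStream` is a hypothetical stand-in for the plugin-side stream, not an Embulk class.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class FlushBeforeFinishDemo {
    // Hypothetical stand-in for the output stream: finish() commits only the bytes
    // that have reached the stream so far.
    static class FinishableStream extends ByteArrayOutputStream {
        private byte[] committed = new byte[0];

        void finish() {
            committed = toByteArray();
        }

        byte[] committed() {
            return committed;
        }
    }

    // Writes `text` through a buffered Writer, then finishes and closes in the same order
    // as the 0.2.1 LineEncoder#finish. With flushBeforeFinish=false the buffered
    // characters are still in the Writer when finish() runs, so nothing is committed.
    public static String encode(String text, boolean flushBeforeFinish) {
        FinishableStream out = new FinishableStream();
        Writer writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        try {
            writer.write(text);
            if (flushBeforeFinish) {
                writer.flush(); // move buffered characters down into the stream
            }
            out.finish();
            writer.close(); // too late for the unflushed case: finish() has already run
        } catch (IOException ex) {
            throw new RuntimeException(ex); // same wrapping style as LineEncoder#finish
        }
        return new String(out.committed(), StandardCharsets.UTF_8);
    }
}
```

With flushing, `encode("a,b\n", true)` returns the full line; without it, `encode("a,b\n", false)` returns an empty string.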
data/embulk-standards/pom.xml
CHANGED
data/embulk-standards/src/main/java/org/embulk/standards/CsvParserPlugin.java
CHANGED
@@ -205,11 +205,6 @@ public class CsvParserPlugin
 private static String nextColumn(Schema schema, CsvTokenizer tokenizer, String nullStringOrNull)
 {
 String v = tokenizer.nextColumn();
-if (v == null) {
-throw new RuntimeException(String.format("Expected %d columns but line %d has fewer number of columns",
-schema.getColumnCount(), tokenizer.getCurrentLineNumber()));
-}
-
 if (!v.isEmpty()) {
 if (v.equals(nullStringOrNull)) {
 return null;
data/embulk-standards/src/main/java/org/embulk/standards/CsvTokenizer.java
CHANGED
@@ -5,7 +5,6 @@ import java.util.List;
 import java.util.ArrayList;
 import java.util.Deque;
 import java.util.ArrayDeque;
-import java.util.Iterator;
 import org.embulk.spi.util.LineDecoder;
 
 public class CsvTokenizer
@@ -71,7 +70,7 @@ public class CsvTokenizer
 quotedValueLines.clear();
 }
 recordState = RecordState.END;
-return
+return skippedLine;
 }
 
 public boolean nextFile()
data/lib/embulk/schema.rb
CHANGED
@@ -33,21 +33,25 @@ module Embulk
 record_writer_script = "lambda do |builder,record|\n"
 record_writer_script << "java_timestamp_class = ::Embulk::Java::Timestamp\n"
 each do |column|
-
+idx = column.index
+column_script = "if record[#{idx}].nil?\n" <<
+"builder.setNull(#{idx})\n" <<
+"else\n" <<
 case column.type
 when :boolean
-"builder.setBoolean(#{
+"builder.setBoolean(#{idx}, record[#{idx}])"
 when :long
-"builder.setLong(#{
+"builder.setLong(#{idx}, record[#{idx}])"
 when :double
-"builder.setDouble(#{
+"builder.setDouble(#{idx}, record[#{idx}])"
 when :string
-"builder.setString(#{
+"builder.setString(#{idx}, record[#{idx}])"
 when :timestamp
-"builder.setTimestamp(#{
+"builder.setTimestamp(#{idx}, java_timestamp_class.fromRubyTime(record[#{idx}]))"
 else
 raise "Unknown type #{column.type.inspect}"
-end
+end <<
+"end\n"
 record_writer_script << column_script << "\n"
 end
 record_writer_script << "builder.addRecord\n"
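The generated per-column script now emits `builder.setNull(idx)` when `record[idx]` is nil and calls the typed setter only otherwise; this appears to be the change behind the ChangeLog entry "Embulk::PageBuilder#add accepts nil". Below is a Java sketch of the same dispatch. `RecordBuilder` is a hypothetical interface whose method names mirror the calls in the generated Ruby script, the column index is assumed to be the list position, and the timestamp case is omitted to keep the sketch self-contained.

```java
import java.util.List;

public class NullAwareRecordWriter {
    public enum ColumnType { BOOLEAN, LONG, DOUBLE, STRING }

    // Hypothetical builder interface; not the real Embulk PageBuilder API.
    public interface RecordBuilder {
        void setNull(int idx);
        void setBoolean(int idx, boolean v);
        void setLong(int idx, long v);
        void setDouble(int idx, double v);
        void setString(int idx, String v);
        void addRecord();
    }

    public static void writeRecord(RecordBuilder builder, List<ColumnType> types, List<Object> record) {
        for (int idx = 0; idx < types.size(); idx++) {
            Object value = record.get(idx);
            if (value == null) {
                builder.setNull(idx); // null values never reach the typed setters
                continue;
            }
            switch (types.get(idx)) {
                case BOOLEAN: builder.setBoolean(idx, (Boolean) value); break;
                case LONG:    builder.setLong(idx, ((Number) value).longValue()); break;
                case DOUBLE:  builder.setDouble(idx, ((Number) value).doubleValue()); break;
                case STRING:  builder.setString(idx, (String) value); break;
                default:      throw new IllegalArgumentException("Unknown type " + types.get(idx));
            }
        }
        builder.addRecord();
    }
}
```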
data/lib/embulk/version.rb
CHANGED
data/pom.xml
CHANGED
@@ -4,7 +4,7 @@
 
 <groupId>org.embulk</groupId>
 <artifactId>embulk-parent</artifactId>
-<version>0.2.
+<version>0.2.1-SNAPSHOT</version>
 <packaging>pom</packaging>
 
 <name>Embulk</name>
@@ -222,6 +222,14 @@
 <version>1.9.5</version>
 <scope>test</scope>
 </dependency>
+
+<dependency>
+<groupId>com.google.code.findbugs</groupId>
+<artifactId>annotations</artifactId>
+<version>3.0.0</version>
+<scope>compile</scope>
+</dependency>
+
 </dependencies>
 </dependencyManagement>
 
@@ -410,7 +418,7 @@
 <plugin>
 <groupId>org.codehaus.mojo</groupId>
 <artifactId>findbugs-maven-plugin</artifactId>
-<version>
+<version>3.0.0</version>
 <configuration>
 <skip>${project.check.skip-findbugs}</skip>
 <jvmArgs>-Xmx${project.build.jvmsize}</jvmArgs>
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: embulk
 version: !ruby/object:Gem::Version
-version: 0.2.
+version: 0.2.1
 platform: ruby
 authors:
 - Sadayuki Furuhashi
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2015-01-
+date: 2015-01-29 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: bundler
@@ -113,6 +113,7 @@ files:
 - Rakefile
 - bin/embulk
 - build.gradle
+- classpath/annotations-3.0.0.jar
 - classpath/aopalliance-1.0.jar
 - classpath/aws-java-sdk-1.5.2.jar
 - classpath/bval-core-0.5.jar
@@ -121,9 +122,9 @@ files:
 - classpath/commons-codec-1.3.jar
 - classpath/commons-lang3-3.1.jar
 - classpath/commons-logging-1.2.jar
-- classpath/embulk-cli-0.2.
-- classpath/embulk-core-0.2.
-- classpath/embulk-standards-0.2.
+- classpath/embulk-cli-0.2.1-SNAPSHOT.jar
+- classpath/embulk-core-0.2.1-SNAPSHOT.jar
+- classpath/embulk-standards-0.2.1-SNAPSHOT.jar
 - classpath/guava-17.0.jar
 - classpath/guice-3.0.jar
 - classpath/guice-multibindings-3.0.jar
@@ -305,8 +306,6 @@ files:
 - embulk-standards/src/test/java/org/embulk/standards/TestCsvTokenizer.java
 - embulk-standards/src/test/java/org/embulk/standards/TestS3FileInputPlugin.java
 - embulk.gemspec
-- examples/config.yml
-- examples/csv/sample.csv.gz
 - gradle/wrapper/gradle-wrapper.jar
 - gradle/wrapper/gradle-wrapper.properties
 - gradlew
data/examples/config.yml
DELETED
@@ -1,34 +0,0 @@
-#exec:
-# transaction_time: 2015-01-17 00:00:00 UTC
-# transaction_timezone: UTC
-in:
-type: file
-paths:
-- examples/csv/
-#parser:
-# type: csv
-# newline: LF
-# columns:
-# - {name: date_code, type: string}
-# - {name: customer_code, type: long}
-# - {name: product_code, type: string}
-# - {name: employee_code, type: string}
-# file_decoders:
-# - type: gzip
-out:
-type: file
-directory: tmp/
-file_name: output
-file_ext: csv
-#encoders:
-# - type: gzip
-# level: 9
-formatter:
-type: csv
-timezone: Etc/UTC
-#in:
-# type: file
-# paths:
-# - benchmark/
-#out:
-# type: "null"
data/examples/csv/sample.csv.gz
DELETED
Binary file