embulk 0.2.0 → 0.2.1

checksums.yaml CHANGED
@@ -1,15 +1,15 @@
1
1
  ---
2
2
  !binary "U0hBMQ==":
3
3
  metadata.gz: !binary |-
4
- NDBmNzE4N2NiODA5MmM5NzBlMjFjMTg5ZDNkMjIxM2VmYzUzM2I5NA==
4
+ NGUwOTc0ZDE1MWZlZjJhYjdhNmJmMjQwZjliOWU3MmEyYmM5ZTczNQ==
5
5
  data.tar.gz: !binary |-
6
- ZmRmZDUwNTU0ZmE1NjNhZDc5NzQyMzEwMGM2ODBmNjM1ZTY5ZmY3NA==
6
+ ZjA1YTE5NDlhZGViMTU1NjVmOTBhZDVlZDY5NGZjODI0NGU5OGViZA==
7
7
  SHA512:
8
8
  metadata.gz: !binary |-
9
- Njk2ZDU4MDNkODFmZTJiNmEyNmNhNWIwOWYzZjViYWI0YmYxNmMwMjBhNDQ5
10
- ODMwY2YxOGI0ZWFiNjExNTdjMGE5ZmVlMDA2YWUwYTY2YWUxZDRjYTQ0OTY5
11
- NzI0MGVkODQwYWI1YjUzNzdhNmNkMmVhMGRmZjRlZWIxZmQxYzU=
9
+ NDY3NzQ1NTkxNTk5MzAzMGQ2ZmIzYjM0YjMyMTczOGM1YjhmYzFkYTg0YTY3
10
+ ZDBlNzdiYWIwZmVkMWU5YzA3NTEyYzA2ZGI3YjQyMTQ5ZDI2MWI3ZWEwZTM0
11
+ YjQ3MTllMTZkYzdlYTM2YjNlZDVjNGIwYjEwNDVhNjlmN2IxYTk=
12
12
  data.tar.gz: !binary |-
13
- N2Q3NTEwN2NlMDg5OGQ1NmRlNmVkMzcyNzRiMjBhODY3MDExODA1ZDI5YTYy
14
- ZmNjNTM3YjU5NGE4ZmM4NGFlNWE4ZmEwZWI2YTIzNjM3YmVmODE5MDc2OTQ1
15
- NWMxNzM1OGUzZTExNTY0ZmM0MTBmYWRmNDJhZTc4MjhlNTZkNmQ=
13
+ MGZjMzM1NmVhNzdhZDhjODg3ZWZiNGRmOWQwMTU5MzUwZmEwYTBkMDY1MTgz
14
+ NDBhOTAwM2Y3NDNjM2VlZTE1YjRkZjA4MWNiZjZjN2QzOTBjYTliMzJlYTgw
15
+ OGY2ZGZmMDJmMTI4ZWU1YjNmMTMxNTc5NDdjN2NiODkxYzQ4MmI=
data/ChangeLog CHANGED
@@ -1,4 +1,13 @@
1
1
 
2
+ 2015-01-29 version 0.2.1:
3
+
4
+ * Fixed LineEncoder#finish to flush all remaining buffer (reported by @aibou)
5
+ * Fixed NextConfig to be merged to in: or out: rather than the top-level
6
+ (reported by enukane) [#41]
7
+ * ./bin/embulk shows warns to run `rake` if ./classpath doesn't exist
8
+ * Embulk::PageBuilder#add accepts nil
9
+
10
+
2
11
  2015-01-26 version 0.2.0:
3
12
 
4
13
  * Changed JRuby InputPlugin API to use #run instead of .run
data/README.md CHANGED
@@ -4,14 +4,24 @@ A plugin-based parallel bulk data loader that makes painful data integration wor
4
4
 
5
5
  ## What's Embulk?
6
6
 
7
- TODO
7
+ Embulk is a plugin-based parallel bulk data loader that helps **data transfer** between various **storages**, **databases**, **NoSQL** and **cloud services**.
8
+
9
+ You can install input and output plugins to integrate many other file formats and storages.
10
+
11
+ You also can release plugins to share your efforts of data cleaning, error handling, transaction control, and retrying.
12
+ Packaging efforts into plugins **brings OSS-style development to the data scripts** which **tended to be one-time ad hoc scripts**.
13
+
14
+ [Embuk, an open-source plugin-based parallel bulk data loader](http://www.slideshare.net/frsyuki/embuk-making-data-integration-works-relaxed) at Slideshare
15
+
16
+ [![Embulk](https://gist.githubusercontent.com/frsyuki/f322a77ee2766a508ba9/raw/e8539b6b4fda1b3357e8c79d3966aa8148dbdbd3/embulk-overview.png)](http://www.slideshare.net/frsyuki/embuk-making-data-integration-works-relaxed/12)
17
+
8
18
 
9
19
  ## Quick Start
10
20
 
11
21
  The single-file package is the simplest way to try Embulk. You can download the latest embulk-VERSION.jar from [the releases page](https://bintray.com/embulk/maven/embulk/view#files) and run it with java:
12
22
 
13
23
  ```
14
- wget https://bintray.com/artifact/download/embulk/maven/embulk-0.2.0.jar -O embulk.jar
24
+ wget https://bintray.com/artifact/download/embulk/maven/embulk-0.2.1.jar -O embulk.jar
15
25
  java -jar embulk.jar --help
16
26
  ```
17
27
 
@@ -27,14 +37,14 @@ java -jar embulk.jar run config.yml
27
37
  ### Using plugins
28
38
 
29
39
  You can use plugins to load data from/to various systems and file formats.
30
- An example is [embulk-output-postgres-json]() plugin. It outputs data into PostgreSQL server using "json" column type.
40
+ An example is [embulk-output-postgres-json](https://github.com/frsyuki/embulk-plugin-postgres-json) plugin. It outputs data into PostgreSQL server using "json" column type.
31
41
 
32
42
  ```
33
43
  java -jar embulk.jar gem install embulk-output-postgres-json
34
44
  java -jar embulk.jar gem list
35
45
  ```
36
46
 
37
- You can search plugins on RubyGems: [search for "embulk-"](https://rubygems.org/search?utf8=%E2%9C%93&query=embulk-).
47
+ You can search plugins on RubyGems: [search for "embulk-plugin"](https://rubygems.org/search?utf8=%E2%9C%93&query=embulk-plugin).
38
48
 
39
49
  ### Using plugin bundle
40
50
 
data/bin/embulk CHANGED
@@ -37,20 +37,16 @@ unless java_cmd
37
37
  end
38
38
  end
39
39
 
40
- embulk_home = ENV['EMBULK_HOME']
41
- unless embulk_home
42
- embulk_home = File.dirname(File.dirname(__FILE__))
43
- end
44
- ENV['EMBULK_HOME'] = File.expand_path(embulk_home)
45
-
40
+ embulk_home = File.dirname(File.dirname(__FILE__))
46
41
  classpath_dir = File.join(embulk_home, 'classpath')
47
42
  lib_dir = File.join(embulk_home, 'lib')
48
43
 
49
- jruby_complete = Dir.entries(classpath_dir).find {|jar| jar =~ /jruby-complete-[\d\.]+\.jar/ }
44
+ jruby_complete = Dir.entries(classpath_dir).find {|jar| jar =~ /jruby-complete-[\d\.]+\.jar/ } rescue nil
50
45
  unless jruby_complete
51
- STDERR.puts "Could not find jruby-complete at $EMBULK_HOME/classpath directory."
52
- STDERR.puts "Please confirm EMBULK_HOME is correctly set"
53
- STDERR.puts "Current EMBULK_HOME = #{ENV['EMBULK_HOME'].to_s.dump}"
46
+ STDERR.puts "Could not find jruby-complete at #{embulk_home}/classpath directory."
47
+ if embulk_home == '.'
48
+ STDERR.puts "Did you run \`rake\`? You need to build java code and create ./classpath directory first."
49
+ end
54
50
  raise SystemExit.new(1)
55
51
  end
56
52
 
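Note on the `rescue nil` change above: a minimal sketch, assuming only Ruby's standard library, of why the modifier lets the script reach its own error message when `./classpath` is missing. The helper name `find_jruby_complete` and the temporary directory layout are hypothetical and not part of bin/embulk.

```ruby
require 'tmpdir'

# Hypothetical helper mirroring the jruby-complete lookup in bin/embulk 0.2.1.
# The trailing `rescue nil` turns any StandardError raised by the expression
# (for example Errno::ENOENT when the directory is missing) into nil.
def find_jruby_complete(classpath_dir)
  Dir.entries(classpath_dir).find {|jar| jar =~ /jruby-complete-[\d\.]+\.jar/ } rescue nil
end

Dir.mktmpdir do |home|
  classpath_dir = File.join(home, 'classpath')

  # The directory does not exist yet: without `rescue nil` this would raise.
  p find_jruby_complete(classpath_dir)   # prints nil

  # After the build step creates the directory and the jar, the name is found.
  Dir.mkdir(classpath_dir)
  File.write(File.join(classpath_dir, 'jruby-complete-1.7.16.1.jar'), '')
  p find_jruby_complete(classpath_dir)   # prints "jruby-complete-1.7.16.1.jar"
end
```

One trade-off of the modifier form is that it also hides unrelated StandardErrors raised by the same expression, so the `unless jruby_complete` branch reports every failure as a missing jar.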
data/build.gradle CHANGED
@@ -23,7 +23,7 @@ allprojects {
23
23
  apply plugin: 'com.jfrog.bintray'
24
24
 
25
25
  group = 'org.embulk'
26
- version = '0.2.0'
26
+ version = '0.2.1'
27
27
 
28
28
  // to upload artifacts to Bintray by gradle-bintray-plugin
29
29
  // $ gradle bintrayUpload
@@ -92,6 +92,7 @@ subprojects {
92
92
  'org.slf4j:slf4j-api:1.7.9',
93
93
  'org.slf4j:slf4j-log4j12:1.7.9',
94
94
  'org.jruby:jruby-complete:1.7.16.1',
95
+ 'com.google.code.findbugs:annotations:3.0.0',
95
96
  'org.yaml:snakeyaml:1.14',
96
97
  'javax.validation:validation-api:1.1.0.Final',
97
98
  'org.apache.bval:bval-jsr303:0.5',
data/embulk-cli/pom.xml CHANGED
@@ -5,7 +5,7 @@
5
5
  <parent>
6
6
  <groupId>org.embulk</groupId>
7
7
  <artifactId>embulk-parent</artifactId>
8
- <version>0.2.0-SNAPSHOT</version>
8
+ <version>0.2.1-SNAPSHOT</version>
9
9
  </parent>
10
10
 
11
11
  <artifactId>embulk-cli</artifactId>
data/embulk-core/pom.xml CHANGED
@@ -5,7 +5,7 @@
5
5
  <parent>
6
6
  <groupId>org.embulk</groupId>
7
7
  <artifactId>embulk-parent</artifactId>
8
- <version>0.2.0-SNAPSHOT</version>
8
+ <version>0.2.1-SNAPSHOT</version>
9
9
  </parent>
10
10
 
11
11
  <artifactId>embulk-core</artifactId>
@@ -117,6 +117,11 @@
117
117
  <artifactId>jruby-complete</artifactId>
118
118
  </dependency>
119
119
 
120
+ <dependency>
121
+ <groupId>com.google.code.findbugs</groupId>
122
+ <artifactId>annotations</artifactId>
123
+ </dependency>
124
+
120
125
  <!-- for guess_charset plugin -->
121
126
  <dependency>
122
127
  <groupId>com.ibm.icu</groupId>
@@ -23,11 +23,11 @@ public class EmbulkService
23
23
  modules.add(new ExtensionServiceLoaderModule(systemConfig));
24
24
  modules.add(new BuiltinPluginSourceModule());
25
25
  modules.add(new JRubyScriptingModule(systemConfig));
26
- modules.addAll(getAdditionalModules());
26
+ modules.addAll(getAdditionalModules(systemConfig));
27
27
  injector = Guice.createInjector(modules.build());
28
28
  }
29
29
 
30
- protected Iterable<? extends Module> getAdditionalModules()
30
+ protected Iterable<? extends Module> getAdditionalModules(ConfigSource systemConfig)
31
31
  {
32
32
  return ImmutableList.of();
33
33
  }
@@ -109,7 +109,9 @@ public class LocalExecutor
109
109
  if (outputNextConfig == null) {
110
110
  outputNextConfig = Exec.newNextConfig();
111
111
  }
112
- NextConfig nextConfig = inputNextConfig.deepCopy().merge(outputNextConfig);
112
+ NextConfig nextConfig = Exec.newNextConfig();
113
+ nextConfig.getNestedOrSetEmpty("in").merge(inputNextConfig);
114
+ nextConfig.getNestedOrSetEmpty("out").merge(outputNextConfig);
113
115
  return new ExecuteResult(nextConfig);
114
116
  }
115
117
  }
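Note on the NextConfig change above: a rough sketch of the difference between the two merge strategies, using plain Ruby hashes as a stand-in for `NextConfig`. The keys `last_path` and `sequence` are hypothetical, and `Hash#merge` is only an approximation of whatever merge semantics the Java class implements.

```ruby
# Hypothetical next-config fragments returned by an input and an output plugin.
input_next  = { "last_path" => "examples/csv/sample.csv.gz" }
output_next = { "sequence" => 1 }

# 0.2.0 behaviour (approximation): both fragments land at the top level.
def merge_top_level(input_next, output_next)
  input_next.dup.merge(output_next)
end

# 0.2.1 behaviour (approximation): each fragment is merged under its own
# section; `||= {}` plays the role of getNestedOrSetEmpty.
def merge_nested(input_next, output_next)
  next_config = {}
  (next_config["in"]  ||= {}).merge!(input_next)
  (next_config["out"] ||= {}).merge!(output_next)
  next_config
end

p merge_top_level(input_next, output_next)
p merge_nested(input_next, output_next)
```

With the nested form, the result has the same `in:` / `out:` shape as a config file, which matches the ChangeLog entry for #41.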
@@ -2,6 +2,8 @@ package org.embulk.spi;
2
2
 
3
3
  import java.util.Arrays;
4
4
 
5
+ import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
6
+
5
7
  public class Buffer
6
8
  {
7
9
  public static final Buffer EMPTY = Buffer.allocate(0);
@@ -48,6 +50,7 @@ public class Buffer
48
50
  return new Buffer(src, offset, size).limit(size);
49
51
  }
50
52
 
53
+ @SuppressFBWarnings(value = "EI_EXPOSE_REP")
51
54
  public byte[] array()
52
55
  {
53
56
  return array;
@@ -13,6 +13,7 @@ import org.embulk.config.ConfigSource;
13
13
  *
14
14
  * An example extention to add a custom PluginSource will be as following:
15
15
  *
16
+ * <code>
16
17
  * class MyPluginSourceExtension
17
18
  * implements Extension, Module
18
19
  * {
@@ -22,19 +23,20 @@ import org.embulk.config.ConfigSource;
22
23
  * // ...
23
24
  * }
24
25
  *
25
- * @Override
26
+ * {@literal @}Override
26
27
  * public void configure(Binder binder)
27
28
  * {
28
- * Multibinder<PluginSource> multibinder = Multibinder.newSetBinder(binder, PluginSource.class);
29
+ * Multibinder&lt;PluginSource&gt; multibinder = Multibinder.newSetBinder(binder, PluginSource.class);
29
30
  * multibinder.addBinding().to(MyPluginSource.class);
30
31
  * }
31
32
  *
32
- * @Override
33
- * public List<Module> getModules()
33
+ * {@literal @}Override
34
+ * public List&lt;Module&gt; getModules()
34
35
  * {
35
- * return ImmutableList.<Module>of(this);
36
+ * return ImmutableList.&lt;Module&gt;of(this);
36
37
  * }
37
38
  * }
39
+ * </code>
38
40
  */
39
41
  public interface Extension
40
42
  {
@@ -1,5 +1,7 @@
1
1
  package org.embulk.spi.type;
2
2
 
3
+ import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
4
+
3
5
  public abstract class AbstractType
4
6
  implements Type
5
7
  {
@@ -32,6 +34,7 @@ public abstract class AbstractType
32
34
  return fixedStorageSize;
33
35
  }
34
36
 
37
+ @SuppressFBWarnings(value = "EQ_UNUSUAL")
35
38
  @Override
36
39
  public boolean equals(Object o)
37
40
  {
@@ -92,8 +92,13 @@ public class LineEncoder
92
92
 
93
93
  public void finish()
94
94
  {
95
- close(); // flush all remaining buffer in writer
96
- outputStream.finish();
95
+ try {
96
+ writer.flush(); // flush all remaining buffer in writer because FileOutputOutputStream.close() doesn't flush buffer
97
+ outputStream.finish();
98
+ writer.close();
99
+ } catch (IOException ex) {
100
+ throw new RuntimeException(ex);
101
+ }
97
102
  }
98
103
 
99
104
  @Override
@@ -5,7 +5,7 @@
5
5
  <parent>
6
6
  <groupId>org.embulk</groupId>
7
7
  <artifactId>embulk-parent</artifactId>
8
- <version>0.2.0-SNAPSHOT</version>
8
+ <version>0.2.1-SNAPSHOT</version>
9
9
  </parent>
10
10
 
11
11
  <artifactId>embulk-standards</artifactId>
@@ -205,11 +205,6 @@ public class CsvParserPlugin
205
205
  private static String nextColumn(Schema schema, CsvTokenizer tokenizer, String nullStringOrNull)
206
206
  {
207
207
  String v = tokenizer.nextColumn();
208
- if (v == null) {
209
- throw new RuntimeException(String.format("Expected %d columns but line %d has fewer number of columns",
210
- schema.getColumnCount(), tokenizer.getCurrentLineNumber()));
211
- }
212
-
213
208
  if (!v.isEmpty()) {
214
209
  if (v.equals(nullStringOrNull)) {
215
210
  return null;
@@ -5,7 +5,6 @@ import java.util.List;
5
5
  import java.util.ArrayList;
6
6
  import java.util.Deque;
7
7
  import java.util.ArrayDeque;
8
- import java.util.Iterator;
9
8
  import org.embulk.spi.util.LineDecoder;
10
9
 
11
10
  public class CsvTokenizer
@@ -71,7 +70,7 @@ public class CsvTokenizer
71
70
  quotedValueLines.clear();
72
71
  }
73
72
  recordState = RecordState.END;
74
- return line;
73
+ return skippedLine;
75
74
  }
76
75
 
77
76
  public boolean nextFile()
data/lib/embulk/schema.rb CHANGED
@@ -33,21 +33,25 @@ module Embulk
33
33
  record_writer_script = "lambda do |builder,record|\n"
34
34
  record_writer_script << "java_timestamp_class = ::Embulk::Java::Timestamp\n"
35
35
  each do |column|
36
- column_script =
36
+ idx = column.index
37
+ column_script = "if record[#{idx}].nil?\n" <<
38
+ "builder.setNull(#{idx})\n" <<
39
+ "else\n" <<
37
40
  case column.type
38
41
  when :boolean
39
- "builder.setBoolean(#{column.index}, record[#{column.index}])"
42
+ "builder.setBoolean(#{idx}, record[#{idx}])"
40
43
  when :long
41
- "builder.setLong(#{column.index}, record[#{column.index}])"
44
+ "builder.setLong(#{idx}, record[#{idx}])"
42
45
  when :double
43
- "builder.setDouble(#{column.index}, record[#{column.index}])"
46
+ "builder.setDouble(#{idx}, record[#{idx}])"
44
47
  when :string
45
- "builder.setString(#{column.index}, record[#{column.index}])"
48
+ "builder.setString(#{idx}, record[#{idx}])"
46
49
  when :timestamp
47
- "builder.setTimestamp(#{column.index}, java_timestamp_class.fromRubyTime(record[#{column.index}]))"
50
+ "builder.setTimestamp(#{idx}, java_timestamp_class.fromRubyTime(record[#{idx}]))"
48
51
  else
49
52
  raise "Unknown type #{column.type.inspect}"
50
- end
53
+ end <<
54
+ "end\n"
51
55
  record_writer_script << column_script << "\n"
52
56
  end
53
57
  record_writer_script << "builder.addRecord\n"
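Note on the nil handling above: a simplified, standalone sketch of the generated record writer, showing how a nil field is routed to `setNull` instead of a typed setter. `Column`, `RecordingBuilder`, and `build_record_writer` are hypothetical stand-ins for this illustration; the real code targets Embulk's Java page builder and supports more types.

```ruby
# Hypothetical stand-in for an Embulk schema column.
Column = Struct.new(:index, :type)

# Hypothetical builder that records calls instead of writing a page.
class RecordingBuilder
  attr_reader :calls

  def initialize
    @calls = []
  end

  def setNull(index)
    @calls << [:setNull, index]
  end

  def setLong(index, value)
    @calls << [:setLong, index, value]
  end

  def setString(index, value)
    @calls << [:setString, index, value]
  end

  def addRecord
    @calls << [:addRecord]
  end
end

# Generates and evaluates a lambda with one nil-guarded setter call per column.
def build_record_writer(columns)
  lines = ["lambda do |builder,record|"]
  columns.each do |column|
    idx = column.index
    setter =
      case column.type
      when :long   then "builder.setLong(#{idx}, record[#{idx}])"
      when :string then "builder.setString(#{idx}, record[#{idx}])"
      else raise "Unknown type #{column.type.inspect}"
      end
    lines << "if record[#{idx}].nil?"
    lines << "builder.setNull(#{idx})"
    lines << "else"
    lines << setter
    lines << "end"
  end
  lines << "builder.addRecord"
  lines << "end"
  eval(lines.join("\n"))
end

writer  = build_record_writer([Column.new(0, :long), Column.new(1, :string)])
builder = RecordingBuilder.new
writer.call(builder, [nil, "a"])
p builder.calls
```

Generating the guard once per column keeps the per-record path free of type dispatch, which appears to be the reason the original builds a script string instead of branching at call time.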
@@ -1,3 +1,3 @@
1
1
  module Embulk
2
- VERSION = "0.2.0"
2
+ VERSION = "0.2.1"
3
3
  end
data/pom.xml CHANGED
@@ -4,7 +4,7 @@
4
4
 
5
5
  <groupId>org.embulk</groupId>
6
6
  <artifactId>embulk-parent</artifactId>
7
- <version>0.2.0-SNAPSHOT</version>
7
+ <version>0.2.1-SNAPSHOT</version>
8
8
  <packaging>pom</packaging>
9
9
 
10
10
  <name>Embulk</name>
@@ -222,6 +222,14 @@
222
222
  <version>1.9.5</version>
223
223
  <scope>test</scope>
224
224
  </dependency>
225
+
226
+ <dependency>
227
+ <groupId>com.google.code.findbugs</groupId>
228
+ <artifactId>annotations</artifactId>
229
+ <version>3.0.0</version>
230
+ <scope>compile</scope>
231
+ </dependency>
232
+
225
233
  </dependencies>
226
234
  </dependencyManagement>
227
235
 
@@ -410,7 +418,7 @@
410
418
  <plugin>
411
419
  <groupId>org.codehaus.mojo</groupId>
412
420
  <artifactId>findbugs-maven-plugin</artifactId>
413
- <version>2.5.2</version>
421
+ <version>3.0.0</version>
414
422
  <configuration>
415
423
  <skip>${project.check.skip-findbugs}</skip>
416
424
  <jvmArgs>-Xmx${project.build.jvmsize}</jvmArgs>
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: embulk
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.0
4
+ version: 0.2.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Sadayuki Furuhashi
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2015-01-27 00:00:00.000000000 Z
11
+ date: 2015-01-29 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -113,6 +113,7 @@ files:
113
113
  - Rakefile
114
114
  - bin/embulk
115
115
  - build.gradle
116
+ - classpath/annotations-3.0.0.jar
116
117
  - classpath/aopalliance-1.0.jar
117
118
  - classpath/aws-java-sdk-1.5.2.jar
118
119
  - classpath/bval-core-0.5.jar
@@ -121,9 +122,9 @@ files:
121
122
  - classpath/commons-codec-1.3.jar
122
123
  - classpath/commons-lang3-3.1.jar
123
124
  - classpath/commons-logging-1.2.jar
124
- - classpath/embulk-cli-0.2.0-SNAPSHOT.jar
125
- - classpath/embulk-core-0.2.0-SNAPSHOT.jar
126
- - classpath/embulk-standards-0.2.0-SNAPSHOT.jar
125
+ - classpath/embulk-cli-0.2.1-SNAPSHOT.jar
126
+ - classpath/embulk-core-0.2.1-SNAPSHOT.jar
127
+ - classpath/embulk-standards-0.2.1-SNAPSHOT.jar
127
128
  - classpath/guava-17.0.jar
128
129
  - classpath/guice-3.0.jar
129
130
  - classpath/guice-multibindings-3.0.jar
@@ -305,8 +306,6 @@ files:
305
306
  - embulk-standards/src/test/java/org/embulk/standards/TestCsvTokenizer.java
306
307
  - embulk-standards/src/test/java/org/embulk/standards/TestS3FileInputPlugin.java
307
308
  - embulk.gemspec
308
- - examples/config.yml
309
- - examples/csv/sample.csv.gz
310
309
  - gradle/wrapper/gradle-wrapper.jar
311
310
  - gradle/wrapper/gradle-wrapper.properties
312
311
  - gradlew
data/examples/config.yml DELETED
@@ -1,34 +0,0 @@
1
- #exec:
2
- # transaction_time: 2015-01-17 00:00:00 UTC
3
- # transaction_timezone: UTC
4
- in:
5
- type: file
6
- paths:
7
- - examples/csv/
8
- #parser:
9
- # type: csv
10
- # newline: LF
11
- # columns:
12
- # - {name: date_code, type: string}
13
- # - {name: customer_code, type: long}
14
- # - {name: product_code, type: string}
15
- # - {name: employee_code, type: string}
16
- # file_decoders:
17
- # - type: gzip
18
- out:
19
- type: file
20
- directory: tmp/
21
- file_name: output
22
- file_ext: csv
23
- #encoders:
24
- # - type: gzip
25
- # level: 9
26
- formatter:
27
- type: csv
28
- timezone: Etc/UTC
29
- #in:
30
- # type: file
31
- # paths:
32
- # - benchmark/
33
- #out:
34
- # type: "null"
Binary file