embulk 0.2.0 → 0.2.1

checksums.yaml CHANGED
@@ -1,15 +1,15 @@
  ---
  !binary "U0hBMQ==":
  metadata.gz: !binary |-
- NDBmNzE4N2NiODA5MmM5NzBlMjFjMTg5ZDNkMjIxM2VmYzUzM2I5NA==
+ NGUwOTc0ZDE1MWZlZjJhYjdhNmJmMjQwZjliOWU3MmEyYmM5ZTczNQ==
  data.tar.gz: !binary |-
- ZmRmZDUwNTU0ZmE1NjNhZDc5NzQyMzEwMGM2ODBmNjM1ZTY5ZmY3NA==
+ ZjA1YTE5NDlhZGViMTU1NjVmOTBhZDVlZDY5NGZjODI0NGU5OGViZA==
  SHA512:
  metadata.gz: !binary |-
- Njk2ZDU4MDNkODFmZTJiNmEyNmNhNWIwOWYzZjViYWI0YmYxNmMwMjBhNDQ5
- ODMwY2YxOGI0ZWFiNjExNTdjMGE5ZmVlMDA2YWUwYTY2YWUxZDRjYTQ0OTY5
- NzI0MGVkODQwYWI1YjUzNzdhNmNkMmVhMGRmZjRlZWIxZmQxYzU=
+ NDY3NzQ1NTkxNTk5MzAzMGQ2ZmIzYjM0YjMyMTczOGM1YjhmYzFkYTg0YTY3
+ ZDBlNzdiYWIwZmVkMWU5YzA3NTEyYzA2ZGI3YjQyMTQ5ZDI2MWI3ZWEwZTM0
+ YjQ3MTllMTZkYzdlYTM2YjNlZDVjNGIwYjEwNDVhNjlmN2IxYTk=
  data.tar.gz: !binary |-
- N2Q3NTEwN2NlMDg5OGQ1NmRlNmVkMzcyNzRiMjBhODY3MDExODA1ZDI5YTYy
- ZmNjNTM3YjU5NGE4ZmM4NGFlNWE4ZmEwZWI2YTIzNjM3YmVmODE5MDc2OTQ1
- NWMxNzM1OGUzZTExNTY0ZmM0MTBmYWRmNDJhZTc4MjhlNTZkNmQ=
+ MGZjMzM1NmVhNzdhZDhjODg3ZWZiNGRmOWQwMTU5MzUwZmEwYTBkMDY1MTgz
+ NDBhOTAwM2Y3NDNjM2VlZTE1YjRkZjA4MWNiZjZjN2QzOTBjYTliMzJlYTgw
+ OGY2ZGZmMDJmMTI4ZWU1YjNmMTMxNTc5NDdjN2NiODkxYzQ4MmI=
data/ChangeLog CHANGED
@@ -1,4 +1,13 @@
 
+ 2015-01-29 version 0.2.1:
+
+ * Fixed LineEncoder#finish to flush all remaining buffer (reported by @aibou)
+ * Fixed NextConfig to be merged to in: or out: rather than the top-level
+ (reported by enukane) [#41]
+ * ./bin/embulk shows warns to run `rake` if ./classpath doesn't exist
+ * Embulk::PageBuilder#add accepts nil
+
+
  2015-01-26 version 0.2.0:
 
  * Changed JRuby InputPlugin API to use #run instead of .run
data/README.md CHANGED
@@ -4,14 +4,24 @@ A plugin-based parallel bulk data loader that makes painful data integration wor
 
  ## What's Embulk?
 
- TODO
+ Embulk is a plugin-based parallel bulk data loader that helps **data transfer** between various **storages**, **databases**, **NoSQL** and **cloud services**.
+
+ You can install input and output plugins to integrate many other file formats and storages.
+
+ You also can release plugins to share your efforts of data cleaning, error handling, transaction control, and retrying.
+ Packaging effrots into plugins **brings OSS-style development to the data scripts** which **was tend to be one-time adhoc scripts**.
+
+ [Embuk, an open-source plugin-based parallel bulk data loader](http://www.slideshare.net/frsyuki/embuk-making-data-integration-works-relaxed) at Slideshare
+
+ [![Embulk](https://gist.githubusercontent.com/frsyuki/f322a77ee2766a508ba9/raw/e8539b6b4fda1b3357e8c79d3966aa8148dbdbd3/embulk-overview.png)](http://www.slideshare.net/frsyuki/embuk-making-data-integration-works-relaxed/12)
+
 
  ## Quick Start
 
  The single-file package is the simplest way to try Embulk. You can download the latest embulk-VERSION.jar from [the releases page](https://bintray.com/embulk/maven/embulk/view#files) and run it with java:
 
  ```
- wget https://bintray.com/artifact/download/embulk/maven/embulk-0.2.0.jar -O embulk.jar
+ wget https://bintray.com/artifact/download/embulk/maven/embulk-0.2.1.jar -O embulk.jar
  java -jar embulk.jar --help
  ```
 
@@ -27,14 +37,14 @@ java -jar embulk.jar run config.yml
  ### Using plugins
 
  You can use plugins to load data from/to various systems and file formats.
- An example is [embulk-output-postgres-json]() plugin. It outputs data into PostgreSQL server using "json" column type.
+ An example is [embulk-output-postgres-json](https://github.com/frsyuki/embulk-plugin-postgres-json) plugin. It outputs data into PostgreSQL server using "json" column type.
 
  ```
  java -jar embulk.jar gem install embulk-output-postgres-json
  java -jar embulk.jar gem list
  ```
 
- You can search plugins on RubyGems: [search for "embulk-"](https://rubygems.org/search?utf8=%E2%9C%93&query=embulk-).
+ You can search plugins on RubyGems: [search for "embulk-plugin"](https://rubygems.org/search?utf8=%E2%9C%93&query=embulk-plugin).
 
  ### Using plugin bundle
 
data/bin/embulk CHANGED
@@ -37,20 +37,16 @@ unless java_cmd
  end
  end
 
- embulk_home = ENV['EMBULK_HOME']
- unless embulk_home
- embulk_home = File.dirname(File.dirname(__FILE__))
- end
- ENV['EMBULK_HOME'] = File.expand_path(embulk_home)
-
+ embulk_home = File.dirname(File.dirname(__FILE__))
  classpath_dir = File.join(embulk_home, 'classpath')
  lib_dir = File.join(embulk_home, 'lib')
 
- jruby_complete = Dir.entries(classpath_dir).find {|jar| jar =~ /jruby-complete-[\d\.]+\.jar/ }
+ jruby_complete = Dir.entries(classpath_dir).find {|jar| jar =~ /jruby-complete-[\d\.]+\.jar/ } rescue nil
  unless jruby_complete
- STDERR.puts "Could not find jruby-complete at $EMBULK_HOME/classpath directory."
- STDERR.puts "Please confirm EMBULK_HOME is correctly set"
- STDERR.puts "Current EMBULK_HOME = #{ENV['EMBULK_HOME'].to_s.dump}"
+ STDERR.puts "Could not find jruby-complete at #{embulk_home}/classpath directory."
+ if embulk_home == '.'
+ STDERR.puts "Did you run \`rake\`? You need to build java code and create ./classpath directory first."
+ end
  raise SystemExit.new(1)
  end
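
Note on the `rescue nil` modifier added in this hunk: when `./classpath` does not exist, `Dir.entries` raises `Errno::ENOENT`, and the modifier turns that into `nil` so the existing `unless jruby_complete` branch can print the hint about running `rake`. A minimal sketch of that behavior follows; `find_jruby_complete` is a hypothetical helper name used only for illustration and does not appear in bin/embulk.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of bin/embulk): returns the file name of the
# jruby-complete jar in classpath_dir, or nil when the directory is missing
# or contains no matching jar.
def find_jruby_complete(classpath_dir)
  Dir.entries(classpath_dir).find {|jar| jar =~ /jruby-complete-[\d\.]+\.jar/ } rescue nil
end

Dir.mktmpdir do |dir|
  # The directory exists but holds no jar yet, so find returns nil.
  p find_jruby_complete(dir)

  # After creating an empty placeholder jar, its file name is returned.
  File.write(File.join(dir, 'jruby-complete-1.7.16.1.jar'), '')
  p find_jruby_complete(dir)

  # The directory does not exist: Dir.entries raises, the modifier yields nil.
  p find_jruby_complete(File.join(dir, 'missing'))
end
```

The modifier form rescues any `StandardError`, so it would also hide unrelated failures such as a permission error; that trade-off seems acceptable here because every failure leads to the same "could not find" message.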
 
data/build.gradle CHANGED
@@ -23,7 +23,7 @@ allprojects {
  apply plugin: 'com.jfrog.bintray'
 
  group = 'org.embulk'
- version = '0.2.0'
+ version = '0.2.1'
 
  // to upload artifacts to Bintray by gradle-bintray-plugin
  // $ gradle bintrayUpload
@@ -92,6 +92,7 @@ subprojects {
  'org.slf4j:slf4j-api:1.7.9',
  'org.slf4j:slf4j-log4j12:1.7.9',
  'org.jruby:jruby-complete:1.7.16.1',
+ 'com.google.code.findbugs:annotations:3.0.0',
  'org.yaml:snakeyaml:1.14',
  'javax.validation:validation-api:1.1.0.Final',
  'org.apache.bval:bval-jsr303:0.5',
data/embulk-cli/pom.xml CHANGED
@@ -5,7 +5,7 @@
  <parent>
  <groupId>org.embulk</groupId>
  <artifactId>embulk-parent</artifactId>
- <version>0.2.0-SNAPSHOT</version>
+ <version>0.2.1-SNAPSHOT</version>
  </parent>
 
  <artifactId>embulk-cli</artifactId>
data/embulk-core/pom.xml CHANGED
@@ -5,7 +5,7 @@
  <parent>
  <groupId>org.embulk</groupId>
  <artifactId>embulk-parent</artifactId>
- <version>0.2.0-SNAPSHOT</version>
+ <version>0.2.1-SNAPSHOT</version>
  </parent>
 
  <artifactId>embulk-core</artifactId>
@@ -117,6 +117,11 @@
  <artifactId>jruby-complete</artifactId>
  </dependency>
 
+ <dependency>
+ <groupId>com.google.code.findbugs</groupId>
+ <artifactId>annotations</artifactId>
+ </dependency>
+
  <!-- for guess_charset plugin -->
  <dependency>
  <groupId>com.ibm.icu</groupId>
@@ -23,11 +23,11 @@ public class EmbulkService
  modules.add(new ExtensionServiceLoaderModule(systemConfig));
  modules.add(new BuiltinPluginSourceModule());
  modules.add(new JRubyScriptingModule(systemConfig));
- modules.addAll(getAdditionalModules());
+ modules.addAll(getAdditionalModules(systemConfig));
  injector = Guice.createInjector(modules.build());
  }
 
- protected Iterable<? extends Module> getAdditionalModules()
+ protected Iterable<? extends Module> getAdditionalModules(ConfigSource systemConfig)
  {
  return ImmutableList.of();
  }
@@ -109,7 +109,9 @@ public class LocalExecutor
  if (outputNextConfig == null) {
  outputNextConfig = Exec.newNextConfig();
  }
- NextConfig nextConfig = inputNextConfig.deepCopy().merge(outputNextConfig);
+ NextConfig nextConfig = Exec.newNextConfig();
+ nextConfig.getNestedOrSetEmpty("in").merge(inputNextConfig);
+ nextConfig.getNestedOrSetEmpty("out").merge(outputNextConfig);
  return new ExecuteResult(nextConfig);
  }
  }
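
Note on the NextConfig change in this hunk: instead of merging the input and output plugins' next configs into one flat object, 0.2.1 nests them under `in` and `out` keys, which matches the ChangeLog entry for [#41]. A rough illustration in Ruby follows, with plain hashes standing in for NextConfig. The keys `last_path` and `last_id` and the helper `nest_next_config` are made up for the example and are not Embulk API.

```ruby
# Plain hashes stand in for NextConfig; 'last_path' and 'last_id' are
# hypothetical keys, not taken from any real plugin.
input_next  = { 'last_path' => 'csv/2015-01-29.csv' }
output_next = { 'last_id' => 42 }

# 0.2.0 behavior: both configs are merged together at the top level.
flat = input_next.merge(output_next)

# 0.2.1 behavior: each config is merged under its own section, mirroring
# getNestedOrSetEmpty("in") / getNestedOrSetEmpty("out") in the Java code.
def nest_next_config(input_next, output_next)
  next_config = {}
  (next_config['in'] ||= {}).merge!(input_next)
  (next_config['out'] ||= {}).merge!(output_next)
  next_config
end

nested = nest_next_config(input_next, output_next)

p flat
p nested
```

With the nested shape, an input key and an output key that happen to share a name no longer overwrite each other, and the result presumably lines up with the `in:` and `out:` sections of a config file.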
@@ -2,6 +2,8 @@ package org.embulk.spi;
 
  import java.util.Arrays;
 
+ import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+
  public class Buffer
  {
  public static final Buffer EMPTY = Buffer.allocate(0);
@@ -48,6 +50,7 @@ public class Buffer
  return new Buffer(src, offset, size).limit(size);
  }
 
+ @SuppressFBWarnings(value = "EI_EXPOSE_REP")
  public byte[] array()
  {
  return array;
@@ -13,6 +13,7 @@ import org.embulk.config.ConfigSource;
  *
  * An example extention to add a custom PluginSource will be as following:
  *
+ * <code>
  * class MyPluginSourceExtension
  * implements Extension, Module
  * {
@@ -22,19 +23,20 @@ import org.embulk.config.ConfigSource;
  * // ...
  * }
  *
- * @Override
+ * {@literal @}Override
  * public void configure(Binder binder)
  * {
- * Multibinder<PluginSource> multibinder = Multibinder.newSetBinder(binder, PluginSource.class);
+ * Multibinder&lt;PluginSource&gt; multibinder = Multibinder.newSetBinder(binder, PluginSource.class);
  * multibinder.addBinding().to(MyPluginSource.class);
  * }
  *
- * @Override
- * public List<Module> getModules()
+ * {@literal @}Override
+ * public List&lt;Module&gt; getModules()
  * {
- * return ImmutableList.<Module>of(this);
+ * return ImmutableList.&lt;Module&gt;of(this);
  * }
  * }
+ * </code>
  */
  public interface Extension
  {
@@ -1,5 +1,7 @@
  package org.embulk.spi.type;
 
+ import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+
  public abstract class AbstractType
  implements Type
  {
@@ -32,6 +34,7 @@ public abstract class AbstractType
  return fixedStorageSize;
  }
 
+ @SuppressFBWarnings(value = "EQ_UNUSUAL")
  @Override
  public boolean equals(Object o)
  {
@@ -92,8 +92,13 @@ public class LineEncoder
 
  public void finish()
  {
- close(); // flush all remaining buffer in writer
- outputStream.finish();
+ try {
+ writer.flush(); // flush all remaining buffer in writer because FileOutputOutputStream.close() doesn't flush buffer
+ outputStream.finish();
+ writer.close();
+ } catch (IOException ex) {
+ throw new RuntimeException(ex);
+ }
  }
 
  @Override
@@ -5,7 +5,7 @@
  <parent>
  <groupId>org.embulk</groupId>
  <artifactId>embulk-parent</artifactId>
- <version>0.2.0-SNAPSHOT</version>
+ <version>0.2.1-SNAPSHOT</version>
  </parent>
 
  <artifactId>embulk-standards</artifactId>
@@ -205,11 +205,6 @@ public class CsvParserPlugin
  private static String nextColumn(Schema schema, CsvTokenizer tokenizer, String nullStringOrNull)
  {
  String v = tokenizer.nextColumn();
- if (v == null) {
- throw new RuntimeException(String.format("Expected %d columns but line %d has fewer number of columns",
- schema.getColumnCount(), tokenizer.getCurrentLineNumber()));
- }
-
  if (!v.isEmpty()) {
  if (v.equals(nullStringOrNull)) {
  return null;
@@ -5,7 +5,6 @@ import java.util.List;
  import java.util.ArrayList;
  import java.util.Deque;
  import java.util.ArrayDeque;
- import java.util.Iterator;
  import org.embulk.spi.util.LineDecoder;
 
  public class CsvTokenizer
@@ -71,7 +70,7 @@ public class CsvTokenizer
  quotedValueLines.clear();
  }
  recordState = RecordState.END;
- return line;
+ return skippedLine;
  }
 
  public boolean nextFile()
data/lib/embulk/schema.rb CHANGED
@@ -33,21 +33,25 @@ module Embulk
  record_writer_script = "lambda do |builder,record|\n"
  record_writer_script << "java_timestamp_class = ::Embulk::Java::Timestamp\n"
  each do |column|
- column_script =
+ idx = column.index
+ column_script = "if record[#{idx}].nil?\n" <<
+ "builder.setNull(#{idx})\n" <<
+ "else\n" <<
  case column.type
  when :boolean
- "builder.setBoolean(#{column.index}, record[#{column.index}])"
+ "builder.setBoolean(#{idx}, record[#{idx}])"
  when :long
- "builder.setLong(#{column.index}, record[#{column.index}])"
+ "builder.setLong(#{idx}, record[#{idx}])"
  when :double
- "builder.setDouble(#{column.index}, record[#{column.index}])"
+ "builder.setDouble(#{idx}, record[#{idx}])"
  when :string
- "builder.setString(#{column.index}, record[#{column.index}])"
+ "builder.setString(#{idx}, record[#{idx}])"
  when :timestamp
- "builder.setTimestamp(#{column.index}, java_timestamp_class.fromRubyTime(record[#{column.index}]))"
+ "builder.setTimestamp(#{idx}, java_timestamp_class.fromRubyTime(record[#{idx}]))"
  else
  raise "Unknown type #{column.type.inspect}"
- end
+ end <<
+ "end\n"
  record_writer_script << column_script << "\n"
  end
  record_writer_script << "builder.addRecord\n"
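
Note on the nil handling added in this hunk: the generated record-writer script now wraps each column's setter in an `if record[idx].nil?` guard that calls `builder.setNull(idx)`, which is what lets `Embulk::PageBuilder#add` accept nil. A simplified, self-contained sketch of the same code-generation idea follows. `Column`, `FakeBuilder`, and `build_record_writer` are hypothetical stand-ins written for this example, only two column types are covered, and a newline is emitted before the closing `end` for readability.

```ruby
# Hypothetical stand-ins: they only mimic the parts of Embulk's schema and
# PageBuilder that this sketch needs.
Column = Struct.new(:index, :type)

class FakeBuilder
  attr_reader :calls
  def initialize; @calls = []; end
  def setNull(idx); @calls << [:setNull, idx]; end
  def setLong(idx, v); @calls << [:setLong, idx, v]; end
  def setString(idx, v); @calls << [:setString, idx, v]; end
  def addRecord; @calls << [:addRecord]; end
end

# Builds the source of a lambda that writes one record, then evaluates it.
def build_record_writer(columns)
  script = +"lambda do |builder,record|\n"
  columns.each do |column|
    idx = column.index
    # Guard each setter so a nil value becomes setNull instead of a type error.
    script << "if record[#{idx}].nil?\n" <<
      "builder.setNull(#{idx})\n" <<
      "else\n" <<
      case column.type
      when :long then "builder.setLong(#{idx}, record[#{idx}])"
      when :string then "builder.setString(#{idx}, record[#{idx}])"
      else raise "Unknown type #{column.type.inspect}"
      end <<
      "\nend\n"
  end
  script << "builder.addRecord\n"
  script << "end"
  eval(script)
end

writer = build_record_writer([Column.new(0, :long), Column.new(1, :string)])
builder = FakeBuilder.new
writer.call(builder, [nil, "a"])
p builder.calls
```

Generating the script once per schema keeps the per-record path free of a `case` on the column type; the nil guard adds only one comparison per column.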
@@ -1,3 +1,3 @@
  module Embulk
- VERSION = "0.2.0"
+ VERSION = "0.2.1"
  end
data/pom.xml CHANGED
@@ -4,7 +4,7 @@
 
  <groupId>org.embulk</groupId>
  <artifactId>embulk-parent</artifactId>
- <version>0.2.0-SNAPSHOT</version>
+ <version>0.2.1-SNAPSHOT</version>
  <packaging>pom</packaging>
 
  <name>Embulk</name>
@@ -222,6 +222,14 @@
  <version>1.9.5</version>
  <scope>test</scope>
  </dependency>
+
+ <dependency>
+ <groupId>com.google.code.findbugs</groupId>
+ <artifactId>annotations</artifactId>
+ <version>3.0.0</version>
+ <scope>compile</scope>
+ </dependency>
+
  </dependencies>
  </dependencyManagement>
 
@@ -410,7 +418,7 @@
  <plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>findbugs-maven-plugin</artifactId>
- <version>2.5.2</version>
+ <version>3.0.0</version>
  <configuration>
  <skip>${project.check.skip-findbugs}</skip>
  <jvmArgs>-Xmx${project.build.jvmsize}</jvmArgs>
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: embulk
  version: !ruby/object:Gem::Version
- version: 0.2.0
+ version: 0.2.1
  platform: ruby
  authors:
  - Sadayuki Furuhashi
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2015-01-27 00:00:00.000000000 Z
+ date: 2015-01-29 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: bundler
@@ -113,6 +113,7 @@ files:
  - Rakefile
  - bin/embulk
  - build.gradle
+ - classpath/annotations-3.0.0.jar
  - classpath/aopalliance-1.0.jar
  - classpath/aws-java-sdk-1.5.2.jar
  - classpath/bval-core-0.5.jar
@@ -121,9 +122,9 @@ files:
  - classpath/commons-codec-1.3.jar
  - classpath/commons-lang3-3.1.jar
  - classpath/commons-logging-1.2.jar
- - classpath/embulk-cli-0.2.0-SNAPSHOT.jar
- - classpath/embulk-core-0.2.0-SNAPSHOT.jar
- - classpath/embulk-standards-0.2.0-SNAPSHOT.jar
+ - classpath/embulk-cli-0.2.1-SNAPSHOT.jar
+ - classpath/embulk-core-0.2.1-SNAPSHOT.jar
+ - classpath/embulk-standards-0.2.1-SNAPSHOT.jar
  - classpath/guava-17.0.jar
  - classpath/guice-3.0.jar
  - classpath/guice-multibindings-3.0.jar
@@ -305,8 +306,6 @@ files:
  - embulk-standards/src/test/java/org/embulk/standards/TestCsvTokenizer.java
  - embulk-standards/src/test/java/org/embulk/standards/TestS3FileInputPlugin.java
  - embulk.gemspec
- - examples/config.yml
- - examples/csv/sample.csv.gz
  - gradle/wrapper/gradle-wrapper.jar
  - gradle/wrapper/gradle-wrapper.properties
  - gradlew
data/examples/config.yml DELETED
@@ -1,34 +0,0 @@
- #exec:
- # transaction_time: 2015-01-17 00:00:00 UTC
- # transaction_timezone: UTC
- in:
- type: file
- paths:
- - examples/csv/
- #parser:
- # type: csv
- # newline: LF
- # columns:
- # - {name: date_code, type: string}
- # - {name: customer_code, type: long}
- # - {name: product_code, type: string}
- # - {name: employee_code, type: string}
- # file_decoders:
- # - type: gzip
- out:
- type: file
- directory: tmp/
- file_name: output
- file_ext: csv
- #encoders:
- # - type: gzip
- # level: 9
- formatter:
- type: csv
- timezone: Etc/UTC
- #in:
- # type: file
- # paths:
- # - benchmark/
- #out:
- # type: "null"
Binary file