embulk 0.4.5 → 0.4.6

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 57cffd859609745fafc0e644e0e8466c0511c622
4
- data.tar.gz: 5df593df4e75dce977ac7ab295f9fe089ababa1a
3
+ metadata.gz: b02e8762879a557c331e6fba758a1be4bd2d0575
4
+ data.tar.gz: 1d75577656dbf789d30674e5e689abbb2048f28e
5
5
  SHA512:
6
- metadata.gz: 2c613261f18c551e6eb1ea9f748148422054d0818a76d8367de8eb68a071445686e871ce304d1413b10eca8570d220b4d53dbd6f3b7f060b008ce3c500903fe9
7
- data.tar.gz: 0e6c8086818d6f7d81ac8009ee5fbf76d0aa0a34fc962af60f3647d189ca488cd3c4aa09c8b49f62b3551fcd9b4ad42e2582b69f6a56f2c78f9f6e06f6986439
6
+ metadata.gz: b3e674c5d0e1a9fa3a09b516339cba7a7f587ff70521287e11c64caa0ec8af1d4a229b88bb168e3d10135c1e574e34c40c264587ab4c3c169e194323ef2040be
7
+ data.tar.gz: d51979854aa840ff03cbbf0ee118a93aae2066f06485503f3451d31486f222e56203ac11e819e2fa42238a9914b3023a276d0e6a2dbe5010367da837fc504938
@@ -1,33 +1,26 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- embulk (0.1.0)
4
+ embulk (0.4.5)
5
5
 
6
6
  GEM
7
7
  remote: https://rubygems.org/
8
8
  specs:
9
- diff-lcs (1.2.5)
10
- json (1.8.1)
11
9
  kramdown (1.5.0)
10
+ power_assert (0.2.2)
12
11
  rake (10.4.2)
13
- rspec (2.99.0)
14
- rspec-core (~> 2.99.0)
15
- rspec-expectations (~> 2.99.0)
16
- rspec-mocks (~> 2.99.0)
17
- rspec-core (2.99.2)
18
- rspec-expectations (2.99.2)
19
- diff-lcs (>= 1.1.3, < 2.0)
20
- rspec-mocks (2.99.2)
12
+ test-unit (3.0.9)
13
+ power_assert
21
14
  yard (0.8.7.6)
22
15
 
23
16
  PLATFORMS
17
+ java
24
18
  ruby
25
19
 
26
20
  DEPENDENCIES
27
21
  bundler (>= 1.0)
28
22
  embulk!
29
- json (~> 1.7)
30
23
  kramdown (~> 1.5.0)
31
24
  rake (>= 0.10.0)
32
- rspec (~> 2.11)
25
+ test-unit (~> 3.0.9)
33
26
  yard (~> 0.8.7)
data/README.md CHANGED
@@ -21,20 +21,36 @@ You can release plugins to share your efforts of data cleaning, error handling,
21
21
 
22
22
  ## Quick Start
23
23
 
24
- The single-file package is the simplest way to try Embulk. You can download the latest embulk-VERSION.jar from [the releases page](https://bintray.com/embulk/maven/embulk/view#files) and run it with java:
24
+ The single-file package is the simplest way to try Embulk. You can download the latest embulk-VERSION.jar from [the releases page](https://bintray.com/embulk/maven/embulk/view#files) and run it with java.
25
+
26
+ ### Linux & Mac & BSD
27
+
28
+ The following 4 commands install embulk to your home directory:
29
+
30
+ ```
31
+ curl --create-dirs -o ~/.embulk/bin/embulk -L https://bintray.com/artifact/download/embulk/maven/embulk-0.4.6.jar
32
+ chmod +x ~/.embulk/bin/embulk
33
+ echo 'export PATH="$HOME/.embulk/bin:$PATH"' >> ~/.bashrc
34
+ source ~/.bashrc
35
+ ```
36
+
37
+ ### Windows
38
+
39
+ The jar file also works as a .bat file, so you can save it as `embulk.bat`:
25
40
 
26
41
  ```
27
- wget https://bintray.com/artifact/download/embulk/maven/embulk-0.4.5.jar -O embulk.jar
28
- java -jar embulk.jar --help
42
+ curl -o embulk.bat -L https://bintray.com/artifact/download/embulk/maven/embulk-0.4.6.jar
29
43
  ```
30
44
 
45
+ ### Trying examples
46
+
31
47
  Let's load a CSV file, for example. The `embulk example` subcommand generates a CSV file and a config file for you.
32
48
 
33
49
  ```
34
- java -jar embulk.jar example ./try1
35
- java -jar embulk.jar guess ./try1/example.yml -o config.yml
36
- java -jar embulk.jar preview config.yml
37
- java -jar embulk.jar run config.yml
50
+ embulk example ./try1
51
+ embulk guess ./try1/example.yml -o config.yml
52
+ embulk preview config.yml
53
+ embulk run config.yml
38
54
  ```
39
55
 
40
56
  ### Using plugins
@@ -43,8 +59,8 @@ You can use plugins to load data from/to various systems and file formats.
43
59
  An example is the [embulk-output-postgres-json](https://github.com/frsyuki/embulk-output-postgres-json) plugin. It outputs data into a PostgreSQL server using the "json" column type.
44
60
 
45
61
  ```
46
- java -jar embulk.jar gem install embulk-output-postgres-json
47
- java -jar embulk.jar gem list
62
+ embulk gem install embulk-output-postgres-json
63
+ embulk gem list
48
64
  ```
49
65
 
50
66
  You can search plugins on RubyGems: [search for "embulk"](https://rubygems.org/search?utf8=%E2%9C%93&query=embulk).
@@ -57,9 +73,9 @@ You can use the bundle using `-b <bundle_dir>` option. `embulk bundle` also gene
57
73
  See the generated \<bundle_dir>/Gemfile file for how plugin bundles work.
58
74
 
59
75
  ```
60
- java -jar embulk.jar bundle ./embulk_bundle
61
- java -jar embulk.jar guess -b ./embulk_bundle ...
62
- java -jar embulk.jar run -b ./embulk_bundle ...
76
+ embulk bundle ./embulk_bundle
77
+ embulk guess -b ./embulk_bundle ...
78
+ embulk run -b ./embulk_bundle ...
63
79
  ```
64
80
 
65
81
  ### Releasing plugins to RubyGems
@@ -76,19 +92,19 @@ Embulk supports resuming failed transactions.
76
92
  To enable resuming, start the transaction with the `-r PATH` option:
77
93
 
78
94
  ```
79
- java -jar embulk.jar run config.yml -r resume-state.yml
95
+ embulk run config.yml -r resume-state.yml
80
96
  ```
81
97
 
82
98
  If the transaction fails, embulk stores its state in the YAML file. You can retry the transaction using exactly the same command:
83
99
 
84
100
  ```
85
- java -jar embulk.jar run config.yml -r resume-state.yml
101
+ embulk run config.yml -r resume-state.yml
86
102
  ```
87
103
 
88
104
  If you give up resuming the transaction, use the `embulk cleanup` subcommand to delete the intermediate data:
89
105
 
90
106
  ```
91
- java -jar embulk.jar cleanup config.yml -r resume-state.yml
107
+ embulk cleanup config.yml -r resume-state.yml
92
108
  ```
93
109
 
94
110
 
@@ -5,13 +5,14 @@ plugins {
5
5
  id 'com.github.jruby-gradle.base' version '0.1.5'
6
6
  id 'com.github.johnrengelman.shadow' version '1.2.0'
7
7
  }
8
+ import com.github.jrubygradle.JRubyExec
8
9
 
9
10
  def java_projects = [project(":embulk-core"), project(":embulk-standards"), project(":embulk-cli")]
10
11
  def release_projects = [project(":embulk-core"), project(":embulk-standards")]
11
12
 
12
13
  allprojects {
13
14
  group = 'org.embulk'
14
- version = '0.4.5'
15
+ version = '0.4.6'
15
16
 
16
17
  apply plugin: 'java'
17
18
  apply plugin: 'maven-publish'
@@ -131,6 +132,10 @@ subprojects {
131
132
  }
132
133
  }
133
134
 
135
+ dependencies {
136
+ jrubyExec 'rubygems:test-unit:3.0.+'
137
+ }
138
+
134
139
  //
135
140
  // classpath task
136
141
  //
@@ -143,11 +148,8 @@ clean { delete 'classpath' }
143
148
  task cli(dependsOn: ':embulk-cli:shadowJar') << {
144
149
  file('pkg').mkdirs()
145
150
  File f = file("pkg/embulk-${project.version}.jar")
146
- f.write('''\
147
- #!/bin/sh
148
- exec java -jar "$0" "$@"
149
- exit 127
150
- ''')
151
+ f.write("")
152
+ f.append(file("embulk-cli/src/main/sh/selfrun.sh").readBytes())
151
153
  f.append(file("embulk-cli/build/libs/embulk-cli-${project.version}-all.jar").readBytes())
152
154
  f.setExecutable(true)
153
155
  }
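The `cli` task now builds `pkg/embulk-VERSION.jar` by writing the contents of `embulk-cli/src/main/sh/selfrun.sh` and then appending the shaded jar bytes. The Ruby sketch below reproduces that concatenation outside Gradle; `build_selfrun_jar` and its arguments are hypothetical names. The trick works because ZIP readers, including `java -jar`, locate the archive's central directory from the end of the file, so a prepended script does not stop the jar from loading.

```ruby
# Hypothetical helper mirroring the Gradle `cli` task:
# write the boot script first, then the jar bytes, then mark it executable.
def build_selfrun_jar(script_path, jar_path, out_path)
  File.open(out_path, "wb") do |out|
    out.write(File.binread(script_path)) # header: selfrun.sh contents
    out.write(File.binread(jar_path))    # body: the shaded embulk-cli jar
  end
  # Like f.setExecutable(true), but also sets the group/other execute bits.
  File.chmod(File.stat(out_path).mode | 0o111, out_path)
  out_path
end
```

For example, `build_selfrun_jar("embulk-cli/src/main/sh/selfrun.sh", "embulk-cli/build/libs/embulk-cli-0.4.6-all.jar", "pkg/embulk-0.4.6.jar")` would produce the same layout as the Gradle task.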
@@ -175,16 +177,19 @@ project(':embulk-cli') {
175
177
  }
176
178
  }
177
179
 
180
+ task rubyTest(type: JRubyExec) {
181
+ jrubyArgs '-Ilib', '-Itest', '-rtest/unit', '-eTest::Unit::AutoRunner.run(true, *ARGV)'
182
+ script 'test'
183
+ }
184
+
178
185
  //
179
186
  // gem task
180
187
  //
181
- import com.github.jrubygradle.JRubyExec
182
188
  task gem(type: JRubyExec) {
183
189
  jrubyArgs '-rrubygems/gem_runner', '-eGem::GemRunner.new.run(ARGV)', 'build'
184
- script 'build/gemspec'
190
+ script 'embulk.gemspec'
185
191
  doLast { ant.move(file: "${project.name}-${project.version}.gem", todir: "pkg") }
186
192
  }
187
- gem.dependsOn('gemspec')
188
193
  gem.dependsOn('classpath')
189
194
 
190
195
  //
@@ -194,7 +199,6 @@ task rubyGemsUpload(type: JRubyExec, dependsOn: ["gem"]) {
194
199
  jrubyArgs '-rrubygems/gem_runner', '-eGem::GemRunner.new.run(ARGV)', 'push'
195
200
  script "pkg/embulk-${project.version}.gem"
196
201
  }
197
- gem.dependsOn('gemspec')
198
202
 
199
203
  //
200
204
  // releaseCheck and release tasks
@@ -248,33 +252,3 @@ task set_version << {
248
252
 
249
253
  println "add 'release/release-${to}' line to embulk-docs/src/release.rst"
250
254
  }
251
-
252
- task gemspec << {
253
- file('build').mkdirs()
254
- file('build/gemspec').write($/
255
- Gem::Specification.new do |gem|
256
- gem.name = "embulk"
257
- gem.version = "${project.version}"
258
-
259
- gem.summary = "Embulk, a plugin-based parallel bulk data loader"
260
- gem.description = "Embulk is an open-source, plugin-based bulk data loader to scale and simplify data management across heterogeneous data stores. It can collect and ship any kinds of data in high throughput with transaction control."
261
- gem.authors = ["Sadayuki Furuhashi"]
262
- gem.email = ["frsyuki@gmail.com"]
263
- gem.license = "Apache 2.0"
264
- gem.homepage = "https://github.com/embulk/embulk"
265
-
266
- gem.files = `git ls-files`.split("\n") + Dir["classpath/*.jar"]
267
- gem.test_files = gem.files.grep(%r"^(test|spec)/")
268
- gem.executables = gem.files.grep(%r"^bin/").map{ |f| File.basename(f) }
269
- gem.require_paths = ["lib"]
270
- gem.has_rdoc = false
271
-
272
- gem.add_development_dependency "bundler", [">= 1.0"]
273
- gem.add_development_dependency "rake", [">= 0.10.0"]
274
- gem.add_development_dependency "rspec", ["~> 2.11"]
275
- gem.add_development_dependency "json", ["~> 1.7"]
276
- gem.add_development_dependency "yard", ["~> 0.8.7"]
277
- gem.add_development_dependency "kramdown", ["~> 1.5.0"]
278
- end
279
- /$)
280
- }
@@ -0,0 +1,64 @@
1
+
2
+ : <<BAT
3
+ @echo off
4
+
5
+ java -jar %0 %*
6
+
7
+ exit /B
8
+ BAT
9
+
10
+ java_args=""
11
+ jruby_args=""
12
+ default_optimize=""
13
+ overwrite_optimize=""
14
+
15
+ while true; do
16
+ case "$1" in
17
+ "-J+O")
18
+ overwrite_optimize="true"
19
+ shift
20
+ break;
21
+ ;;
22
+ "-J-O")
23
+ overwrite_optimize="false"
24
+ shift
25
+ break;
26
+ ;;
27
+ -J*)
28
+ v="${1#-J}"
29
+ if test "$v"; then
30
+ java_args="$java_args $v"
31
+ else
32
+ shift
33
+ file_args=`cat "$1"`
34
+ if test $? -ne 0; then
35
+ echo "Failed to load java argument file."
36
+ exit 1
37
+ fi
38
+ java_args="$java_args $file_args"
39
+ fi
40
+ shift
41
+ ;;
42
+ -R*)
43
+ v="${1#-R}"
44
+ jruby_args="$jruby_args $v"
45
+ shift
46
+ ;;
47
+ run)
48
+ default_optimize="true"
49
+ break
50
+ ;;
51
+ *)
52
+ break
53
+ ;;
54
+ esac
55
+ done
56
+
57
+ if test "$overwrite_optimize" = "true" -o "$default_optimize" -a "$overwrite_optimize" != "false"; then
58
+ java_args="-XX:+AggressiveOpts -XX:+UseConcMarkSweepGC $java_args"
59
+ else
60
+ java_args="-XX:+AggressiveOpts -XX:+TieredCompilation -XX:TieredStopAtLevel=1 -Xverify:none $java_args"
61
+ fi
62
+
63
+ exec java $java_args -jar "$0" $jruby_args "$@"
64
+ exit 127
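The boot script picks JVM flags from the leading arguments: it uses the run-mode flags (`-XX:+UseConcMarkSweepGC`) when `-J+O` is given, or when the command is `run` and `-J-O` is not given; otherwise it favors startup time with `-XX:TieredStopAtLevel=1` and `-Xverify:none`. The hypothetical `boot_args` function below is a Ruby model of that loop for reading and testing. It does not model the bare `-J <file>` form that reads options from a file.

```ruby
# Hypothetical Ruby model of the option loop in embulk-cli/src/main/sh/selfrun.sh.
RUN_MODE_OPTS = %w[-XX:+AggressiveOpts -XX:+UseConcMarkSweepGC].freeze
STARTUP_OPTS = %w[-XX:+AggressiveOpts -XX:+TieredCompilation
                  -XX:TieredStopAtLevel=1 -Xverify:none].freeze

# Returns [java_args, jruby_args, remaining_args].
def boot_args(argv)
  args = argv.dup
  java_args = []
  jruby_args = []
  default_optimize = false
  overwrite_optimize = nil # nil: neither -J+O nor -J-O was given

  loop do
    case args.first
    when "-J+O" then overwrite_optimize = true; args.shift; break
    when "-J-O" then overwrite_optimize = false; args.shift; break
    when /\A-J(.+)\z/ then java_args << $1; args.shift
    when /\A-R(.*)\z/ then jruby_args << $1; args.shift
    when "run" then default_optimize = true; break # "run" stays in args
    else break
    end
  end

  optimize = overwrite_optimize == true ||
             (default_optimize && overwrite_optimize != false)
  [(optimize ? RUN_MODE_OPTS : STARTUP_OPTS) + java_args, jruby_args, args]
end
```

As in the shell version, a `-J+O` or `-J-O` flag ends option parsing, so any `-J`/`-R` options placed after it are passed through to Embulk unparsed.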
@@ -43,6 +43,9 @@ public class TimestampFormat
43
43
  if(s.startsWith("+") || s.startsWith("-")) {
44
44
  return DateTimeZone.forID(s);
45
45
 
46
+ } else if (s.equals("Z")) {
47
+ return DateTimeZone.UTC;
48
+
46
49
  } else {
47
50
  try {
48
51
  int rawOffset = (int) DateTimeFormat.forPattern("z").parseMillis(s);
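The `TimestampFormat` change makes a bare `Z` resolve to UTC instead of falling through to the `DateTimeFormat.forPattern("z")` abbreviation parser. A minimal Ruby sketch of the same dispatch order, assuming a hypothetical `zone_offset_seconds` helper and a simple `+HH[:MM]` / `-HH[:MM]` offset syntax (not Joda-Time's exact rules):

```ruby
# Hypothetical: resolve a zone string to a UTC offset in seconds.
# Order mirrors TimestampFormat: explicit offsets, then "Z", then abbreviations.
def zone_offset_seconds(s)
  if (m = /\A([+-])(\d\d)(?::?(\d\d))?\z/.match(s))
    sign = m[1] == "-" ? -1 : 1
    sign * (m[2].to_i * 3600 + m[3].to_i * 60) # missing minutes count as 0
  elsif s == "Z"
    0 # ISO 8601 "Z" designator means UTC
  else
    nil # abbreviations such as "PST" would need a lookup table
  end
end
```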
@@ -16,4 +16,5 @@ Release Notes
16
16
  release/release-0.4.3
17
17
  release/release-0.4.4
18
18
  release/release-0.4.5
19
+ release/release-0.4.6
19
20
 
@@ -0,0 +1,30 @@
1
+ Release 0.4.6
2
+ ==================================
3
+
4
+ CLI
5
+ ------------------
6
+
7
+ * Updated the installation script to install the ``embulk`` binary into the ~/.embulk/bin directory.
8
+ * Updated boot script to set ``-XX:+TieredCompilation`` and ``-XX:TieredStopAtLevel=1`` JVM options except in ``run`` mode.
9
+ * These changes make the embulk command start up 1.5 to 2x faster.
10
+ * Updated boot script to set ``-XX:+UseConcMarkSweepGC`` JVM option in ``run`` mode.
11
+ * Usage message shows embulk version.
12
+
13
+ Built-in plugins
14
+ ------------------
15
+
16
+ * ``input/file`` plugin sets ``last_path`` option to the next configuration.
17
+ * This enables scheduled execution where next execution loads files created after the last execution.
18
+ * ``guess/csv`` fixed the guess result when the order of dates is not year, month, day.
19
+ * ``guess/csv`` guesses the dd/mm/yyyy date format if the input data includes at least one day greater than 12.
20
+ * ``guess/csv`` guesses RFC 2822 date and time formats.
21
+ * ``guess/csv`` supports "." as the delimiter of year, month or day.
22
+
23
+ General Changes
24
+ ------------------
25
+
26
+ * The embulk command raises a clear error message if the HOME environment variable is not set.
27
+
28
+ Release Date
29
+ ------------------
30
+ 2015-02-20
@@ -1,6 +1,8 @@
1
1
  package org.embulk.standards;
2
2
 
3
3
  import java.util.List;
4
+ import java.util.ArrayList;
5
+ import java.util.Collections;
4
6
  import java.io.File;
5
7
  import java.io.FileInputStream;
6
8
  import java.io.InputStream;
@@ -73,8 +75,14 @@ public class LocalFileInputPlugin
73
75
  int taskCount,
74
76
  FileInputPlugin.Control control)
75
77
  {
78
+ PluginTask task = taskSource.loadTask(PluginTask.class);
79
+
76
80
  control.run(taskSource, taskCount);
77
- return Exec.newConfigDiff();
81
+
82
+ List<String> files = new ArrayList<String>(task.getFiles());
83
+ Collections.sort(files);
84
+ return Exec.newConfigDiff().
85
+ set("last_path", files.get(files.size() - 1));
78
86
  }
79
87
 
80
88
  @Override
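`LocalFileInputPlugin` now sorts the task's file list and returns the last entry as `last_path` in the config diff, which the release notes describe as the basis for scheduled runs that load only newer files. The Ruby sketch below models that selection with a hypothetical `next_last_path`, and adds an equally hypothetical `files_after` to show how a follow-up run could use the value; the filtering side is an assumption, not code from this diff. The sketch returns `nil` for an empty list rather than indexing past the end.

```ruby
# Hypothetical model of the new last_path selection.
def next_last_path(files)
  files.sort.last # lexicographically greatest path; nil when files is empty
end

# Assumption: a later run keeps only paths that sort after last_path.
def files_after(files, last_path)
  return files.sort if last_path.nil?
  files.select { |f| f > last_path }.sort
end
```

Because the comparison is lexicographic (Ruby's `String#<=>` here and Java's `String.compareTo` in the plugin agree for ASCII paths), this only picks up new files when newer names also sort later, for example date-stamped file names.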
@@ -5,23 +5,22 @@ Gem::Specification.new do |gem|
5
5
  gem.name = "embulk"
6
6
  gem.version = Embulk::VERSION
7
7
 
8
- gem.summary = %q{Embulk, a plugin-based parallel bulk data loader}
9
- gem.description = %q{Embulk is an open-source, plugin-based bulk data loader to scale and simplify data management across heterogeneous data stores. It can collect and ship any kinds of data in high throughput with transaction control.}
8
+ gem.summary = "Embulk, a plugin-based parallel bulk data loader"
9
+ gem.description = "Embulk is an open-source, plugin-based bulk data loader to scale and simplify data management across heterogeneous data stores. It can collect and ship any kinds of data in high throughput with transaction control."
10
10
  gem.authors = ["Sadayuki Furuhashi"]
11
11
  gem.email = ["frsyuki@gmail.com"]
12
12
  gem.license = "Apache 2.0"
13
13
  gem.homepage = "https://github.com/embulk/embulk"
14
14
 
15
15
  gem.files = `git ls-files`.split("\n") + Dir["classpath/*.jar"]
16
- gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
17
- gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
16
+ gem.test_files = gem.files.grep(%r"^(test|spec)/")
17
+ gem.executables = gem.files.grep(%r"^bin/").map{ |f| File.basename(f) }
18
18
  gem.require_paths = ["lib"]
19
19
  gem.has_rdoc = false
20
20
 
21
- gem.add_development_dependency 'bundler', ['>= 1.0']
22
- gem.add_development_dependency 'rake', ['>= 0.10.0']
23
- gem.add_development_dependency 'rspec', ['~> 2.11']
24
- gem.add_development_dependency 'json', ['~> 1.7']
25
- gem.add_development_dependency 'yard', ['~> 0.8.7']
26
- gem.add_development_dependency 'kramdown', ['~> 1.5.0']
21
+ gem.add_development_dependency "bundler", [">= 1.0"]
22
+ gem.add_development_dependency "rake", [">= 0.10.0"]
23
+ gem.add_development_dependency "test-unit", ["~> 3.0.9"]
24
+ gem.add_development_dependency "yard", ["~> 0.8.7"]
25
+ gem.add_development_dependency "kramdown", ["~> 1.5.0"]
27
26
  end
@@ -10,7 +10,7 @@ module Embulk
10
10
  # GEM_HOME is already set by embulk/command/embulk.rb
11
11
  gem_home = ENV['GEM_HOME'].to_s
12
12
  if gem_home.empty?
13
- ENV['GEM_HOME'] = File.expand_path File.join(ENV['HOME'], '.embulk', Gem.ruby_engine, RbConfig::CONFIG['ruby_version'])
13
+ ENV['GEM_HOME'] = default_gem_home
14
14
  Gem.clear_paths # force rubygems to reload GEM_HOME
15
15
  end
16
16
 
@@ -298,6 +298,17 @@ examples:
298
298
  File.join(home, dir)
299
299
  end
300
300
 
301
+ def self.default_gem_home
302
+ if RUBY_PLATFORM =~ /java/i
303
+ user_home = java.lang.System.properties["user.home"]
304
+ end
305
+ user_home ||= ENV['HOME']
306
+ unless user_home
307
+ raise "HOME environment variable is not set."
308
+ end
309
+ File.expand_path File.join(user_home, '.embulk', Gem.ruby_engine, RbConfig::CONFIG['ruby_version'])
310
+ end
311
+
301
312
  private
302
313
 
303
314
  def self.setup_gem_paths(path)
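`default_gem_home` takes the home directory from the JVM's `user.home` property under JRuby, falls back to `ENV['HOME']`, and raises a readable error when neither is available. A testable sketch of the same path computation, in which the caller passes the home directory and `RUBY_ENGINE` stands in for `Gem.ruby_engine` (both assumptions for illustration):

```ruby
require "rbconfig"

# Hypothetical, testable variant of Embulk.default_gem_home: the caller supplies
# the home directory instead of the method looking it up.
def gem_home_for(user_home, engine: RUBY_ENGINE,
                 ruby_version: RbConfig::CONFIG["ruby_version"])
  raise "HOME environment variable is not set." unless user_home
  File.expand_path(File.join(user_home, ".embulk", engine, ruby_version))
end
```

Typical use would be `gem_home_for(ENV["HOME"])`.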
@@ -322,6 +333,7 @@ examples:
322
333
  end
323
334
 
324
335
  def self.usage(message)
336
+ STDERR.puts "Embulk v#{Embulk::VERSION}"
325
337
  STDERR.puts "usage: <command> [--options]"
326
338
  STDERR.puts "commands:"
327
339
  STDERR.puts " bundle [directory] # create or update plugin environment."
@@ -26,12 +26,12 @@ module Embulk
26
26
  def add(page)
27
27
  # filtering code:
28
28
  page.each do |record|
29
- @page_builder.add(record)
29
+ page_builder.add(record)
30
30
  end
31
31
  end
32
32
 
33
33
  def finish
34
- @page_builder.finish
34
+ page_builder.finish
35
35
  end
36
36
  end
37
37
 
@@ -31,7 +31,7 @@ module Embulk
31
31
  # output code:
32
32
  page.each do |record|
33
33
  if @current_file == nil || @current_file_size > 32*1024
34
- @current_file = @file_output.next_file
34
+ @current_file = file_output.next_file
35
35
  @current_file_size = 0
36
36
  end
37
37
  @current_file.write "|mydata|"
@@ -39,7 +39,7 @@ module Embulk
39
39
  end
40
40
 
41
41
  def finish
42
- @file_output.finish
42
+ file_output.finish
43
43
  end
44
44
  end
45
45
 
@@ -34,9 +34,9 @@ module Embulk
34
34
  end
35
35
 
36
36
  def run
37
- @page_builder.add(["example-value", 1, 0.1])
38
- @page_builder.add(["example-value", 2, 0.2])
39
- @page_builder.finish
37
+ page_builder.add(["example-value", 1, 0.1])
38
+ page_builder.add(["example-value", 2, 0.2])
39
+ page_builder.finish
40
40
 
41
41
  commit_report = {}
42
42
  return commit_report
@@ -31,10 +31,10 @@ module Embulk
31
31
  file.each do |buffer|
32
32
  # parsing code
33
33
  record = ["col1", 2, 3.0]
34
- @page_builder.add(record)
34
+ page_builder.add(record)
35
35
  end
36
36
  end
37
- @page_builder.finish
37
+ page_builder.finish
38
38
  end
39
39
  end
40
40
 
@@ -16,6 +16,9 @@ module Embulk::Guess
16
16
 
17
17
  WEEKDAY_NAME_SHORT = /Sun|Mon|Tue|Wed|Thu|Fri|Sat/
18
18
  WEEKDAY_NAME_FULL = /Sunday|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday/
19
+
20
+ ZONE_OFF = /(?:Z|[\-\+]\d\d(?::?\d\d)?)/
21
+ ZONE_ABB = /[A-Z]{1,3}/
19
22
  end
20
23
 
21
24
  class GuessMatch
@@ -119,9 +122,17 @@ module Embulk::Guess
119
122
  end
120
123
 
121
124
  def mergeable_group
122
- [@delimiters, @parts]
125
+ # MDY is mergeable with DMY
126
+ if i = array_sequence_find(@parts, [:day, :month, :year])
127
+ ps = @parts.dup
128
+ ps[i, 3] = [:month, :day, :year]
129
+ [@delimiters, ps]
130
+ else
131
+ [@delimiters, @parts]
132
+ end
123
133
  end
124
134
 
135
+ attr_reader :parts
125
136
  attr_reader :part_options
126
137
 
127
138
  def merge!(another_in_group)
@@ -136,27 +147,44 @@ module Embulk::Guess
136
147
  [@part_options[i], part_options[i]].sort.last
137
148
  end
138
149
  end
150
+
151
+ # if DMY matches, MDY is likely a false match of DMY.
152
+ dmy = array_sequence_find(another_in_group.parts, [:day, :month, :year])
153
+ mdy = array_sequence_find(@parts, [:month, :day, :year])
154
+ if mdy && dmy
155
+ @parts[mdy, 3] = [:day, :month, :year]
156
+ end
157
+ end
158
+
159
+ def array_sequence_find(array, seq)
160
+ (array.size - seq.size + 1).times {|i|
161
+ return i if array[i, seq.size] == seq
162
+ }
163
+ return nil
139
164
  end
140
165
  end
141
166
 
142
167
  class GuessPattern
143
168
  include Parts
144
169
 
145
- date_delims = /[\/\-]/
170
+ date_delims = /[\/\-\.]/
146
171
  # yyyy-MM-dd
147
172
  YMD = /(?<year>#{YEAR})(?<date_delim>#{date_delims})(?<month>#{MONTH})\k<date_delim>(?<day>#{DAY})/
148
173
  YMD_NODELIM = /(?<year>#{YEAR})(?<month>#{MONTH_NODELIM})(?<day>#{DAY_NODELIM})/
149
- # dd/MM/yyyy
150
- DMY = /(?<year>#{YEAR})(?<date_delim>#{date_delims})(?<month>#{MONTH})\k<date_delim>(?<day>#{DAY})/
151
- DMY_NODELIM = /(?<year>#{YEAR})(?<month>#{MONTH_NODELIM})(?<day>#{DAY_NODELIM})/
152
-
153
- frac = /[0-9]{1,24}/
174
+ # MM/dd/yyyy
175
+ MDY = /(?<month>#{MONTH})(?<date_delim>#{date_delims})(?<day>#{DAY})\k<date_delim>(?<year>#{YEAR})/
176
+ MDY_NODELIM = /(?<month>#{MONTH_NODELIM})(?<day>#{DAY_NODELIM})(?<year>#{YEAR})/
177
+ # dd.MM.yyyy
178
+ DMY = /(?<day>#{DAY})(?<date_delim>#{date_delims})(?<month>#{MONTH})\k<date_delim>(?<year>#{YEAR})/
179
+ DMY_NODELIM = /(?<day>#{DAY_NODELIM})(?<month>#{MONTH_NODELIM})(?<year>#{YEAR})/
180
+
181
+ frac = /[0-9]{1,9}/
154
182
  time_delims = /[\:\-]/
155
183
  frac_delims = /[\.\,]/
156
- TIME = /(?<hour>#{HOUR})(?<time_delim>#{time_delims})(?<minute>#{MINUTE})(?:\k<time_delim>(?<second>#{SECOND})(?:(?<frac_delim>#{frac_delims})(?<frac>#{frac}))?)?/
157
- TIME_NODELIM = /(?<hour>#{HOUR_NODELIM})(?<minute>#{MINUTE_NODELIM})((?<second>#{SECOND_NODELIM})(?:(?<frac_delim>#{frac_delims})(?<frac>#{frac}))?)?/
184
+ TIME = /(?<hour>#{HOUR})(?:(?<time_delim>#{time_delims})(?<minute>#{MINUTE})(?:\k<time_delim>(?<second>#{SECOND})(?:(?<frac_delim>#{frac_delims})(?<frac>#{frac}))?)?)?/
185
+ TIME_NODELIM = /(?<hour>#{HOUR_NODELIM})(?:(?<minute>#{MINUTE_NODELIM})((?<second>#{SECOND_NODELIM})(?:(?<frac_delim>#{frac_delims})(?<frac>#{frac}))?)?)?/
158
186
 
159
- TZ = /(?<zone_space> )?(?<zone>(?<zone_off>[\-\+]\d\d(?::?\d\d)?)|(?<zone_abb>[A-Z]{3}))|(?<z>Z)/
187
+ ZONE = /(?<zone_space> )?(?<zone>(?<zone_off>#{ZONE_OFF})|(?<zone_abb>#{ZONE_ABB}))/
160
188
 
161
189
  def match(text)
162
190
  delimiters = []
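The `mergeable_group` and `merge!` changes let MM/dd/yyyy and dd/MM/yyyy matches land in one group: the grouping key rewrites a day-month-year run as month-day-year, and `merge!` switches the merged result to DMY when a merged member was DMY. The sketch below copies `array_sequence_find` from this diff and wraps the key computation in a hypothetical `group_key` function so the behavior can be checked in isolation.

```ruby
# Copied from the diff: index of the first occurrence of seq inside array, or nil.
def array_sequence_find(array, seq)
  (array.size - seq.size + 1).times do |i|
    return i if array[i, seq.size] == seq
  end
  nil
end

# Hypothetical wrapper around the new mergeable_group logic: a DMY run is
# rewritten as MDY (on a copy) so both orders produce the same grouping key.
def group_key(delimiters, parts)
  i = array_sequence_find(parts, [:day, :month, :year])
  return [delimiters, parts] unless i
  ps = parts.dup
  ps[i, 3] = [:month, :day, :year]
  [delimiters, ps]
end
```

Because the key still contains the delimiters, `01/01/2014` and `13.01.2014` fall into different groups, which matches the expectation in `test_format_merge_dmy` that frequency wins when delimiters differ.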
@@ -177,6 +205,21 @@ module Embulk::Guess
177
205
  parts << :day
178
206
  part_options << part_heading_option(dm["day"])
179
207
 
208
+ elsif dm = (/^#{MDY}(?<rest>.*?)$/.match(text) or /^#{MDY_NODELIM}(?<rest>.*?)$/.match(text))
209
+ date_delim = dm["date_delim"] rescue ""
210
+
211
+ parts << :month
212
+ part_options << part_heading_option(dm["month"])
213
+ delimiters << date_delim
214
+
215
+ parts << :day
216
+ part_options << part_heading_option(dm["day"])
217
+ delimiters << date_delim
218
+
219
+ parts << :year
220
+ part_options << nil
221
+ delimiters << date_delim
222
+
180
223
  elsif dm = (/^#{DMY}(?<rest>.*?)$/.match(text) or /^#{DMY_NODELIM}(?<rest>.*?)$/.match(text))
181
224
  date_delim = dm["date_delim"] rescue ""
182
225
 
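With the new MDY branch, `GuessPattern#match` tries month-first before day-first, so `01/01/2014` is read as MM/dd/yyyy and only a value such as `13/01/2014`, whose first field cannot be a month, is read as dd/MM/yyyy. The YEAR, MONTH, and DAY patterns live in the `Parts` module, which this diff does not show, so the sketch below substitutes simplified versions (an assumption); the MDY/DMY shapes, the `.` delimiter, and the `\k<date_delim>` backreference are taken from `GuessPattern`. `date_order` is a hypothetical helper.

```ruby
module DateOrderSketch
  # Assumption: simplified stand-ins for YEAR/MONTH/DAY from Parts.
  YEAR  = /[0-9]{4}/
  MONTH = /1[0-2]|0?[1-9]/
  DAY   = /3[01]|[12][0-9]|0?[1-9]/
  DATE_DELIMS = /[\/\-\.]/ # "." is accepted as of 0.4.6

  # \k<date_delim> forces both delimiters to be the same character.
  MDY = /(?<month>#{MONTH})(?<date_delim>#{DATE_DELIMS})(?<day>#{DAY})\k<date_delim>(?<year>#{YEAR})/
  DMY = /(?<day>#{DAY})(?<date_delim>#{DATE_DELIMS})(?<month>#{MONTH})\k<date_delim>(?<year>#{YEAR})/

  # MDY is tried before DMY, as in GuessPattern#match; nil when neither matches.
  def self.date_order(text)
    if /\A#{MDY}\z/.match(text) then :mdy
    elsif /\A#{DMY}\z/.match(text) then :dmy
    end
  end
end
```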
@@ -198,7 +241,7 @@ module Embulk::Guess
198
241
  end
199
242
  rest = dm["rest"]
200
243
 
201
- date_time_delims = /[ _T]/
244
+ date_time_delims = /(?: |_|T|\. ?)/
202
245
  if tm = (
203
246
  /^(?<date_time_delim>#{date_time_delims})#{TIME}(?<rest>.*?)?$/.match(rest) or
204
247
  /^(?<date_time_delim>#{date_time_delims})#{TIME_NODELIM}(?<rest>.*?)?$/.match(rest) or
@@ -211,31 +254,30 @@ module Embulk::Guess
211
254
  parts << :hour
212
255
  part_options << part_heading_option(tm["hour"])
213
256
 
214
- delimiters << time_delim
215
- parts << :minute
216
- part_options << part_heading_option(tm["minute"])
217
-
218
- if tm["second"]
257
+ if tm["minute"]
219
258
  delimiters << time_delim
220
- parts << :second
221
- part_options << part_heading_option(tm["second"])
222
- end
223
-
224
- if tm["frac"]
225
- delimiters << tm["frac_delim"]
226
- parts << :frac
227
- part_options << tm["frac"].size
259
+ parts << :minute
260
+ part_options << part_heading_option(tm["minute"])
261
+
262
+ if tm["second"]
263
+ delimiters << time_delim
264
+ parts << :second
265
+ part_options << part_heading_option(tm["second"])
266
+
267
+ if tm["frac"]
268
+ delimiters << tm["frac_delim"]
269
+ parts << :frac
270
+ part_options << tm["frac"].size
271
+ end
272
+ end
228
273
  end
229
274
 
230
275
  rest = tm["rest"]
231
276
  end
232
277
 
233
- if zm = /^#{TZ}$/.match(rest)
278
+ if zm = /^#{ZONE}$/.match(rest)
234
279
  delimiters << (zm["zone_space"] || '')
235
- if zm["z"]
236
- # TODO ISO 8601
237
- parts << :zone_off
238
- elsif zm["zone_off"]
280
+ if zm["zone_off"]
239
281
  parts << :zone_off
240
282
  else
241
283
  parts << :zone_abb
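In 0.4.6 the `TIME` pattern makes the minute, and everything after it, optional, which is what lets ISO 8601 values like `2007-04-06T13` be guessed as `%Y-%m-%dT%H`. The sketch below shows that nesting with simplified HOUR/MINUTE/SECOND stand-ins (assumptions, since `Parts` is not shown) and a hypothetical `time_parts` function that returns the same part list the nested `if tm["minute"]` / `if tm["second"]` blocks build.

```ruby
module TimeSketch
  # Assumption: simplified stand-ins for HOUR/MINUTE/SECOND from Parts.
  HOUR   = /2[0-4]|[01][0-9]/
  MINUTE = /[0-5][0-9]/
  SECOND = /60|[0-5][0-9]/
  FRAC   = /[0-9]{1,9}/ # 0.4.6 limits fractions to 9 digits (nanoseconds)

  # Same nesting as the new TIME: minute, second, and fraction are each optional.
  TIME = /(?<hour>#{HOUR})(?:(?<time_delim>[:\-])(?<minute>#{MINUTE})(?:\k<time_delim>(?<second>#{SECOND})(?:(?<frac_delim>[.,])(?<frac>#{FRAC}))?)?)?/

  # Hypothetical: the part list built by the nested `if tm[...]` blocks.
  def self.time_parts(text)
    m = /\A#{TIME}\z/.match(text)
    return nil unless m
    parts = [:hour]
    if m[:minute]
      parts << :minute
      if m[:second]
        parts << :second
        parts << :frac if m[:frac]
      end
    end
    parts
  end
end
```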
@@ -265,9 +307,9 @@ module Embulk::Guess
265
307
  end
266
308
  end
267
309
 
268
- class RegexpMatch
310
+ class SimpleMatch
269
311
  def initialize(format)
270
- @format
312
+ @format = format
271
313
  end
272
314
 
273
315
  attr_reader :format
@@ -280,10 +322,33 @@ module Embulk::Guess
280
322
  end
281
323
  end
282
324
 
325
+ class Rfc2822Pattern
326
+ include Parts
327
+
328
+ def initialize
329
+ @regexp = /^(?<weekday>#{WEEKDAY_NAME_SHORT}, )?\d\d #{MONTH_NAME_SHORT} \d\d\d\d(?<time> \d\d:\d\d(?<second>:\d\d)? (?:(?<zone_off>#{ZONE_OFF})|(?<zone_abb>#{ZONE_ABB})))?$/
330
+ end
331
+
332
+ def match(text)
333
+ if m = @regexp.match(text)
334
+ format = ''
335
+ format << "%a, " if m['weekday']
336
+ format << "%d %b %Y"
337
+ format << " %H:%M" if m['time']
338
+ format << ":%S" if m['second']
339
+ format << " %z" if m['zone_off']
340
+ format << " %Z" if m['zone_abb']
341
+ SimpleMatch.new(format)
342
+ else
343
+ nil
344
+ end
345
+ end
346
+ end
347
+
283
348
  class RegexpPattern
284
349
  def initialize(regexp, format)
285
350
  @regexp = regexp
286
- @match = RegexpMatch.new(format)
351
+ @match = SimpleMatch.new(format)
287
352
  end
288
353
 
289
354
  def match(text)
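`Rfc2822Pattern` builds a strptime format piece by piece, depending on which optional groups matched. The sketch below reproduces that logic as a standalone module: `WEEKDAY` copies `WEEKDAY_NAME_SHORT`, `ZONE_OFF` and `ZONE_ABB` are copied from this diff, and the month alternation is assumed to match `MONTH_NAME_SHORT`, which is not shown. `format_for` is a hypothetical name. The resulting format can be passed straight to `Time.strptime`, which is how the new tests verify guesses.

```ruby
require "time"

module Rfc2822Sketch
  WEEKDAY  = /Sun|Mon|Tue|Wed|Thu|Fri|Sat/
  MONTH    = /Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec/ # assumption
  ZONE_OFF = /(?:Z|[\-\+]\d\d(?::?\d\d)?)/
  ZONE_ABB = /[A-Z]{1,3}/
  RE = /\A(?<weekday>#{WEEKDAY}, )?\d\d #{MONTH} \d\d\d\d(?<time> \d\d:\d\d(?<second>:\d\d)? (?:(?<zone_off>#{ZONE_OFF})|(?<zone_abb>#{ZONE_ABB})))?\z/

  # Returns a strptime format for text, or nil if it is not RFC 2822-like.
  def self.format_for(text)
    m = RE.match(text)
    return nil unless m
    format = m[:weekday] ? "%a, " : ""
    format += "%d %b %Y"
    format += " %H:%M" if m[:time]
    format += ":%S" if m[:second]
    format += " %z" if m[:zone_off]
    format += " %Z" if m[:zone_abb]
    format
  end
end
```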
@@ -298,18 +363,15 @@ module Embulk::Guess
298
363
  module StandardPatterns
299
364
  include Parts
300
365
 
301
- RFC_822_1123 = /^#{WEEKDAY_NAME_SHORT}, \d\d #{MONTH_NAME_SHORT} \d\d\d\d \d\d:\d\d:\d\d [a-zA-Z]{3}$/
302
- RFC_850_1035 = /^#{WEEKDAY_NAME_FULL}, \d\d-#{MONTH_NAME_SHORT}-\d\d \d\d:\d\d:\d\d [a-zA-Z]{3}$/
303
- APACHE_CLF = /^\d\d\/#{MONTH_NAME_SHORT}\/\d\d\d\d \d\d:\d\d:\d\d [\-\+]\d\d(?::?\d\d)?$/
366
+ APACHE_CLF = /^\d\d\/#{MONTH_NAME_SHORT}\/\d\d\d\d:\d\d:\d\d:\d\d #{ZONE_OFF}?$/
304
367
  ANSI_C_ASCTIME = /^#{WEEKDAY_NAME_SHORT} #{MONTH_NAME_SHORT} \d\d? \d\d:\d\d:\d\d \d\d\d\d$/
305
368
  end
306
369
 
307
370
  PATTERNS = [
308
371
  GuessPattern.new,
309
- RegexpPattern.new(StandardPatterns::RFC_822_1123, "%a, %d %b %Y %H:%M:%S %z"),
310
- RegexpPattern.new(StandardPatterns::RFC_850_1035, "%A, %d-%b-%y %H:%M:%S %z"),
311
- RegexpPattern.new(StandardPatterns::APACHE_CLF, "%d/%b/%Y %H:%M:%S %Z"),
312
- RegexpPattern.new(StandardPatterns::ANSI_C_ASCTIME, "$a %b %e %H:%M:%S %Y"),
372
+ Rfc2822Pattern.new,
373
+ RegexpPattern.new(StandardPatterns::APACHE_CLF, "%d/%b/%Y:%H:%M:%S %z"),
374
+ RegexpPattern.new(StandardPatterns::ANSI_C_ASCTIME, "%a %b %e %H:%M:%S %Y"),
313
375
  ]
314
376
 
315
377
  def self.guess(texts)
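The standard-pattern table now maps Apache common log timestamps to `%d/%b/%Y:%H:%M:%S %z` (a colon between year and hour, and a numeric offset) and fixes the ANSI C asctime format, which previously began with the literal `$a` instead of the `%a` directive. The sketch below exercises both formats with Ruby's `Time.strptime` on sample strings from the new test file; `parses?` is a hypothetical helper.

```ruby
require "time"

APACHE_CLF_FORMAT = "%d/%b/%Y:%H:%M:%S %z" # as registered in 0.4.6
ASCTIME_FORMAT    = "%a %b %e %H:%M:%S %Y" # 0.4.5 had "$a" here

# Hypothetical helper: true when Time.strptime accepts text under format.
def parses?(text, format)
  Time.strptime(text, format)
  true
rescue ArgumentError
  false
end
```

For example, `Time.strptime("07/Mar/2004:16:05:50 -0800", APACHE_CLF_FORMAT)` yields a time whose UTC offset is -8 hours, while the old `"$a ..."` asctime format cannot match any weekday name because `$a` is treated as literal text.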
@@ -322,8 +384,8 @@ module Embulk::Guess
322
384
  elsif matches.size == 1
323
385
  return matches[0].format
324
386
  else
325
- match_groups = matches.group_by {|match| match.mergeable_group }
326
- best_match_group = match_groups.sort_by {|group| group.size }.last[1]
387
+ match_groups = matches.group_by {|match| match.mergeable_group }.values
388
+ best_match_group = match_groups.sort_by {|group| group.size }.last
327
389
  best_match = best_match_group.shift
328
390
  best_match_group.each {|m| best_match.merge!(m) }
329
391
  return best_match.format
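The change in `self.guess` fixes how the largest group is selected. `Enumerable#group_by` returns a Hash, and `sort_by` over a Hash yields `[key, values]` pairs, so in 0.4.5 `group.size` was 2 for every group and the pick did not depend on group size. Adding `.values` makes `sort_by` compare the match arrays themselves. A small demonstration, with hypothetical string matches whose first character stands in for the `mergeable_group` key:

```ruby
# Hypothetical matches: the first character plays the role of mergeable_group.
def pick_groups(matches)
  grouped = matches.group_by { |m| m[0] } # Hash of key => array of matches
  # 0.4.5: each element is a [key, values] pair, so group.size is always 2.
  old_pick = grouped.sort_by { |group| group.size }.last[1]
  # 0.4.6: compare the value arrays and take the largest one.
  new_pick = grouped.values.sort_by { |group| group.size }.last
  [old_pick, new_pick]
end
```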
@@ -25,6 +25,8 @@ module Embulk
25
25
  init
26
26
  end
27
27
 
28
+ attr_reader :task, :schema, :index, :page_builder
29
+
28
30
  def init
29
31
  end
30
32
 
@@ -17,6 +17,8 @@ module Embulk
17
17
  init
18
18
  end
19
19
 
20
+ attr_reader :task, :schema, :page_builder
21
+
20
22
  def init
21
23
  end
22
24
 
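The plugin base classes now expose `task`, `schema`, and `page_builder` (plus `index` in one of them) through `attr_reader`, and the generated templates call `page_builder` and `file_output` instead of reading instance variables directly. The classes below are hypothetical stand-ins, not Embulk's real base classes, showing the pattern: the base stores its collaborators and calls an `init` hook, and the subclass only uses the readers. One likely benefit is that templates no longer depend on the base class's instance-variable names.

```ruby
# Hypothetical stand-in for a plugin base class (not Embulk's real API).
class SketchPluginBase
  def initialize(task, schema, page_builder)
    @task = task
    @schema = schema
    @page_builder = page_builder
    init # hook for subclasses, called after the fields are set
  end

  attr_reader :task, :schema, :page_builder

  def init
  end
end

# Template-style subclass: it uses the readers, never @page_builder.
class UpcaseSketchFilter < SketchPluginBase
  def add(records)
    records.each { |record| page_builder.add(record.map(&:upcase)) }
  end

  def finish
    page_builder.finish
  end
end
```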
@@ -1,3 +1,3 @@
1
1
  module Embulk
2
- VERSION = '0.4.5'
2
+ VERSION = '0.4.6'
3
3
  end
@@ -0,0 +1,122 @@
1
+ require 'helper'
2
+ require 'time'
3
+ require 'embulk/guess/time_format_guess'
4
+
5
+ class TimeFormatGuessTest < ::Test::Unit::TestCase
6
+ def test_format_delims
7
+ # date-delim "-" date-time-delim " " time-delim ":" frac delim "."
8
+ assert_guess "%Y-%m-%d %H:%M:%S.%N", "2014-01-01 01:01:01.000000001"
9
+ assert_guess "%Y-%m-%d %H:%M:%S.%N", "2014-01-01 01:01:01.000001"
10
+ assert_guess "%Y-%m-%d %H:%M:%S.%L", "2014-01-01 01:01:01.001"
11
+ assert_guess "%Y-%m-%d %H:%M:%S", "2014-01-01 01:01:01"
12
+ assert_guess "%Y-%m-%d %H:%M", "2014-01-01 01:01"
13
+ assert_guess "%Y-%m-%d", "2014-01-01"
14
+
15
+ # date-delim "/" date-time-delim " " time-delim "-" frac delim ","
16
+ assert_guess "%Y/%m/%d %H-%M-%S,%N", "2014/01/01 01-01-01,000000001"
17
+ assert_guess "%Y/%m/%d %H-%M-%S,%N", "2014/01/01 01-01-01,000001"
18
+ assert_guess "%Y/%m/%d %H-%M-%S,%L", "2014/01/01 01-01-01,001"
19
+ assert_guess "%Y/%m/%d %H-%M-%S", "2014/01/01 01-01-01"
20
+ assert_guess "%Y/%m/%d %H-%M", "2014/01/01 01-01"
21
+ assert_guess "%Y/%m/%d", "2014/01/01"
22
+
23
+ # date-delim "." date-time-delim "." time-delim ":" frac delim "."
24
+ assert_guess "%Y.%m.%d.%H:%M:%S.%N", "2014.01.01.01:01:01.000000001"
25
+ assert_guess "%Y.%m.%d.%H:%M:%S.%N", "2014.01.01.01:01:01.000001"
26
+ assert_guess "%Y.%m.%d.%H:%M:%S.%L", "2014.01.01.01:01:01.001"
27
+ assert_guess "%Y.%m.%d.%H:%M:%S", "2014.01.01.01:01:01"
28
+ assert_guess "%Y.%m.%d.%H:%M", "2014.01.01.01:01"
29
+ assert_guess "%Y.%m.%d", "2014.01.01"
30
+
31
+ # date-delim "." date-time-delim ". " time-delim ":" frac delim ","
32
+ assert_guess "%Y.%m.%d. %H:%M:%S,%N", "2014.01.01. 01:01:01,000000001"
33
+ assert_guess "%Y.%m.%d. %H:%M:%S,%N", "2014.01.01. 01:01:01,000001"
34
+ assert_guess "%Y.%m.%d. %H:%M:%S,%L", "2014.01.01. 01:01:01,001"
35
+ assert_guess "%Y.%m.%d. %H:%M:%S", "2014.01.01. 01:01:01"
36
+ assert_guess "%Y.%m.%d. %H:%M", "2014.01.01. 01:01"
37
+ assert_guess "%Y.%m.%d", "2014.01.01"
38
+ end
39
+
40
+ def test_format_ymd_orders
41
+ assert_guess "%Y-%m-%d", "2014-01-01"
42
+ assert_guess "%Y/%m/%d", "2014/01/01"
43
+ assert_guess "%Y.%m.%d", "2014.01.01"
44
+ assert_guess "%m/%d/%Y", "01/01/2014"
45
+ assert_guess "%m.%d.%Y", "01.01.2014"
46
+ assert_guess "%d/%m/%Y", "13/01/2014"
47
+ assert_guess "%d/%m/%Y", "21/01/2014"
48
+ end
49
+
50
+ def test_format_iso8601
51
+ assert_guess "%Y-%m-%d", "1981-04-05"
52
+ assert_guess "%Y-%m-%dT%H", "2007-04-06T13"
53
+ assert_guess "%Y-%m-%dT%H:%M", "2007-04-06T00:00"
54
+ assert_guess "%Y-%m-%dT%H:%M", "2007-04-05T24:00"
55
+ assert_guess "%Y-%m-%dT%H:%M:%S", "2007-04-06T13:47:30"
56
+ assert_guess "%Y-%m-%dT%H:%M:%S%z", "2007-04-06T13:47:30Z"
57
+ assert_guess "%Y-%m-%dT%H:%M:%S%z", "2007-04-06T13:47:30+00"
58
+ assert_guess "%Y-%m-%dT%H:%M:%S%z", "2007-04-06T13:47:30+00:00"
59
+ assert_guess "%Y-%m-%dT%H:%M:%S%z", "2007-04-06T13:47:30+0000"
60
+ assert_guess "%Y-%m-%dT%H:%M:%S%z", "2007-04-06T13:47:30-01"
61
+ assert_guess "%Y-%m-%dT%H:%M:%S%z", "2007-04-06T13:47:30-01:30"
62
+ assert_guess "%Y-%m-%dT%H:%M:%S%z", "2007-04-06T13:47:30-0130"
63
+ end
64
+
65
+ def test_format_rfc_822_2822
66
+ assert_guess '%a, %d %b %Y %H:%M:%S %Z', "Fri, 20 Feb 2015 14:02:34 PST"
67
+ assert_guess '%a, %d %b %Y %H:%M:%S %Z', "Fri, 20 Feb 2015 22:02:34 UT"
68
+ assert_guess '%a, %d %b %Y %H:%M:%S %Z', "Fri, 20 Feb 2015 22:02:34 GMT"
69
+ assert_guess '%d %b %Y %H:%M:%S %Z', "20 Feb 2015 22:02:34 GMT"
70
+ assert_guess '%d %b %Y %H:%M %Z', "20 Feb 2015 22:02 GMT"
71
+ assert_guess '%a, %d %b %Y %H:%M %Z', "Fri, 20 Feb 2015 22:02 GMT"
72
+ assert_guess '%d %b %Y', "20 Feb 2015"
73
+ assert_guess '%a, %d %b %Y', "Fri, 20 Feb 2015"
74
+ assert_guess '%a, %d %b %Y %H:%M %z', "Fri, 20 Feb 2015 22:02 +0000"
75
+ assert_guess '%a, %d %b %Y %H:%M %z', "Fri, 20 Feb 2015 22:02 +00:00"
76
+ assert_guess '%a, %d %b %Y %H:%M %z', "Fri, 20 Feb 2015 22:02 +00"
77
+ end
78
+
79
+ def test_format_apache_clf
80
+ assert_guess '%d/%b/%Y:%H:%M:%S %z', "07/Mar/2004:16:05:50 -0800"
81
+ end
82
+
83
+ def test_format_ansi_c_asctime
84
+ assert_guess '%a %b %e %H:%M:%S %Y', "Fri May 11 21:44:53 2001"
85
+ end
86
+
87
+ def test_format_merge_frequency
88
+ assert_guess_partial 2, "%Y-%m-%d %H:%M:%S", ["2014-01-01", "2014-01-01 00:00:00", "2014-01-01 00:00:00"]
89
+ end
90
+
91
+ def test_format_merge_dmy
92
+ # DMY has higher priority than MDY
93
+ assert_guess "%m/%d/%Y", ["01/01/2014"]
94
+ assert_guess "%d/%m/%Y", ["01/01/2014", "01/01/2014", "13/01/2014"]
95
+ assert_guess "%d.%m.%Y", ["01.01.2014", "01.01.2014", "13.01.2014"]
96
+ # but frequency is more important if delimiter is different
97
+ assert_guess_partial 2, "%m/%d/%Y", ["01/01/2014", "01/01/2014", "13.01.2014"]
98
+ end
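The first two assertions in `test_format_merge_dmy` can be reproduced with a simple frequency rule: pick the candidate format that parses the most samples, and break ties by list order. The sketch below is a hypothetical simplification, not Embulk's `TimeFormatGuess` algorithm. It does not model the delimiter-aware behaviour the third assertion checks. `best_format` and the candidate list are assumptions made for illustration.

```ruby
require 'time'

# Hypothetical frequency-based chooser (NOT Embulk's implementation).
# Score = number of samples that parse; ties go to the earlier candidate.
def best_format(texts, candidates)
  scored = candidates.each_with_index.map do |format, index|
    hits = texts.count do |text|
      begin
        Time.strptime(text, format)
        true
      rescue ArgumentError
        false # e.g. month 13 is rejected by %m
      end
    end
    [hits, -index, format] # higher hits first, then lower index
  end
  scored.max.last
end

candidates = ["%m/%d/%Y", "%d/%m/%Y"]
puts best_format(["01/01/2014"], candidates)
puts best_format(["01/01/2014", "01/01/2014", "13/01/2014"], candidates)
```

With a single ambiguous date both candidates score 1, so list order decides. Once `13/01/2014` appears, only the day-first format parses all three samples.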
99
+
100
+ def assert_guess(format, texts)
101
+ assert_equal format, guess(texts)
102
+ Array(texts).each do |text|
103
+ time = Time.strptime(text, format)
104
+ assert_equal time.to_i, Time.strptime(time.strftime(format), format).to_i
105
+ end
106
+ end
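`assert_guess` checks more than the guessed string. For each sample it also verifies that the format survives a parse, format, parse round trip at the same instant. The sketch below is a standalone version of that property using only the standard library. `round_trips?` is a hypothetical name.

```ruby
require 'time'

# True when formatting a parsed time with `format` and parsing the result
# again yields the same epoch second, the property assert_guess verifies.
def round_trips?(text, format)
  time = Time.strptime(text, format)
  Time.strptime(time.strftime(format), format).to_i == time.to_i
end

puts round_trips?("07/Mar/2004:16:05:50 -0800", '%d/%b/%Y:%H:%M:%S %z')
puts round_trips?("20 Feb 2015", '%d %b %Y')
```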
107
+
108
+ def assert_guess_partial(count, format, texts)
109
+ assert_equal format, guess(texts)
110
+ times = Array(texts).map do |text|
111
+ Time.strptime(text, format) rescue nil
112
+ end.compact
113
+ assert_equal count, times.size
114
+ times.each do |time|
115
+ assert_equal time.to_i, Time.strptime(time.strftime(format), format).to_i
116
+ end
117
+ end
118
+
119
+ def guess(texts)
120
+ Embulk::Guess::TimeFormatGuess.guess(Array(texts))
121
+ end
122
+ end
@@ -0,0 +1,6 @@
1
+ require 'test/unit'
2
+
3
+ module Embulk
4
+ end
5
+
6
+ # TODO simplecov
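With rspec replaced by test-unit and the new `test/` directory, one way to run the suite is Rake's built-in `Rake::TestTask`. This is an assumed setup based on the file list in this diff (`test/helper.rb`, `test/guess/test_*.rb`). It is not necessarily how Embulk's own Rakefile wires the tests.

```ruby
require 'rake'
require 'rake/testtask'

# Assumed layout: test/helper.rb plus test/**/test_*.rb files.
Rake::TestTask.new(:test) do |t|
  t.libs << "test"                  # puts test/ on the load path so tests can require 'helper'
  t.pattern = "test/**/test_*.rb"   # picks up test/guess/test_time_format_guess.rb
  t.verbose = true
end
```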
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: embulk
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.4.5
4
+ version: 0.4.6
5
5
  platform: ruby
6
6
  authors:
7
7
  - Sadayuki Furuhashi
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2015-02-20 00:00:00.000000000 Z
11
+ date: 2015-02-21 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -39,31 +39,17 @@ dependencies:
39
39
  prerelease: false
40
40
  type: :development
41
41
  - !ruby/object:Gem::Dependency
42
- name: rspec
42
+ name: test-unit
43
43
  version_requirements: !ruby/object:Gem::Requirement
44
44
  requirements:
45
45
  - - ~>
46
46
  - !ruby/object:Gem::Version
47
- version: '2.11'
47
+ version: 3.0.9
48
48
  requirement: !ruby/object:Gem::Requirement
49
49
  requirements:
50
50
  - - ~>
51
51
  - !ruby/object:Gem::Version
52
- version: '2.11'
53
- prerelease: false
54
- type: :development
55
- - !ruby/object:Gem::Dependency
56
- name: json
57
- version_requirements: !ruby/object:Gem::Requirement
58
- requirements:
59
- - - ~>
60
- - !ruby/object:Gem::Version
61
- version: '1.7'
62
- requirement: !ruby/object:Gem::Requirement
63
- requirements:
64
- - - ~>
65
- - !ruby/object:Gem::Version
66
- version: '1.7'
52
+ version: 3.0.9
67
53
  prerelease: false
68
54
  type: :development
69
55
  - !ruby/object:Gem::Dependency
@@ -113,6 +99,7 @@ files:
113
99
  - build.gradle
114
100
  - embulk-cli/build.gradle
115
101
  - embulk-cli/src/main/java/org/embulk/cli/Main.java
102
+ - embulk-cli/src/main/sh/selfrun.sh
116
103
  - embulk-core/build.gradle
117
104
  - embulk-core/src/main/java/org/embulk/EmbulkService.java
118
105
  - embulk-core/src/main/java/org/embulk/command/Runner.java
@@ -272,6 +259,7 @@ files:
272
259
  - embulk-docs/src/release/release-0.4.3.rst
273
260
  - embulk-docs/src/release/release-0.4.4.rst
274
261
  - embulk-docs/src/release/release-0.4.5.rst
262
+ - embulk-docs/src/release/release-0.4.6.rst
275
263
  - embulk-standards/build.gradle
276
264
  - embulk-standards/src/main/java/org/embulk/standards/CsvFormatterPlugin.java
277
265
  - embulk-standards/src/main/java/org/embulk/standards/CsvParserPlugin.java
@@ -364,6 +352,8 @@ files:
364
352
  - lib/embulk/schema.rb
365
353
  - lib/embulk/version.rb
366
354
  - settings.gradle
355
+ - test/guess/test_time_format_guess.rb
356
+ - test/helper.rb
367
357
  - classpath/annotations-3.0.0.jar
368
358
  - classpath/aopalliance-1.0.jar
369
359
  - classpath/bval-core-0.5.jar
@@ -371,7 +361,9 @@ files:
371
361
  - classpath/commons-beanutils-core-1.8.3.jar
372
362
  - classpath/commons-lang3-3.1.jar
373
363
  - classpath/embulk-core-0.4.5.jar
364
+ - classpath/embulk-core-0.4.6.jar
374
365
  - classpath/embulk-standards-0.4.5.jar
366
+ - classpath/embulk-standards-0.4.6.jar
375
367
  - classpath/guava-18.0.jar
376
368
  - classpath/guice-3.0.jar
377
369
  - classpath/guice-multibindings-3.0.jar
@@ -417,4 +409,6 @@ rubygems_version: 2.1.9
417
409
  signing_key:
418
410
  specification_version: 4
419
411
  summary: Embulk, a plugin-based parallel bulk data loader
420
- test_files: []
412
+ test_files:
413
+ - test/guess/test_time_format_guess.rb
414
+ - test/helper.rb
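The generated metadata records the development dependency change: `rspec ~> 2.11` and `json ~> 1.7` are dropped, and `test-unit ~> 3.0.9` is added. The gemspec fragment below is a hypothetical reconstruction that produces the same dependency entry. It uses only the standard `Gem::Specification#add_development_dependency` API and is not Embulk's actual gemspec.

```ruby
require 'rubygems'

# Hypothetical gemspec fragment; name, version, summary and author are
# copied from the metadata above, the rest of the real gemspec is omitted.
spec = Gem::Specification.new do |s|
  s.name    = "embulk"
  s.version = "0.4.6"
  s.summary = "Embulk, a plugin-based parallel bulk data loader"
  s.authors = ["Sadayuki Furuhashi"]
  s.add_development_dependency "test-unit", ["~> 3.0.9"]
end

spec.development_dependencies.each do |dep|
  puts "#{dep.name} #{dep.requirement}"
end
```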