embulk-plugin-input-roo-excel 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 39175d88b536db9345ddc1f516ab03dd1660a1db
4
+ data.tar.gz: 7745e6d72fd68c9f173378bfd98d5c2228e2aa71
5
+ SHA512:
6
+ metadata.gz: b42ca65857ec1dd48977bc63b78d5c5934361c6fbe5cb8e97d42e4d9e65a18d4c33e38e6350ff396a5d2111f601c8eb29b31368a7b768d72fd6cd482db9639b8
7
+ data.tar.gz: 9a8f84401eaa961656fade25a3d7b33befaa22d5e4ee996f0c115e5aa21099310903a5de7d6e658d873db9f83df795ed914ca2394e05fe22d425908a870e89bc
data/.gitignore ADDED
@@ -0,0 +1,14 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /Gemfile.lock
4
+ /_yardoc/
5
+ /coverage/
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
10
+ *.bundle
11
+ *.so
12
+ *.o
13
+ *.a
14
+ mkmf.log
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in embulk-plugin-input-roo-excel.gemspec
4
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,13 @@
1
+ Copyright (c) 2015 Hiroyuki Sato
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License");
4
+ you may not use this file except in compliance with the License.
5
+ You may obtain a copy of the License at
6
+
7
+ http://www.apache.org/licenses/LICENSE-2.0
8
+
9
+ Unless required by applicable law or agreed to in writing, software
10
+ distributed under the License is distributed on an "AS IS" BASIS,
11
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ See the License for the specific language governing permissions and
13
+ limitations under the License.
data/README.ja.md ADDED
@@ -0,0 +1,134 @@
1
+ # embulk-plugin-input-roo-excel
2
+
3
+ [embulk-plugin-input-roo-excel](https://github.com/hiroyuki-sato/embulk-plugin-input-roo-excel) is an input plugin for [Embulk](https://github.com/embulk/embulk) that reads xlsx files.
4
+
5
+ ## Installation
6
+
7
+ Install the package following Embulk's usual gem installation procedure:
8
+
9
+
10
+ ```
11
+ java -jar embulk.jar gem install embulk-plugin-input-roo-excel
12
+ ```
13
+
14
+ This plugin uses roo to read xlsx files. If roo is not installed automatically, install it as well:
15
+
16
+ ```
17
+ java -jar ~/embulk.jar gem install roo
18
+ Fetching: ruby-ole-1.2.11.8.gem (100%)
19
+ Successfully installed ruby-ole-1.2.11.8
20
+ Fetching: spreadsheet-1.0.1.gem (100%)
21
+ Successfully installed spreadsheet-1.0.1
22
+ Fetching: nokogiri-1.6.6.2-java.gem (100%)
23
+ Successfully installed nokogiri-1.6.6.2-java
24
+ Fetching: rubyzip-1.1.7.gem (100%)
25
+ Successfully installed rubyzip-1.1.7
26
+ Fetching: roo-1.13.2.gem (100%)
27
+ Successfully installed roo-1.13.2
28
+ 5 gems installed
29
+ ```
30
+
31
+
32
+ You can check the installed packages with the following command. (They are installed under ~/.embulk.)
33
+
34
+ ```
35
+ java -jar ~/embulk.jar gem list
36
+
37
+ *** LOCAL GEMS ***
38
+
39
+ ffi (1.9.3 java)
40
+ jar-dependencies (0.1.2)
41
+ jruby-openssl (0.9.5 java)
42
+ json (1.8.0 java)
43
+ krypt (0.0.2)
44
+ krypt-core (0.0.2 universal-java)
45
+ krypt-provider-jdk (0.0.2)
46
+ nokogiri (1.6.6.2 java)
47
+ rake (10.1.0)
48
+ rdoc (4.0.1)
49
+ roo (1.13.2)
50
+ ruby-ole (1.2.11.8)
51
+ rubyzip (1.1.7)
52
+ spreadsheet (1.0.1)
53
+ ```
54
+
55
+ Configuration file
56
+
57
+
58
+ ## Configuration
59
+
60
+ Settings
61
+
62
+ | Key      | Description                         | Default     |
63
+ |----------|-------------------------------------|-------------|
64
+ | data_pos | Row number at which the data starts | 1           |
65
+ | sheet    | Name of the sheet to read           | first sheet |
66
+ | paths    | Directories containing xlsx files   |             |
67
+ | columns  | Columns to load into Embulk         |             |
68
+
69
+
70
+ Each column takes the following parameters.
71
+
72
+
73
+ | Key  | Description |
74
+ |------|-------------|
75
+ | name | Column name |
76
+ | type | Type        |
77
+
78
+ Use Embulk's type names for the type:
79
+
80
+ * boolean (untested)
81
+ * long: integer
82
+ * double: floating point
83
+ * string: string
84
+ * timestamp: date and time
85
+
86
+ ## Configuration example
87
+
88
+ Suppose the sheet named "The Beatles" contains the following data:
89
+
90
+ | No | first_name  | last_name  | nickname | birthday   |
91
+ |----|-------------|------------|----------|------------|
92
+ | 1 | John | Lennon | John | 1940/10/09 |
93
+ | 2 | Paul | McCartney | Paul | 1942/06/18 |
94
+ | 3 | George | Harrison | George | 1943/02/25 |
95
+ | 4 | Ringo | Starr | Ringo | 1940/07/07 |
96
+
97
+
98
+ Write the configuration as follows:
99
+
100
+ ```
101
+ in:
102
+   type: roo_excel
103
+   sheet: "The Beatles"
104
+   data_pos: 2
105
+   paths: ["/path/to/beatles"]
106
+   columns:
107
+     - { name: "no", type: long }
108
+     - { name: first_name, type: string }
109
+     - { name: last_name, type: string }
110
+     - { name: nick_name, type: string }
111
+     - { name: birthday, type: timestamp }
112
+ out:
113
+   type: stdout
114
+ ```
115
+
116
+ ## Running
117
+
118
+ ```
119
+ java -jar embulk.jar preview config.yml
120
+ java -jar embulk.jar run config.yml
121
+ ```
122
+
123
+ ## Known issues
124
+
125
+ * `done` does not work yet.
126
+ * A time such as '1:00' is stored as a Time value in Excel, but there is no good conversion for it yet. Declare the column as double and handle the value as seconds (3600.0).
127
+
128
+ ## Contributing
129
+
130
+ 1. Fork it ( https://github.com/hiroyuki-sato/embulk-plugin-input-roo-excel/fork )
131
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
132
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
133
+ 4. Push to the branch (`git push origin my-new-feature`)
134
+ 5. Create a new Pull Request
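The known-issues note in this README suggests declaring Excel time cells such as '1:00' as `double`, which yields a number of seconds (3600.0). The following is a minimal sketch, not part of the plugin, of turning such a value back into a clock string downstream. The helper name `seconds_to_clock` is hypothetical and the input is assumed to be non-negative seconds since midnight.

```ruby
# Hypothetical helper (not part of the plugin): format a seconds-since-midnight
# value, as obtained by reading a time cell with type double, as HH:MM:SS.
def seconds_to_clock(seconds)
  total = seconds.round                  # drop fractional seconds
  hours, rest = total.divmod(3600)
  minutes, secs = rest.divmod(60)
  format('%02d:%02d:%02d', hours, minutes, secs)
end

puts seconds_to_clock(3600.0) # prints 01:00:00
```

Keeping the column as `double` in the Embulk config and converting afterwards avoids guessing a date for a value that only carries a time of day.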
data/README.md ADDED
@@ -0,0 +1,124 @@
1
+ # embulk-plugin-input-roo-excel
2
+
3
+ [Japanese Page](README.ja.md)
4
+
5
+ This is an input plugin for [Embulk](https://github.com/embulk/embulk) that reads xlsx documents.
6
+
7
+ ## Installation
8
+
9
+ ```
10
+ java -jar embulk.jar gem install embulk-plugin-input-roo-excel
11
+ ```
12
+
13
+ The plugin needs the roo gem to read xlsx documents. If roo is not installed automatically, install it as well:
14
+
15
+ ```
16
+ java -jar ~/embulk.jar gem install roo
17
+ Fetching: ruby-ole-1.2.11.8.gem (100%)
18
+ Successfully installed ruby-ole-1.2.11.8
19
+ Fetching: spreadsheet-1.0.1.gem (100%)
20
+ Successfully installed spreadsheet-1.0.1
21
+ Fetching: nokogiri-1.6.6.2-java.gem (100%)
22
+ Successfully installed nokogiri-1.6.6.2-java
23
+ Fetching: rubyzip-1.1.7.gem (100%)
24
+ Successfully installed rubyzip-1.1.7
25
+ Fetching: roo-1.13.2.gem (100%)
26
+ Successfully installed roo-1.13.2
27
+ 5 gems installed
28
+ ```
29
+
30
+ You can check the list of installed packages:
31
+
32
+ ```
33
+ java -jar ~/embulk.jar gem list
34
+
35
+ *** LOCAL GEMS ***
36
+
37
+ ffi (1.9.3 java)
38
+ jar-dependencies (0.1.2)
39
+ jruby-openssl (0.9.5 java)
40
+ json (1.8.0 java)
41
+ krypt (0.0.2)
42
+ krypt-core (0.0.2 universal-java)
43
+ krypt-provider-jdk (0.0.2)
44
+ nokogiri (1.6.6.2 java)
45
+ rake (10.1.0)
46
+ rdoc (4.0.1)
47
+ roo (1.13.2)
48
+ ruby-ole (1.2.11.8)
49
+ rubyzip (1.1.7)
50
+ spreadsheet (1.0.1)
51
+ ```
52
+
53
+ ## Configuration
54
+
55
+ data example.
56
+
57
+
58
+ Settings
59
+
60
+ | key      | description                       | default     |
61
+ |----------|-----------------------------------|-------------|
62
+ | data_pos | row number where the data starts  | 1           |
63
+ | sheet    | sheet name                        | first sheet |
64
+ | paths    | directories containing xlsx files | []          |
65
+ | columns  | columns to load                   |             |
66
+
67
+ Each column takes the following keys:
68
+
69
+ | key | description |
70
+ |--------|-------------|
71
+ | name   | column name |
72
+ | type | type |
73
+
74
+ The type is one of the following values:
75
+
76
+ * boolean
77
+ * long
78
+ * double
79
+ * string
80
+ * timestamp
81
+
82
+ ## Usage
83
+
84
+ Example data. The sheet name is "The Beatles".
85
+
86
+ | No | first_name  | last_name  | nickname | birthday   |
87
+ |----|-------------|------------|----------|------------|
88
+ | 1 | John | Lennon | John | 1940/10/09 |
89
+ | 2 | Paul | McCartney | Paul | 1942/06/18 |
90
+ | 3 | George | Harrison | George | 1943/02/25 |
91
+ | 4 | Ringo | Starr | Ringo | 1940/07/07 |
92
+
93
+ configuration file
94
+
95
+ ```
96
+ in:
97
+   type: roo_excel
98
+   sheet: "The Beatles"
99
+   data_pos: 2
100
+   paths: ["/path/to/beatles"]
101
+   columns:
102
+     - { name: "no", type: long }
103
+     - { name: first_name, type: string }
104
+     - { name: last_name, type: string }
105
+     - { name: nick_name, type: string }
106
+     - { name: birthday, type: timestamp, format: "%Y/%m/%d" }
107
+ out:
108
+   type: stdout
109
+ ```
110
+
111
+ ## Execution
112
+
113
+ ```
114
+ java -jar embulk.jar preview config.yml
115
+ java -jar embulk.jar run config.yml
116
+ ```
117
+
118
+ ## Contributing
119
+
120
+ 1. Fork it ( https://github.com/hiroyuki-sato/embulk-plugin-input-roo-excel/fork )
121
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
122
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
123
+ 4. Push to the branch (`git push origin my-new-feature`)
124
+ 5. Create a new Pull Request
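The `paths` setting in this README names directories, not individual files. As a rough illustration of how such a list can be expanded into xlsx files, here is a standalone sketch that needs neither Embulk nor roo. The method name `list_xlsx` and the temporary files are assumptions made for the example; only the overall approach (skip missing directories, keep entries ending in `.xlsx`, sort them) follows the plugin source in this gem.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: expand a list of directories into the sorted
# .xlsx files they contain. Directories that do not exist are skipped.
def list_xlsx(paths)
  paths.flat_map do |path|
    next [] unless Dir.exist?(path)
    names = Dir.entries(path).sort.select { |entry| entry.end_with?('.xlsx') }
    names.map { |entry| File.join(path, entry) }
  end
end

# Try it against a throwaway directory.
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, 'b.xlsx'))
  FileUtils.touch(File.join(dir, 'a.xlsx'))
  FileUtils.touch(File.join(dir, 'notes.txt'))
  puts list_xlsx([dir, File.join(dir, 'missing')])
end
```

Subdirectories are not searched in this sketch, so every directory that holds workbooks has to be listed explicitly.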
data/Rakefile ADDED
@@ -0,0 +1,2 @@
1
+ require "bundler/gem_tasks"
2
+
data/embulk-plugin-input-roo-excel.gemspec ADDED
@@ -0,0 +1,23 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+
5
+ Gem::Specification.new do |spec|
6
+ spec.name = "embulk-plugin-input-roo-excel"
7
+ spec.version = "0.1.0"
8
+ spec.authors = ["Hiroyuki Sato"]
9
+ spec.email = ["hiroysato@gmail.com"]
10
+ spec.summary = %q{Embulk input plugin to read xlsx files}
11
+ spec.description = %q{Embulk input plugin to read xlsx files}
12
+ spec.homepage = "https://github.com/hiroyuki-sato/embulk-plugin-input-roo-excel"
13
+ spec.license = "Apache 2.0"
14
+
15
+ spec.files = `git ls-files -z`.split("\x0")
16
+ spec.executables = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
17
+ spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
18
+ spec.require_paths = ["lib"]
19
+
20
+ spec.add_development_dependency "bundler", "~> 1.7"
21
+ spec.add_development_dependency "rake", "~> 10.0"
22
+ spec.add_development_dependency "roo"
23
+ end
data/lib/embulk/input_roo_excel.rb ADDED
@@ -0,0 +1,116 @@
1
+ require 'roo'
2
+ require 'time'
3
+
4
+ module Embulk
5
+ module Plugin
6
+
7
+ class InputRooExcel < InputPlugin
8
+ # input plugin file name must be: embulk/input_<name>.rb
9
+ Plugin.register_input('roo_excel', self)
10
+
11
+ def self.transaction(config, &control)
12
+
13
+ task = {
14
+ 'columns' => config.param('columns', :array, default: []),
15
+ 'done' => config.param('done', :array, default: []),
16
+ 'sheet' => config.param('sheet', :string, default: nil),
17
+ 'data_pos' => config.param('data_pos', :integer, default: 1),
18
+ }
19
+ task['files'] = config.param('paths', :array, default: []).map{ |path|
20
+ next [] unless Dir.exist?(path)
21
+ Dir.entries(path).sort.select { |entry| entry.match(/\.xlsx\Z/) }.map{ |entry|
22
+ File.join(path,entry)
23
+ }
24
+ }.flatten
25
+
26
+ files = task['files'] - task['done']
27
+ if files.empty?
28
+ raise "no valid xlsx file found"
29
+ end
30
+
31
+ columns = []
32
+ task['columns'].each_with_index do |c,i|
33
+ columns << Column.new(i, c['name'], c['type'].to_sym)
34
+ end
35
+
36
+ resume(task, columns, files.length, &control)
37
+ end
38
+
39
+ def self.resume(task, columns, count, &control)
40
+ puts "InputRooExcel input started."
41
+ commit_reports = yield(task, columns, count)
42
+ puts "InputRooExcel input finished. Commit reports = #{commit_reports.to_json}"
43
+
44
+ next_config_diff = {}
45
+ return next_config_diff
46
+ end
47
+
48
+ def initialize(task, schema, index, page_builder)
49
+ super
50
+ @file = task['files'][index]
51
+ end
52
+
53
+ def run
54
+ puts "InputRooExcel input thread #{@index}..."
55
+
56
+ columns = @task['columns']
57
+ ncol = columns.size
58
+ data_pos = @task['data_pos']
59
+
60
+ sheet = @task['sheet']
61
+ xlsx = Roo::Excelx.new(@file)
62
+ if( sheet )
63
+ xlsx.default_sheet = sheet
64
+ else
65
+ xlsx.default_sheet = xlsx.sheets.first
66
+ end
67
+
68
+ data_pos.upto(xlsx.last_row) do |row|
69
+ data = []
70
+ 1.upto(ncol) do |col|
71
+ column = columns[col-1]
72
+ data << convert_cell(column,xlsx,row,col)
73
+ end
74
+ @page_builder.add(data)
75
+ end
76
+
77
+ @page_builder.finish # don't forget to call finish :-)
78
+
79
+ commit_report = {}
80
+ return commit_report
81
+ end
82
+
83
+ # MEMO roo celltype
84
+ # returns the type of a cell: * :float * :string, * :date * :percentage * :formula * :time * :datetime.
85
+ #
86
+ def convert_cell(column,xlsx,nrow,ncol)
87
+ d = xlsx.cell(nrow,ncol)
88
+ type = column['type'] || 'string'
89
+
90
+ case type
91
+ when 'long'
92
+ d.to_i
93
+ when 'double'
94
+ d.to_f
95
+ when 'string'
96
+ d.to_s
97
+ when 'timestamp'
98
+ convert_time(d)
99
+ else # TODO
100
+ d.to_s
101
+ end
102
+ end
103
+ def convert_time(t)
104
+ if( t.kind_of?(Date) or t.kind_of?(DateTime) )
105
+ t.to_time
106
+ elsif( t.kind_of?(Time) )
107
+ t
108
+ elsif( t.kind_of?(String) )
109
+ Time.parse(t)
110
+ else
111
+ raise ArgumentError,"Can't convert time:#{t}"
112
+ end
113
+ end
114
+ end # InputRooExcel
115
+ end # Plugin
116
+ end # Embulk
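The type handling in `convert_cell` and `convert_time` can be tried outside Embulk. Below is a minimal standalone sketch of the same idea, assuming the raw cell value is passed in directly instead of being read from a roo sheet. The method name `convert_value` is hypothetical and is not defined by the plugin.

```ruby
require 'date'
require 'time'

# Hypothetical standalone variant of the per-type conversion: takes the
# configured type name and a raw cell value, returns the converted value.
def convert_value(type, value)
  case type
  when 'long'   then value.to_i
  when 'double' then value.to_f
  when 'timestamp'
    case value
    when Time   then value
    when Date   then value.to_time      # DateTime is a subclass of Date
    when String then Time.parse(value)
    else raise ArgumentError, "Can't convert time: #{value.inspect}"
    end
  else value.to_s                        # 'string' and any unrecognized type
  end
end

p convert_value('long', 1.0)
p convert_value('string', 'John')
p convert_value('timestamp', Date.new(1940, 10, 9))
```

As in the plugin, unrecognized types (including `boolean`) fall back to a string conversion, and a timestamp value that is neither a date, a time, nor a string raises `ArgumentError`.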
metadata ADDED
@@ -0,0 +1,95 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: embulk-plugin-input-roo-excel
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Hiroyuki Sato
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2015-02-07 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: bundler
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.7'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.7'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rake
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '10.0'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '10.0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: roo
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ">="
46
+ - !ruby/object:Gem::Version
47
+ version: '0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
55
+ description: Embulk input plugin to read xlsx files
56
+ email:
57
+ - hiroysato@gmail.com
58
+ executables: []
59
+ extensions: []
60
+ extra_rdoc_files: []
61
+ files:
62
+ - ".gitignore"
63
+ - Gemfile
64
+ - LICENSE.txt
65
+ - README.ja.md
66
+ - README.md
67
+ - Rakefile
68
+ - embulk-plugin-input-roo-excel.gemspec
69
+ - lib/embulk/input_roo_excel.rb
70
+ homepage: https://github.com/hiroyuki-sato/embulk-plugin-input-roo-excel
71
+ licenses:
72
+ - Apache 2.0
73
+ metadata: {}
74
+ post_install_message:
75
+ rdoc_options: []
76
+ require_paths:
77
+ - lib
78
+ required_ruby_version: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - ">="
81
+ - !ruby/object:Gem::Version
82
+ version: '0'
83
+ required_rubygems_version: !ruby/object:Gem::Requirement
84
+ requirements:
85
+ - - ">="
86
+ - !ruby/object:Gem::Version
87
+ version: '0'
88
+ requirements: []
89
+ rubyforge_project:
90
+ rubygems_version: 2.4.5
91
+ signing_key:
92
+ specification_version: 4
93
+ summary: Embulk input plugin to read xlsx files
94
+ test_files: []
95
+ has_rdoc: