embulk-plugin-input-roo-excel 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 39175d88b536db9345ddc1f516ab03dd1660a1db
4
+ data.tar.gz: 7745e6d72fd68c9f173378bfd98d5c2228e2aa71
5
+ SHA512:
6
+ metadata.gz: b42ca65857ec1dd48977bc63b78d5c5934361c6fbe5cb8e97d42e4d9e65a18d4c33e38e6350ff396a5d2111f601c8eb29b31368a7b768d72fd6cd482db9639b8
7
+ data.tar.gz: 9a8f84401eaa961656fade25a3d7b33befaa22d5e4ee996f0c115e5aa21099310903a5de7d6e658d873db9f83df795ed914ca2394e05fe22d425908a870e89bc
data/.gitignore ADDED
@@ -0,0 +1,14 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /Gemfile.lock
4
+ /_yardoc/
5
+ /coverage/
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
10
+ *.bundle
11
+ *.so
12
+ *.o
13
+ *.a
14
+ mkmf.log
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in embulk-plugin-input-roo-excel.gemspec
4
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,13 @@
1
+ Copyright (c) 2015 Hiroyuki Sato
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License");
4
+ you may not use this file except in compliance with the License.
5
+ You may obtain a copy of the License at
6
+
7
+ http://www.apache.org/licenses/LICENSE-2.0
8
+
9
+ Unless required by applicable law or agreed to in writing, software
10
+ distributed under the License is distributed on an "AS IS" BASIS,
11
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ See the License for the specific language governing permissions and
13
+ limitations under the License.
data/README.ja.md ADDED
@@ -0,0 +1,134 @@
1
+ # embulk-plugin-input-roo-excel
2
+
3
+ [embulk-plugin-input-roo-excel](https://github.com/hiroyuki-sato/embulk-plugin-input-roo-excel) is an input plugin for [Embulk](https://github.com/embulk/embulk) that reads xlsx files.
4
+
5
+ ## Installation
6
+
7
+ Install the package following Embulk's usual gem installation procedure:
8
+
9
+
10
+ ```
11
+ java -jar embulk.jar gem install embulk-plugin-input-roo-excel
12
+ ```
13
+
14
+ This plugin uses roo to read xlsx files, so if roo is not installed automatically, install it as well:
15
+
16
+ ```
17
+ java -jar ~/embulk.jar gem install roo
18
+ Fetching: ruby-ole-1.2.11.8.gem (100%)
19
+ Successfully installed ruby-ole-1.2.11.8
20
+ Fetching: spreadsheet-1.0.1.gem (100%)
21
+ Successfully installed spreadsheet-1.0.1
22
+ Fetching: nokogiri-1.6.6.2-java.gem (100%)
23
+ Successfully installed nokogiri-1.6.6.2-java
24
+ Fetching: rubyzip-1.1.7.gem (100%)
25
+ Successfully installed rubyzip-1.1.7
26
+ Fetching: roo-1.13.2.gem (100%)
27
+ Successfully installed roo-1.13.2
28
+ 5 gems installed
29
+ ```
30
+
31
+
32
+ You can check the installed packages with the following command. (They are installed under ~/.embulk.)
33
+
34
+ ```
35
+ java -jar ~/embulk.jar gem list
36
+
37
+ *** LOCAL GEMS ***
38
+
39
+ ffi (1.9.3 java)
40
+ jar-dependencies (0.1.2)
41
+ jruby-openssl (0.9.5 java)
42
+ json (1.8.0 java)
43
+ krypt (0.0.2)
44
+ krypt-core (0.0.2 universal-java)
45
+ krypt-provider-jdk (0.0.2)
46
+ nokogiri (1.6.6.2 java)
47
+ rake (10.1.0)
48
+ rdoc (4.0.1)
49
+ roo (1.13.2)
50
+ ruby-ole (1.2.11.8)
51
+ rubyzip (1.1.7)
52
+ spreadsheet (1.0.1)
53
+ ```
54
+
55
+ Configuration file
56
+
57
+
58
+ ## Configuration
59
+
60
+ Settings
61
+
62
+ | Key | Description | Default |
63
+ |----------|-----------------------------------|-------------|
64
+ | data_pos | Row at which the data starts | 1 |
65
+ | sheet | Name of the sheet to read | first sheet |
66
+ | paths | Directories containing xlsx files | |
67
+ | columns | Columns to load into Embulk | |
68
+
69
+
70
+ Each column entry takes the following parameters.
71
+
72
+
73
+ | Key | Description |
74
+ |----------|------------------------------|
75
+ | name | Column name |
76
+ | type | Type |
77
+
78
+ The type must be one of Embulk's types:
79
+
80
+ * boolean (untested)
81
+ * long: integer
82
+ * double: floating point
83
+ * string: string
84
+ * timestamp: date and time
85
+
86
+ ## Configuration example
87
+
88
+ Suppose the sheet named "The Beatles" contains the following data:
89
+
90
+ | No | first_name | last_name | nickname | birthday |
91
+ |----|-------------|------------|----------|------------|
92
+ | 1 | John | Lennon | John | 1940/10/09 |
93
+ | 2 | Paul | McCartney | Paul | 1942/06/18 |
94
+ | 3 | George | Harrison | George | 1943/02/25 |
95
+ | 4 | Ringo | Starr | Ringo | 1940/07/07 |
96
+
97
+
98
+ Write the configuration as follows:
99
+
100
+ ```
101
+ in:
102
+ type: roo_excel
103
+ sheet: "The Beatles"
104
+ data_pos: 2
105
+ paths: ["/path/to/beatles"]
106
+ columns:
107
+ - { name: no, type: long }
108
+ - { name: first_name, type: string }
109
+ - { name: last_name, type: string }
110
+ - { name: nick_name, type: string }
111
+ - { name: birthday, type: timestamp }
112
+ out:
113
+ type: stdout
114
+ ```
115
+
116
+ ## Execution example
117
+
118
+ ```
119
+ java -jar embulk.jar preview config.yml
120
+ java -jar embulk.jar run config.yml
121
+ ```
122
+
123
+ ## Known issues
124
+
125
+ * `done` does not work yet.
126
+ * A value such as '1:00' is stored as a Time in Excel, but there is no good conversion for it yet, so set the column type to double and read it as seconds (3600.0).
127
+
128
+ ## Contributing
129
+
130
+ 1. Fork it ( https://github.com/hiroyuki-sato/embulk-plugin-input-roo-excel/fork )
131
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
132
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
133
+ 4. Push to the branch (`git push origin my-new-feature`)
134
+ 5. Create a new Pull Request
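The `paths` setting described in this README names directories, not individual files. A minimal sketch of how such a setting could be expanded into a list of xlsx files, roughly following the directory scan in the plugin's `transaction` method. The helper name `xlsx_files` and the temporary files are assumptions made for illustration, and `end_with?` stands in for the plugin's regular expression.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper mirroring the directory scan in the plugin's transaction step.
def xlsx_files(paths)
  paths.flat_map do |path|
    next [] unless Dir.exist?(path) # silently skip directories that do not exist
    Dir.entries(path).sort
       .select { |entry| entry.end_with?('.xlsx') }
       .map { |entry| File.join(path, entry) }
  end
end

# Demo with throwaway files; only the .xlsx entries are kept, in sorted order.
Dir.mktmpdir do |dir|
  %w[b.xlsx a.xlsx notes.txt].each { |name| FileUtils.touch(File.join(dir, name)) }
  p xlsx_files([dir, File.join(dir, 'missing')]).map { |f| File.basename(f) }
  # => ["a.xlsx", "b.xlsx"]
end
```

Each file found this way becomes one input task, so a directory with no matching files would leave the plugin with nothing to read.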
data/README.md ADDED
@@ -0,0 +1,124 @@
1
+ # embulk-plugin-input-roo-excel
2
+
3
+ [Japanese Page](README.ja.md)
4
+
5
+ This is an input plugin for [Embulk](https://github.com/embulk/embulk) that reads xlsx documents.
6
+
7
+ ## Installation
8
+
9
+ ```
10
+ java -jar embulk.jar gem install embulk-plugin-input-roo-excel
11
+ ```
12
+
13
+ The roo gem is also required to read xlsx documents. If it is not installed automatically, install it as well.
14
+
15
+ ```
16
+ java -jar ~/embulk.jar gem install roo
17
+ Fetching: ruby-ole-1.2.11.8.gem (100%)
18
+ Successfully installed ruby-ole-1.2.11.8
19
+ Fetching: spreadsheet-1.0.1.gem (100%)
20
+ Successfully installed spreadsheet-1.0.1
21
+ Fetching: nokogiri-1.6.6.2-java.gem (100%)
22
+ Successfully installed nokogiri-1.6.6.2-java
23
+ Fetching: rubyzip-1.1.7.gem (100%)
24
+ Successfully installed rubyzip-1.1.7
25
+ Fetching: roo-1.13.2.gem (100%)
26
+ Successfully installed roo-1.13.2
27
+ 5 gems installed
28
+ ```
29
+
30
+ You can check the list of installed packages.
31
+
32
+ ```
33
+ java -jar ~/embulk.jar gem list
34
+
35
+ *** LOCAL GEMS ***
36
+
37
+ ffi (1.9.3 java)
38
+ jar-dependencies (0.1.2)
39
+ jruby-openssl (0.9.5 java)
40
+ json (1.8.0 java)
41
+ krypt (0.0.2)
42
+ krypt-core (0.0.2 universal-java)
43
+ krypt-provider-jdk (0.0.2)
44
+ nokogiri (1.6.6.2 java)
45
+ rake (10.1.0)
46
+ rdoc (4.0.1)
47
+ roo (1.13.2)
48
+ ruby-ole (1.2.11.8)
49
+ rubyzip (1.1.7)
50
+ spreadsheet (1.0.1)
51
+ ```
52
+
53
+ ## Configuration
54
+
55
+ data example.
56
+
57
+
58
+ Settings
59
+
60
+ | key | description | default |
61
+ |----------|-----------------------------------|-------------|
62
+ | data_pos | row at which the data starts | 1 |
63
+ | sheet | sheet name | first sheet |
64
+ | paths | directories containing xlsx files | [] |
65
+ | columns | columns to load | |
66
+
67
+ Each entry in `columns` takes the following keys.
68
+
69
+ | key | description |
70
+ |--------|-------------|
71
+ | name | column name |
72
+ | type | type |
73
+
74
+ The type is one of the following values.
75
+
76
+ * boolean
77
+ * long
78
+ * double
79
+ * string
80
+ * timestamp
81
+
82
+ ## Usage
83
+
84
+ Example data. The sheet name is "The Beatles".
85
+
86
+ | No | first_name | last_name | nickname | birthday |
87
+ |----|-------------|------------|----------|------------|
88
+ | 1 | John | Lennon | John | 1940/10/09 |
89
+ | 2 | Paul | McCartney | Paul | 1942/06/18 |
90
+ | 3 | George | Harrison | George | 1943/02/25 |
91
+ | 4 | Ringo | Starr | Ringo | 1940/07/07 |
92
+
93
+ Configuration file:
94
+
95
+ ```
96
+ in:
97
+ type: roo_excel
98
+ sheet: "The Beatles"
99
+ data_pos: 2
100
+ paths: ["/path/to/beatles"]
101
+ columns:
102
+ - { name: no, type: long }
103
+ - { name: first_name, type: string }
104
+ - { name: last_name, type: string }
105
+ - { name: nick_name, type: string }
106
+ - { name: birthday, type: timestamp, format: "%Y/%m/%d" }
107
+ out:
108
+ type: stdout
109
+ ```
110
+
111
+ ## Execution
112
+
113
+ ```
114
+ java -jar embulk.jar preview config.yml
115
+ java -jar embulk.jar run config.yml
116
+ ```
117
+
118
+ ## Contributing
119
+
120
+ 1. Fork it ( https://github.com/hiroyuki-sato/embulk-plugin-input-roo-excel/fork )
121
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
122
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
123
+ 4. Push to the branch (`git push origin my-new-feature`)
124
+ 5. Create a new Pull Request
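The README lists five accepted column types. A minimal sketch of checking a configuration against that list before running Embulk, using Ruby's standard YAML library. The constant `KNOWN_TYPES`, the helper `unknown_type_columns`, and the sample column `note` are assumptions made for this example and are not part of the plugin. The column name `no` is quoted here because Ruby's YAML parser reads an unquoted `no` as a boolean; whether Embulk's own loader does the same is not checked here.

```ruby
require 'yaml'

# Types listed in the README; the constant name is made up for this sketch.
KNOWN_TYPES = %w[boolean long double string timestamp].freeze

# Hypothetical helper: returns the column entries whose type is not in KNOWN_TYPES.
def unknown_type_columns(config)
  columns = config.fetch('in').fetch('columns', [])
  columns.reject { |c| KNOWN_TYPES.include?(c['type'].to_s) }
end

config = YAML.safe_load(<<~YAML)
  in:
    type: roo_excel
    columns:
      - { name: "no", type: long }
      - { name: birthday, type: timestamp }
      - { name: note, type: text }
YAML

# Prints the names of columns with an unrecognised type.
p unknown_type_columns(config).map { |c| c['name'] }
# => ["note"]
```

A check like this would catch a misspelled type name before the plugin falls back to its default string conversion.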
data/Rakefile ADDED
@@ -0,0 +1,2 @@
1
+ require "bundler/gem_tasks"
2
+
data/embulk-plugin-input-roo-excel.gemspec ADDED
@@ -0,0 +1,23 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+
5
+ Gem::Specification.new do |spec|
6
+ spec.name = "embulk-plugin-input-roo-excel"
7
+ spec.version = "0.1.0"
8
+ spec.authors = ["Hiroyuki Sato"]
9
+ spec.email = ["hiroysato@gmail.com"]
10
+ spec.summary = %q{Embulk input plugin to read xlsx files}
11
+ spec.description = %q{Embulk input plugin to read xlsx files}
12
+ spec.homepage = "https://github.com/hiroyuki-sato/embulk-plugin-input-roo-excel"
13
+ spec.license = "Apache 2.0"
14
+
15
+ spec.files = `git ls-files -z`.split("\x0")
16
+ spec.executables = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
17
+ spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
18
+ spec.require_paths = ["lib"]
19
+
20
+ spec.add_development_dependency "bundler", "~> 1.7"
21
+ spec.add_development_dependency "rake", "~> 10.0"
22
+ spec.add_development_dependency "roo"
23
+ end
data/lib/embulk/input_roo_excel.rb ADDED
@@ -0,0 +1,116 @@
1
+ require 'roo'
2
+ require 'time'
3
+
4
+ module Embulk
5
+ module Plugin
6
+
7
+ class InputRooExcel < InputPlugin
8
+ # input plugin file name must be: embulk/input_<name>.rb
9
+ Plugin.register_input('roo_excel', self)
10
+
11
+ def self.transaction(config, &control)
12
+
13
+ task = {
14
+ 'columns' => config.param('columns', :array, default: []),
15
+ 'done' => config.param('done', :array, default: []),
16
+ 'sheet' => config.param('sheet', :string, default: nil),
17
+ 'data_pos' => config.param('data_pos', :integer, default: 1),
18
+ }
19
+ task['files'] = config.param('paths', :array, default: []).map{ |path|
20
+ next [] unless Dir.exist?(path)
21
+ Dir.entries(path).sort.select { |entry| entry.match(/\.xlsx\Z/) }.map{ |entry|
22
+ File.join(path,entry)
23
+ }
24
+ }.flatten
25
+
26
+ files = task['files'] - task['done']
27
+ if files.empty?
28
+ raise "no valid xlsx file found"
29
+ end
30
+
31
+ columns = []
32
+ task['columns'].each_with_index do |c,i|
33
+ columns << Column.new(i, c['name'], c['type'].to_sym)
34
+ end
35
+
36
+ resume(task, columns, files.length, &control)
37
+ end
38
+
39
+ def self.resume(task, columns, count, &control)
40
+ puts "InputRooExcel input started."
41
+ commit_reports = yield(task, columns, count)
42
+ puts "InputRooExcel input finished. Commit reports = #{commit_reports.to_json}"
43
+
44
+ next_config_diff = {}
45
+ return next_config_diff
46
+ end
47
+
48
+ def initialize(task, schema, index, page_builder)
49
+ super
50
+ @file = task['files'][index]
51
+ end
52
+
53
+ def run
54
+ puts "InputRooExcel input thread #{@index}..."
55
+
56
+ columns = @task['columns']
57
+ ncol = columns.size
58
+ data_pos = @task['data_pos']
59
+
60
+ sheet = @task['sheet']
61
+ xlsx = Roo::Excelx.new(@file)
62
+ if( sheet )
63
+ xlsx.default_sheet = sheet
64
+ else
65
+ xlsx.default_sheet = xlsx.sheets.first
66
+ end
67
+
68
+ data_pos.upto(xlsx.last_row) do |row|
69
+ data = []
70
+ 1.upto(ncol) do |col|
71
+ column = columns[col-1]
72
+ data << convert_cell(column,xlsx,row,col)
73
+ end
74
+ @page_builder.add(data)
75
+ end
76
+
77
+ @page_builder.finish # don't forget to call finish :-)
78
+
79
+ commit_report = {}
80
+ return commit_report
81
+ end
82
+
83
+ # MEMO roo celltype
84
+ # returns the type of a cell: * :float * :string, * :date * :percentage * :formula * :time * :datetime.
85
+ #
86
+ def convert_cell(column,xlsx,nrow,ncol)
87
+ d = xlsx.cell(nrow,ncol)
88
+ type = column['type'] || 'string'
89
+
90
+ case type
91
+ when 'long'
92
+ d.to_i
93
+ when 'double'
94
+ d.to_f
95
+ when 'string'
96
+ d.to_s
97
+ when 'timestamp'
98
+ convert_time(d)
99
+ else # TODO
100
+ d.to_s
101
+ end
102
+ end
103
+ def convert_time(t)
104
+ if( t.kind_of?(Date) or t.kind_of?(DateTime) )
105
+ t.to_time
106
+ elsif( t.kind_of?(Time) )
107
+ t
108
+ elsif( t.kind_of?(String) )
109
+ Time.parse(t)
110
+ else
111
+ raise ArgumentError,"Can't convert time:#{t}"
112
+ end
113
+ end
114
+ end # InputRooExcel
115
+ end # Plugin
116
+ end # Embulk
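The Japanese README's known-issues section recommends loading time-of-day cells such as '1:00' as `double`, which yields a number of seconds (3600.0). A minimal sketch of turning that value back into a clock string downstream. The helper `seconds_to_clock` is hypothetical and not part of the plugin, and it assumes the value is a count of seconds since midnight.

```ruby
# Hypothetical helper: formats seconds since midnight as HH:MM:SS.
def seconds_to_clock(seconds)
  total = seconds.to_f.round            # work with whole seconds
  hours, rest = total.divmod(3600)      # split off full hours
  minutes, secs = rest.divmod(60)       # split the remainder into minutes and seconds
  format('%02d:%02d:%02d', hours, minutes, secs)
end

puts seconds_to_clock(3600.0) # prints "01:00:00"
puts seconds_to_clock(5400)   # prints "01:30:00"
```

Keeping the value numeric inside Embulk and formatting it only at the output side avoids guessing a date for a cell that carries none.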
metadata ADDED
@@ -0,0 +1,95 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: embulk-plugin-input-roo-excel
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Hiroyuki Sato
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2015-02-07 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: bundler
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.7'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.7'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rake
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '10.0'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '10.0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: roo
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ">="
46
+ - !ruby/object:Gem::Version
47
+ version: '0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
55
+ description: Embulk input plugin to read xlsx files
56
+ email:
57
+ - hiroysato@gmail.com
58
+ executables: []
59
+ extensions: []
60
+ extra_rdoc_files: []
61
+ files:
62
+ - ".gitignore"
63
+ - Gemfile
64
+ - LICENSE.txt
65
+ - README.ja.md
66
+ - README.md
67
+ - Rakefile
68
+ - embulk-plugin-input-roo-excel.gemspec
69
+ - lib/embulk/input_roo_excel.rb
70
+ homepage: https://github.com/hiroyuki-sato/embulk-plugin-input-roo-excel
71
+ licenses:
72
+ - Apache 2.0
73
+ metadata: {}
74
+ post_install_message:
75
+ rdoc_options: []
76
+ require_paths:
77
+ - lib
78
+ required_ruby_version: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - ">="
81
+ - !ruby/object:Gem::Version
82
+ version: '0'
83
+ required_rubygems_version: !ruby/object:Gem::Requirement
84
+ requirements:
85
+ - - ">="
86
+ - !ruby/object:Gem::Version
87
+ version: '0'
88
+ requirements: []
89
+ rubyforge_project:
90
+ rubygems_version: 2.4.5
91
+ signing_key:
92
+ specification_version: 4
93
+ summary: Embulk input plugin to read xlsx files
94
+ test_files: []
95
+ has_rdoc: