embulk-parser-poi_excel 0.1.12 → 0.1.13

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: 0ce575511ee0ce2d0a9496f1d63013909d751504
-   data.tar.gz: f79c800fd6454ba5e2511e6e675a3465c3c2c7d3
+   metadata.gz: 181f7ed2aa447dea8c2214aa72a24fe85d2d5975
+   data.tar.gz: 1eb5a1c03276531b7cc7771abb7d870e2d67d0c7
  SHA512:
-   metadata.gz: a4ddf5f3506c56924bbe9147c6b23e40f31853356b53e91215120ef9f930e322db50a60a9c2277bc51c863cf588610e44d18cf5ec401a0d8b31d0c7ff30f4b50
-   data.tar.gz: bf22e1256df159efd21cdf40d98ae014c8ddf1dc053848dce95a2635fa68dbac2d174833731f5ff700d814fc98a680ab666774b15cce49c24d7a94eea87d6ac1
+   metadata.gz: 65ed0d0fa2e27d3ea00954a87062e49d91052e5478f33b6ec175d133e0f6e6f7e42be488d7d40fc3a259f833a6abe6ab72077fa72e007f6ba848defb9ec030a6
+   data.tar.gz: 1522b1223a8c7ad0577b022fae899820196cb2b703633363f92a3398bd33dcfcabd5e49525516b34beac29c2e2a97a3954ad23995295fa73316c915a7ff6b31c
data/LICENSE.txt CHANGED
@@ -1,21 +1,21 @@
-
- MIT License
-
- Permission is hereby granted, free of charge, to any person obtaining
- a copy of this software and associated documentation files (the
- "Software"), to deal in the Software without restriction, including
- without limitation the rights to use, copy, modify, merge, publish,
- distribute, sublicense, and/or sell copies of the Software, and to
- permit persons to whom the Software is furnished to do so, subject to
- the following conditions:
-
- The above copyright notice and this permission notice shall be
- included in all copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
- LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
- OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
- WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+ MIT License
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md CHANGED
@@ -1,245 +1,247 @@
- # Apache POI Excel parser plugin for Embulk
-
- Parses Microsoft Excel files(xls, xlsx) read by other file input plugins.
- This plugin uses Apache POI.
-
- ## Overview
-
- * **Plugin type**: parser
- * **Guess supported**: no
-
-
- ## Example
-
- ```yaml
- in:
-   type: any file input plugin type
-   parser:
-     type: poi_excel
-     sheets: ["DQ10-orb"]
-     skip_header_lines: 1 # first row is header.
-     columns:
-     - {name: row, type: long, value: row_number}
-     - {name: get_date, type: timestamp, cell_column: A, value: cell_value}
-     - {name: orb_type, type: string}
-     - {name: orb_name, type: string}
-     - {name: orb_shape, type: long}
-     - {name: drop_monster_name, type: string}
- ```
-
- if omit **value**, specified `cell_value`.
- if omit **cell_column** when **value** is `cell_value`, specified next column.
-
-
- ## Configuration
-
- * **sheets**: sheet name. can use wildcards `*`, `?`. (list of string, required)
- * **record_type**: record type. (`row`, `column` or `sheet`. default: `row`)
- * **skip_header_lines**: skip rows when **record_type**=`row` (skip columns when **record_type**=`column`). ignored when **record_type**=`sheet`. (integer, default: `0`)
- * **columns**: column definition. see below. (hash, required)
- * **sheet_options**: sheet option. see below. (hash, default: null)
-
- ### columns
-
- * **name**: Embulk column name. (string, required)
- * **type**: Embulk column type. (string, required)
- * **value**: value type. see below. (string, default: `cell_value`)
- * **column_number**: same as **cell_column**.
- * **cell_column**: Excel column number. see below. (string, default: next column when **record_type**=`row`)
- * **cell_row**: Excel row number. see below. (integer, default: next row when **record_type**=`column`)
- * **cell_address**: Excel cell address such as `A1`, `Sheet1!B3`. (string, not required)
- * **numeric_format**: format of numeric(double) to string such as `%4.2f`. (default: Java's Double.toString())
- * **attribute_name**: use with value `cell_style`, `cell_font`, etc. see below. (list of string)
- * **on_cell_error**: processing method of Cell error. see below. (string, default: `constant`)
- * **formula_handling**: processing method of formula. see below. (`evaluate` or `cashed_value`. default: `evaluate`)
- * **on_evaluate_error**: processing method of evaluate formula error. see below. (string, default: `exception`)
- * **formula_replace**: replace formula before evaluate. see below.
- * **on_convert_error**: processing method of convert error. see below. (string, default: `exception`)
- * **search_merged_cell**: search merged cell when cell is BLANK. (`none`, `linear_search`, `tree_search` or `hash_search`, default: `hash_search`)
-
- ### value
-
- * `cell_value`: value in cell.
- * `cell_formula`: formula in cell. (if cell is not formula, same `cell_value`.)
- * `cell_style`: all cell style attributes. returned json string. see **attribute_name**. (**type** required `string`)
- * `cell_font`: all cell font attributes. returned json string. see **attribute_name**. (**type** required `string`)
- * `cell_comment`: all cell comment attributes. returned json string. see **attribute_name**. (**type** required `string`)
- * `cell_type`: cell type. returned Cell.getCellType() of POI.
- * `cell_cached_type`: cell cached formula result type. returned Cell.getCachedFormulaResultType() of POI when CellType==FORMULA, otherwise same as `cell_type` (returned Cell.getCellType()).
- * `sheet_name`: sheet name.
- * `row_number`: row number(1 origin).
- * `column_number`: column number(1 origin).
- * `constant`: constant value.
-
-   * `constant.`*value*: specified value.
-   * `constant`: null.
-
- ### cell_column
-
- Basically used for **record_type**=`row`.
-
- * `A`,`B`,`C`,...: column number of "A1 format".
- * *number*: column number (1 origin).
- * `+`: next column.
- * `+`*name*: next column of name.
- * `+`*number*: number next column.
- * `-`: previous column.
- * `-`*name*: previous column of name.
- * `-`*number*: number previous column.
- * `=`: same column.
- * `=`*name*: same column of name.
-
- ### cell_row
-
- Basically used for **record_type**=`column`.
-
- * *number*: row number (1 origin).
-
- ### attribute_name
-
- When **value** is `cell_style`, `cell_font`, or `cell_comment`, all attributes are retrieved and converted to a JSON string by default.
- (Because a JSON string is returned, **type** must be `string`.)
-
- ```yaml
- columns:
- - {name: foo, type: string, cell_column: A, value: cell_style}
- ```
-
-
- By specifying attribute_name, only the specified attributes are retrieved and converted to a JSON string.
-
- * **attribute_name**: attribute names. (list of string)
-
- ```yaml
- columns:
- - {name: foo, type: string, cell_column: A, value: cell_style, attribute_name: [border_top, border_bottom, border_left, border_right]}
- ```
-
-
- You can also retrieve a single attribute by appending a period and the attribute name directly after `cell_style` or `cell_font`.
- In this case the result is not a JSON string, so you must specify a **type** that matches the attribute's type.
-
- ```yaml
- columns:
- - {name: foo, type: long, value: cell_style.border}
- - {name: bar, type: long, value: cell_font.color}
- ```
-
- Note that for `cell_style` and `cell_font`, omitting **cell_column** targets the same column as the preceding entry.
- (For `cell_value`, omitting **cell_column** moves to the next column.)
-
-
- ### on_cell_error
-
- Processing method of Cell error (`#DIV/0!`, `#REF!`, etc).
-
- ```yaml
- columns:
- - {name: foo, type: string, cell_column: A, value: cell_value, on_cell_error: error_code}
- ```
-
- * `constant`: set null. (default)
- * `constant.`*value*: set specified value.
- * `error_code`: set error code.
- * `exception`: throw exception.
-
-
- ### formula_handling
-
- Processing method of formula.
-
- ```yaml
- columns:
- - {name: foo, type: string, cell_column: A, value: cell_value, formula_handling: cashed_value}
- ```
-
- * `evaluate`: evaluate formula. (default)
- * `cashed_value`: cashed value in cell.
-
-
- ### on_evaluate_error
-
- Processing method of evaluate formula error.
-
- ```yaml
- columns:
- - {name: foo, type: string, cell_column: A, value: cell_value, on_evaluate_error: constant}
- ```
-
- * `constant`: set null.
- * `constant.`*value*: set specified value.
- * `exception`: throw exception. (default)
-
-
- ### formula_replace
-
- Replace formula before evaluate.
-
- ```yaml
- columns:
- - {name: foo, type: string, cell_column: A, value: cell_value, formula_replace: [{regex: aaa, to: "A${row}"}, {regex: bbb, to: "B${row}"}]}
- ```
-
- `${row}` is replaced with the current row number.
- `${column}` is replaced with the current column string.
-
-
- ### on_convert_error
-
- Processing method of convert error. ex) Excel boolean to Embulk timestamp
-
- ```yaml
- columns:
- - {name: foo, type: timestamp, format: "%Y/%m/%d", cell_column: A, value: cell_value, on_convert_error: constant.9999/12/31}
- ```
-
- * `constant`: set null.
- * `constant.`*value*: set specified value.
- * `exception`: throw exception. (default)
-
-
- ### sheet_options
-
- Options of individual sheet.
-
- ```yaml
- parser:
-   type: poi_excel
-   sheets: [Sheet1, Sheet2]
-   columns:
-   - {name: date, type: timestamp, cell_column: A}
-   - {name: foo, type: string}
-   - {name: bar, type: long}
-   sheet_options:
-     Sheet1:
-       skip_header_lines: 1
-       columns:
-         foo: {cell_column: B}
-         bar: {cell_column: C}
-     Sheet2:
-       skip_header_lines: 0
-       columns:
-         foo: {cell_column: D}
-         bar: {value: constant.0}
- ```
-
- **sheet_options** is map of sheet name.
- Map values are **skip_header_lines**, **columns**.
-
- **columns** is map of column name.
- Map values are same **columns** in **parser** (excluding `name`, `type`).
-
-
- ## Install
-
- ```
- $ embulk gem install embulk-parser-poi_excel
- ```
-
-
- ## Build
-
- ```
- $ ./gradlew test
- $ ./gradlew package
- ```
+ # Apache POI Excel parser plugin for Embulk
+
+ Parses Microsoft Excel files(xls, xlsx) read by other file input plugins.
+ This plugin uses Apache POI.
+
+ ## Overview
+
+ * **Plugin type**: parser
+ * **Guess supported**: no
+ * Embulk 0.9 or earlier (refer to https://github.com/hishidama/embulk-parser-excel-poi for 0.10 and later)
+
+
+ ## Example
+
+ ```yaml
+ in:
+   type: any file input plugin type
+   parser:
+     type: poi_excel
+     sheets: ["DQ10-orb"]
+     skip_header_lines: 1 # first row is header.
+     columns:
+     - {name: row, type: long, value: row_number}
+     - {name: get_date, type: timestamp, cell_column: A, value: cell_value}
+     - {name: orb_type, type: string}
+     - {name: orb_name, type: string}
+     - {name: orb_shape, type: long}
+     - {name: drop_monster_name, type: string}
+ ```
+
+ if omit **value**, specified `cell_value`.
+ if omit **cell_column** when **value** is `cell_value`, specified next column.
+
+
+ ## Configuration
+
+ * **sheets**: sheet name. can use wildcards `*`, `?`. (list of string, required)
+ * **record_type**: record type. (`row`, `column` or `sheet`. default: `row`)
+ * **skip_header_lines**: skip rows when **record_type**=`row` (skip columns when **record_type**=`column`). ignored when **record_type**=`sheet`. (integer, default: `0`)
+ * **columns**: column definition. see below. (hash, required)
+ * **sheet_options**: sheet option. see below. (hash, default: null)
+
+ ### columns
+
+ * **name**: Embulk column name. (string, required)
+ * **type**: Embulk column type. (string, required)
+ * **value**: value type. see below. (string, default: `cell_value`)
+ * **column_number**: same as **cell_column**.
+ * **cell_column**: Excel column number. see below. (string, default: next column when **record_type**=`row`)
+ * **cell_row**: Excel row number. see below. (integer, default: next row when **record_type**=`column`)
+ * **cell_address**: Excel cell address such as `A1`, `Sheet1!B3`. (string, not required)
+ * **numeric_format**: format of numeric(double) to string such as `%4.2f`. (default: Java's Double.toString())
+ * **attribute_name**: use with value `cell_style`, `cell_font`, etc. see below. (list of string)
+ * **on_cell_error**: processing method of Cell error. see below. (string, default: `constant`)
+ * **formula_handling**: processing method of formula. see below. (`evaluate` or `cashed_value`. default: `evaluate`)
+ * **on_evaluate_error**: processing method of evaluate formula error. see below. (string, default: `exception`)
+ * **formula_replace**: replace formula before evaluate. see below.
+ * **on_convert_error**: processing method of convert error. see below. (string, default: `exception`)
+ * **search_merged_cell**: search merged cell when cell is BLANK. (`none`, `linear_search`, `tree_search` or `hash_search`, default: `hash_search`)
+
+ ### value
+
+ * `cell_value`: value in cell.
+ * `cell_formula`: formula in cell. (if cell is not formula, same `cell_value`.)
+ * `cell_style`: all cell style attributes. returned json string. see **attribute_name**. (**type** required `string`)
+ * `cell_font`: all cell font attributes. returned json string. see **attribute_name**. (**type** required `string`)
+ * `cell_comment`: all cell comment attributes. returned json string. see **attribute_name**. (**type** required `string`)
+ * `cell_type`: cell type. returned Cell.getCellType() of POI.
+ * `cell_cached_type`: cell cached formula result type. returned Cell.getCachedFormulaResultType() of POI when CellType==FORMULA, otherwise same as `cell_type` (returned Cell.getCellType()).
+ * `file_name`: excel file name.
+ * `sheet_name`: sheet name.
+ * `row_number`: row number(1 origin).
+ * `column_number`: column number(1 origin).
+ * `constant`: constant value.
+
+   * `constant.`*value*: specified value.
+   * `constant`: null.
+
+ ### cell_column
+
+ Basically used for **record_type**=`row`.
+
+ * `A`,`B`,`C`,...: column number of "A1 format".
+ * *number*: column number (1 origin).
+ * `+`: next column.
+ * `+`*name*: next column of name.
+ * `+`*number*: number next column.
+ * `-`: previous column.
+ * `-`*name*: previous column of name.
+ * `-`*number*: number previous column.
+ * `=`: same column.
+ * `=`*name*: same column of name.
+
+ ### cell_row
+
+ Basically used for **record_type**=`column`.
+
+ * *number*: row number (1 origin).
+
+ ### attribute_name
+
+ When **value** is `cell_style`, `cell_font`, or `cell_comment`, all attributes are retrieved and converted to a JSON string by default.
+ (Because a JSON string is returned, **type** must be `string`.)
+
+ ```yaml
+ columns:
+ - {name: foo, type: string, cell_column: A, value: cell_style}
+ ```
+
+
+ By specifying attribute_name, only the specified attributes are retrieved and converted to a JSON string.
+
+ * **attribute_name**: attribute names. (list of string)
+
+ ```yaml
+ columns:
+ - {name: foo, type: string, cell_column: A, value: cell_style, attribute_name: [border_top, border_bottom, border_left, border_right]}
+ ```
+
+
+ You can also retrieve a single attribute by appending a period and the attribute name directly after `cell_style` or `cell_font`.
+ In this case the result is not a JSON string, so you must specify a **type** that matches the attribute's type.
+
+ ```yaml
+ columns:
+ - {name: foo, type: long, value: cell_style.border}
+ - {name: bar, type: long, value: cell_font.color}
+ ```
+
+ Note that for `cell_style` and `cell_font`, omitting **cell_column** targets the same column as the preceding entry.
+ (For `cell_value`, omitting **cell_column** moves to the next column.)
+
+
+ ### on_cell_error
+
+ Processing method of Cell error (`#DIV/0!`, `#REF!`, etc).
+
+ ```yaml
+ columns:
+ - {name: foo, type: string, cell_column: A, value: cell_value, on_cell_error: error_code}
+ ```
+
+ * `constant`: set null. (default)
+ * `constant.`*value*: set specified value.
+ * `error_code`: set error code.
+ * `exception`: throw exception.
+
+
+ ### formula_handling
+
+ Processing method of formula.
+
+ ```yaml
+ columns:
+ - {name: foo, type: string, cell_column: A, value: cell_value, formula_handling: cashed_value}
+ ```
+
+ * `evaluate`: evaluate formula. (default)
+ * `cashed_value`: cashed value in cell.
+
+
+ ### on_evaluate_error
+
+ Processing method of evaluate formula error.
+
+ ```yaml
+ columns:
+ - {name: foo, type: string, cell_column: A, value: cell_value, on_evaluate_error: constant}
+ ```
+
+ * `constant`: set null.
+ * `constant.`*value*: set specified value.
+ * `exception`: throw exception. (default)
+
+
+ ### formula_replace
+
+ Replace formula before evaluate.
+
+ ```yaml
+ columns:
+ - {name: foo, type: string, cell_column: A, value: cell_value, formula_replace: [{regex: aaa, to: "A${row}"}, {regex: bbb, to: "B${row}"}]}
+ ```
+
+ `${row}` is replaced with the current row number.
+ `${column}` is replaced with the current column string.
+
+
+ ### on_convert_error
+
+ Processing method of convert error. ex) Excel boolean to Embulk timestamp
+
+ ```yaml
+ columns:
+ - {name: foo, type: timestamp, format: "%Y/%m/%d", cell_column: A, value: cell_value, on_convert_error: constant.9999/12/31}
+ ```
+
+ * `constant`: set null.
+ * `constant.`*value*: set specified value.
+ * `exception`: throw exception. (default)
+
+
+ ### sheet_options
+
+ Options of individual sheet.
+
+ ```yaml
+ parser:
+   type: poi_excel
+   sheets: [Sheet1, Sheet2]
+   columns:
+   - {name: date, type: timestamp, cell_column: A}
+   - {name: foo, type: string}
+   - {name: bar, type: long}
+   sheet_options:
+     Sheet1:
+       skip_header_lines: 1
+       columns:
+         foo: {cell_column: B}
+         bar: {cell_column: C}
+     Sheet2:
+       skip_header_lines: 0
+       columns:
+         foo: {cell_column: D}
+         bar: {value: constant.0}
+ ```
+
+ **sheet_options** is map of sheet name.
+ Map values are **skip_header_lines**, **columns**.
+
+ **columns** is map of column name.
+ Map values are same **columns** in **parser** (excluding `name`, `type`).
+
+
+ ## Install
+
+ ```
+ $ embulk gem install embulk-parser-poi_excel
+ ```
+
+
+ ## Build
+
+ ```
+ $ ./gradlew test
+ $ ./gradlew package
+ ```
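
Apart from the Embulk version note, the only README addition in 0.1.13 is the `file_name` value. The fragment below is a minimal sketch of how it might be combined with the existing `sheet_name` and `row_number` values to tag each record with its origin. The column names (`source_file`, `sheet`, `row`, `amount`), the `path_prefix` value, and the choice of `type: string` for `file_name` are assumptions made for illustration; the README does not say whether the returned name is a full path or a base name.

```yaml
in:
  type: file                      # assumption: Embulk's built-in local file input
  path_prefix: /path/to/excel/    # hypothetical location of the workbooks
  parser:
    type: poi_excel
    sheets: ["*"]                 # wildcard, as allowed by the sheets option
    skip_header_lines: 1
    columns:
    - {name: source_file, type: string, value: file_name}   # new in 0.1.13
    - {name: sheet, type: string, value: sheet_name}
    - {name: row, type: long, value: row_number}
    - {name: amount, type: double, cell_column: A}           # hypothetical data column
```

Because `file_name`, `sheet_name`, and `row_number` do not read cell contents, they are given no `cell_column` here; the data column sets `cell_column: A` explicitly, so it does not depend on the next-column default.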