read-excel-file 5.8.7 → 6.0.0
- package/CHANGELOG.md +37 -0
- package/README.md +223 -271
- package/bundle/read-excel-file.min.js +1 -1
- package/bundle/read-excel-file.min.js.map +1 -1
- package/commonjs/read/dropEmptyRows.js +3 -3
- package/commonjs/read/dropEmptyRows.js.map +1 -1
- package/commonjs/read/dropEmptyRows.test.js.map +1 -1
- package/commonjs/read/getData.js +18 -8
- package/commonjs/read/getData.js.map +1 -1
- package/commonjs/read/readXlsx.js +2 -2
- package/commonjs/read/readXlsx.js.map +1 -1
- package/commonjs/read/readXlsxFileContents.js +8 -9
- package/commonjs/read/readXlsxFileContents.js.map +1 -1
- package/commonjs/read/readXlsxFileNode.test.js.map +1 -1
- package/commonjs/read/schema/mapToObjects.js +100 -38
- package/commonjs/read/schema/mapToObjects.js.map +1 -1
- package/commonjs/read/schema/mapToObjects.test.js.map +1 -1
- package/commonjs/read/unpackXlsxFileNode.js +14 -3
- package/commonjs/read/unpackXlsxFileNode.js.map +1 -1
- package/index.d.ts +0 -2
- package/modules/read/dropEmptyRows.js +3 -3
- package/modules/read/dropEmptyRows.js.map +1 -1
- package/modules/read/dropEmptyRows.test.js.map +1 -1
- package/modules/read/getData.js +18 -8
- package/modules/read/getData.js.map +1 -1
- package/modules/read/readXlsx.js +2 -2
- package/modules/read/readXlsx.js.map +1 -1
- package/modules/read/readXlsxFileContents.js +8 -9
- package/modules/read/readXlsxFileContents.js.map +1 -1
- package/modules/read/readXlsxFileNode.test.js.map +1 -1
- package/modules/read/schema/mapToObjects.js +100 -38
- package/modules/read/schema/mapToObjects.js.map +1 -1
- package/modules/read/schema/mapToObjects.test.js.map +1 -1
- package/modules/read/unpackXlsxFileNode.js +14 -3
- package/modules/read/unpackXlsxFileNode.js.map +1 -1
- package/node/index.d.ts +0 -2
- package/package.json +5 -15
- package/types.d.ts +25 -39
- package/web-worker/index.d.ts +0 -2
- package/bundle/index.html +0 -261
- package/bundle/lib/prism.css +0 -125
- package/bundle/lib/prism.js +0 -7
- package/bundle/lib/promise-polyfill.min.js +0 -1
- package/commonjs/read/schema/convertMapToSchema.js +0 -27
- package/commonjs/read/schema/convertMapToSchema.js.map +0 -1
- package/commonjs/read/schema/convertMapToSchema.test.js.map +0 -1
- package/commonjs/read/schema/mapToObjects.legacy.js +0 -60
- package/commonjs/read/schema/mapToObjects.legacy.js.map +0 -1
- package/commonjs/read/schema/mapToObjects.legacy.test.js.map +0 -1
- package/commonjs/read/schema/mapToObjects.spreadsheet.js +0 -25
- package/commonjs/read/schema/mapToObjects.spreadsheet.js.map +0 -1
- package/commonjs/read/schema/mapToObjects.spreadsheet.test.js.map +0 -1
- package/map/index.cjs +0 -2
- package/map/index.cjs.js +0 -7
- package/map/index.d.ts +0 -15
- package/map/index.js +0 -1
- package/map/package.json +0 -17
- package/modules/read/schema/convertMapToSchema.js +0 -21
- package/modules/read/schema/convertMapToSchema.js.map +0 -1
- package/modules/read/schema/convertMapToSchema.test.js.map +0 -1
- package/modules/read/schema/mapToObjects.legacy.js +0 -53
- package/modules/read/schema/mapToObjects.legacy.js.map +0 -1
- package/modules/read/schema/mapToObjects.legacy.test.js.map +0 -1
- package/modules/read/schema/mapToObjects.spreadsheet.js +0 -19
- package/modules/read/schema/mapToObjects.spreadsheet.js.map +0 -1
- package/modules/read/schema/mapToObjects.spreadsheet.test.js.map +0 -1
- package/schema/index.cjs +0 -2
- package/schema/index.cjs.js +0 -7
- package/schema/index.d.ts +0 -11
- package/schema/index.js +0 -1
- package/schema/package.json +0 -17
- package/website/index.html +0 -261
- package/website/lib/prism.css +0 -125
- package/website/lib/prism.js +0 -7
- package/website/lib/promise-polyfill.min.js +0 -1
package/README.md
CHANGED
|
@@ -1,10 +1,14 @@
|
|
|
1
1
|
# `read-excel-file`
|
|
2
2
|
|
|
3
|
-
Read
|
|
3
|
+
Read `*.xlsx` files of moderate size in a web browser or on a server.
|
|
4
|
+
|
|
5
|
+
It also supports parsing spreadsheet rows into JSON objects using a [schema](#schema).
|
|
6
|
+
|
|
7
|
+
[Huge files](#performance) may not be supported.
|
|
4
8
|
|
|
5
9
|
[Demo](https://catamphetamine.gitlab.io/read-excel-file/)
|
|
6
10
|
|
|
7
|
-
Also check out [`write-excel-file`](https://www.npmjs.com/package/write-excel-file) for writing
|
|
11
|
+
Also check out [`write-excel-file`](https://www.npmjs.com/package/write-excel-file) for writing `*.xlsx` files.
|
|
8
12
|
|
|
9
13
|
## Install
|
|
10
14
|
|
|
@@ -12,12 +16,14 @@ Also check out [`write-excel-file`](https://www.npmjs.com/package/write-excel-fi
|
|
|
12
16
|
npm install read-excel-file --save
|
|
13
17
|
```
|
|
14
18
|
|
|
15
|
-
|
|
19
|
+
Alternatively, one could [include it on a web page directly via a `<script/>` tag](#cdn).
|
|
16
20
|
|
|
17
21
|
## Use
|
|
18
22
|
|
|
19
23
|
### Browser
|
|
20
24
|
|
|
25
|
+
Example 1: User chooses a file and the web application reads it.
|
|
26
|
+
|
|
21
27
|
```html
|
|
22
28
|
<input type="file" id="input" />
|
|
23
29
|
```
|
|
@@ -25,72 +31,85 @@ If you're not using a bundler then use a [standalone version from a CDN](#cdn).
|
|
|
25
31
|
```js
|
|
26
32
|
import readXlsxFile from 'read-excel-file'
|
|
27
33
|
|
|
28
|
-
// File.
|
|
29
34
|
const input = document.getElementById('input')
|
|
35
|
+
|
|
30
36
|
input.addEventListener('change', () => {
|
|
31
37
|
readXlsxFile(input.files[0]).then((rows) => {
|
|
32
|
-
// `rows` is an array of rows
|
|
33
|
-
//
|
|
38
|
+
// `rows` is an array of "rows".
|
|
39
|
+
// Each "row" is an array of "cells".
|
|
40
|
+
// Each "cell" is a value: string, number, Date, boolean.
|
|
34
41
|
})
|
|
35
42
|
})
|
|
43
|
+
```
|
|
44
|
+
|
|
45
|
+
Example 2: Application fetches a file from a URL and reads it.
|
|
36
46
|
|
|
37
|
-
|
|
47
|
+
```js
|
|
38
48
|
fetch('https://example.com/spreadsheet.xlsx')
|
|
39
49
|
.then(response => response.blob())
|
|
40
50
|
.then(blob => readXlsxFile(blob))
|
|
41
51
|
.then((rows) => {
|
|
42
|
-
// `rows` is an array of rows
|
|
43
|
-
//
|
|
52
|
+
// `rows` is an array of "rows".
|
|
53
|
+
// Each "row" is an array of "cells".
|
|
54
|
+
// Each "cell" is a value: string, number, Date, boolean.
|
|
44
55
|
})
|
|
45
|
-
|
|
46
|
-
// ArrayBuffer.
|
|
47
|
-
// https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer
|
|
48
|
-
//
|
|
49
|
-
// Could be obtained from:
|
|
50
|
-
// * File
|
|
51
|
-
// * Blob
|
|
52
|
-
// * Base64 string
|
|
53
|
-
//
|
|
54
|
-
readXlsxFile(arrayBuffer).then((rows) => {
|
|
55
|
-
// `rows` is an array of rows
|
|
56
|
-
// each row being an array of cells.
|
|
57
|
-
})
|
|
58
56
|
```
|
|
59
57
|
|
|
60
|
-
|
|
58
|
+
In summary, it can read data from a [`File`](https://developer.mozilla.org/en-US/docs/Web/API/File), a [`Blob`](https://developer.mozilla.org/en-US/docs/Web/API/Blob) or an [`ArrayBuffer`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer).
|
|
59
|
+
|
|
60
|
+
Note: Internet Explorer 11 is an old browser that doesn't support [`Promise`](https://developer.mozilla.org/ru/docs/Web/JavaScript/Reference/Global_Objects/Promise) and would require a [polyfill](https://www.npmjs.com/package/promise-polyfill) to work.
|
|
61
61
|
|
|
62
62
|
### Node.js
|
|
63
63
|
|
|
64
|
+
Example 1: Read data from a file at file path.
|
|
65
|
+
|
|
64
66
|
```js
|
|
67
|
+
// Notice how it imports from '/node' subpackage.
|
|
65
68
|
const readXlsxFile = require('read-excel-file/node')
|
|
66
69
|
|
|
67
|
-
//
|
|
70
|
+
// Read data from a file by file path.
|
|
68
71
|
readXlsxFile('/path/to/file').then((rows) => {
|
|
69
|
-
// `rows` is an array of rows
|
|
70
|
-
//
|
|
72
|
+
// `rows` is an array of "rows".
|
|
73
|
+
// Each "row" is an array of "cells".
|
|
74
|
+
// Each "cell" is a value: string, number, Date, boolean.
|
|
71
75
|
})
|
|
76
|
+
```
|
|
77
|
+
|
|
78
|
+
Example 2: Read data from a [`Stream`](https://nodejs.org/api/stream.html)
|
|
72
79
|
|
|
73
|
-
|
|
80
|
+
```js
|
|
81
|
+
// Read data from a `Stream`.
|
|
74
82
|
readXlsxFile(fs.createReadStream('/path/to/file')).then((rows) => {
|
|
75
|
-
// `rows` is an array of rows
|
|
76
|
-
//
|
|
83
|
+
// `rows` is an array of "rows".
|
|
84
|
+
// Each "row" is an array of "cells".
|
|
85
|
+
// Each "cell" is a value: string, number, Date, boolean.
|
|
77
86
|
})
|
|
87
|
+
```
|
|
88
|
+
|
|
89
|
+
Example 3: Read data from a [`Buffer`](https://nodejs.org/api/buffer.html).
|
|
78
90
|
|
|
79
|
-
|
|
91
|
+
```js
|
|
92
|
+
// Read data from a `Buffer`.
|
|
80
93
|
readXlsxFile(Buffer.from(fs.readFileSync('/path/to/file'))).then((rows) => {
|
|
81
|
-
// `rows` is an array of rows
|
|
82
|
-
//
|
|
94
|
+
// `rows` is an array of "rows".
|
|
95
|
+
// Each "row" is an array of "cells".
|
|
96
|
+
// Each "cell" is a value: string, number, Date, boolean.
|
|
83
97
|
})
|
|
84
98
|
```
|
|
85
99
|
|
|
100
|
+
In summary, it can read data from a file path, a [`Stream`](https://nodejs.org/api/stream.html) or a [`Buffer`](https://nodejs.org/api/buffer.html).
|
|
101
|
+
|
|
86
102
|
### Web Worker
|
|
87
103
|
|
|
104
|
+
Example 1: User chooses a file and the web application reads it in a [Web Worker](https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers) to avoid freezing the UI on large files.
|
|
105
|
+
|
|
88
106
|
```js
|
|
107
|
+
// Step 1: Initialize Web Worker.
|
|
108
|
+
|
|
89
109
|
const worker = new Worker('web-worker.js')
|
|
90
110
|
|
|
91
111
|
worker.onmessage = function(event) {
|
|
92
|
-
// `event.data` is
|
|
93
|
-
// each row being an array of cells.
|
|
112
|
+
// `event.data` is the array of rows posted back by the Web Worker.
|
|
94
113
|
console.log(event.data)
|
|
95
114
|
}
|
|
96
115
|
|
|
@@ -98,6 +117,8 @@ worker.onerror = function(event) {
|
|
|
98
117
|
console.error(event.message)
|
|
99
118
|
}
|
|
100
119
|
|
|
120
|
+
// Step 2: User chooses a file and the application sends it to the Web Worker.
|
|
121
|
+
|
|
101
122
|
const input = document.getElementById('input')
|
|
102
123
|
|
|
103
124
|
input.addEventListener('change', () => {
|
|
@@ -108,72 +129,157 @@ input.addEventListener('change', () => {
|
|
|
108
129
|
##### `web-worker.js`
|
|
109
130
|
|
|
110
131
|
```js
|
|
132
|
+
// Notice how it imports from '/web-worker' subpackage.
|
|
111
133
|
import readXlsxFile from 'read-excel-file/web-worker'
|
|
112
134
|
|
|
113
135
|
onmessage = function(event) {
|
|
114
136
|
readXlsxFile(event.data).then((rows) => {
|
|
115
|
-
// `rows` is an array of rows
|
|
116
|
-
//
|
|
137
|
+
// `rows` is an array of "rows".
|
|
138
|
+
// Each "row" is an array of "cells".
|
|
139
|
+
// Each "cell" is a value: string, number, Date, boolean.
|
|
117
140
|
postMessage(rows)
|
|
118
141
|
})
|
|
119
142
|
}
|
|
120
143
|
```
|
|
121
144
|
|
|
122
|
-
##
|
|
145
|
+
## Multiple Sheets
|
|
146
|
+
|
|
147
|
+
By default, it only reads the first "sheet" in the file. If you have multiple sheets in your file then pass either a sheet number (starting from `1`) or a sheet name in the `options` argument.
|
|
148
|
+
|
|
149
|
+
Example 1: Reads the second sheet.
|
|
150
|
+
|
|
151
|
+
```js
|
|
152
|
+
readXlsxFile(file, { sheet: 2 }).then((data) => {
|
|
153
|
+
...
|
|
154
|
+
})
|
|
155
|
+
```
|
|
156
|
+
|
|
157
|
+
Example 2: Reads the sheet called "Sheet1".
|
|
158
|
+
|
|
159
|
+
```js
|
|
160
|
+
readXlsxFile(file, { sheet: 'Sheet1' }).then((data) => {
|
|
161
|
+
...
|
|
162
|
+
})
|
|
163
|
+
```
|
|
164
|
+
|
|
165
|
+
To get the names of all available sheets, use `readSheetNames()` function:
|
|
166
|
+
|
|
167
|
+
```js
|
|
168
|
+
// Depending on where your code runs, import it from
|
|
169
|
+
// 'read-excel-file' or 'read-excel-file/node' or 'read-excel-file/web-worker'.
|
|
170
|
+
import { readSheetNames } from 'read-excel-file'
|
|
171
|
+
|
|
172
|
+
readSheetNames(file).then((sheetNames) => {
|
|
173
|
+
// sheetNames === ['Sheet1', 'Sheet2']
|
|
174
|
+
})
|
|
175
|
+
```
|
|
176
|
+
|
|
177
|
+
## Dates
|
|
123
178
|
|
|
124
|
-
|
|
179
|
+
`*.xlsx` file format originally had no dedicated "date" type, so dates are in almost all cases stored simply as numbers, equal to the count of days since `01/01/1900`. To correctly interpret such numbers as dates, each date cell has a special ["format"](https://xlsxwriter.readthedocs.io/format.html#format-set-num-format) (example: `"d mmm yyyy"`) that instructs the spreadsheet viewer application to format the number in the cell as a date in a given format.
|
|
125
180
|
|
|
126
|
-
|
|
181
|
+
When using `readXlsxFile()` with a [`schema`](#schema) parameter, all columns having `type: Date` are automatically parsed as dates.
|
|
127
182
|
|
|
128
|
-
|
|
129
|
-
* `required` — (optional) Required properties of the object could be marked as such.
|
|
130
|
-
* `required: boolean` — `true` or `false`.
|
|
131
|
-
* `required: (object) => boolean` — A function returning `true` or `false` depending on some other properties of the object.
|
|
132
|
-
* `validate(value)` — (optional) Cell value validation function. Is only called on non-empty cells. If the cell value is invalid, it should throw an error with the error message set to the error code.
|
|
133
|
-
* `type` — (optional) The type of the value. Defines how the cell value will be parsed. If no `type` is specified then the cell value is returned "as is": as a string, number, date or boolean. A `type` could be a:
|
|
134
|
-
* Built-in type:
|
|
135
|
-
* `String`
|
|
136
|
-
* `Number`
|
|
137
|
-
* `Boolean`
|
|
138
|
-
* `Date`
|
|
139
|
-
* "Utility" type exported from the library:
|
|
140
|
-
* `Integer`
|
|
141
|
-
* `Email`
|
|
142
|
-
* `URL`
|
|
143
|
-
* Custom type:
|
|
144
|
-
* A function that receives a cell value and returns a parsed value. If the value is invalid, it should throw an error with the error message set to the error code.
|
|
183
|
+
When using `readXlsxFile()` without a `schema` parameter, it attempts to guess whether the cell value is a date or a number by looking at the cell's "format": if the "format" is one of the [standard date formats](https://docs.microsoft.com/en-us/dotnet/api/documentformat.openxml.spreadsheet.numberingformat?view=openxml-2.8.1) then the cell value is interpreted as a date. So there's usually no need to configure anything: in most cases it works out of the box.
|
|
145
184
|
|
|
146
|
-
|
|
185
|
+
Sometimes though, an `*.xlsx` file might use a non-standard date format like `"mm/dd/yyyy"`. To read such files correctly, pass a `dateFormat` parameter to tell it to parse cells having such "format" as date cells.
|
|
147
186
|
|
|
148
|
-
|
|
187
|
+
```js
|
|
188
|
+
readXlsxFile(file, { dateFormat: 'mm/dd/yyyy' })
|
|
189
|
+
```
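Note (an illustration, not part of the package README): the "count of days" storage described in the Dates section can be sketched without the library. The helper name `excelSerialToDate` is hypothetical, and the sketch assumes the default 1900 date system and serial numbers of 61 or greater (dates from March 1900 onward), where Excel's fictitious 29 February 1900 no longer affects the offset. It is not how `read-excel-file` itself is implemented.

```js
// Milliseconds in one day.
const MS_PER_DAY = 24 * 60 * 60 * 1000

// Serial number that corresponds to 1970-01-01 in the 1900 date system.
const EXCEL_SERIAL_OF_UNIX_EPOCH = 25569

// Hypothetical helper: converts a spreadsheet serial date number to a `Date` (UTC).
// The fractional part of the serial number is the time of day.
function excelSerialToDate(serial) {
  return new Date(Math.round((serial - EXCEL_SERIAL_OF_UNIX_EPOCH) * MS_PER_DAY))
}

console.log(excelSerialToDate(43183).toISOString())
// prints "2018-03-24T00:00:00.000Z"
```

This is why a date cell without a recognized "format" looks like a plain number such as `43183` when read.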
|
|
190
|
+
|
|
191
|
+
## Numbers
|
|
192
|
+
|
|
193
|
+
In `*.xlsx` files, numbers are stored as strings. `read-excel-file` manually parses such numeric cell values from strings to numbers. But there's an inherent issue with javascript numbers in general: their [floating-point precision](https://www.youtube.com/watch?v=2gIxbTn7GSc) might not be enough for applications that require 100% precision. An example would be finance and banking. To support such demanding use-cases, this library supports passing a custom `parseNumber(string)` function as an option.
|
|
194
|
+
|
|
195
|
+
Example: Use "decimals" to represent numbers with 100% precision in banking applications.
|
|
196
|
+
|
|
197
|
+
```js
|
|
198
|
+
import Decimal from 'decimal.js'
|
|
149
199
|
|
|
150
|
-
|
|
200
|
+
readXlsxFile(file, {
|
|
201
|
+
parseNumber: (string) => new Decimal(string)
|
|
202
|
+
})
|
|
203
|
+
```
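Note (an illustration, not part of the package README): a dependency-free alternative to a decimal library is to keep amounts as integer "minor units" (for example, cents) in a `BigInt`. The helper `parseToMinorUnits` below is hypothetical. It assumes the cell string is a plain decimal such as `"12.34"`, and it deliberately rejects anything else, including scientific notation, which a real spreadsheet may contain.

```js
// The floating-point issue in one line:
console.log(0.1 + 0.2 === 0.3) // prints false

// Hypothetical helper: parses a plain decimal string into integer minor units.
// Example: "12.34" with 2 fraction digits becomes 1234n.
function parseToMinorUnits(string, fractionDigits = 2) {
  const match = /^(-?)(\d+)(?:\.(\d+))?$/.exec(string.trim())
  if (!match) {
    throw new Error('invalid')
  }
  const [, sign, whole, fraction = ''] = match
  if (fraction.length > fractionDigits) {
    // Refuse to silently round away precision.
    throw new Error('too_many_fraction_digits')
  }
  const value = BigInt(whole + fraction.padEnd(fractionDigits, '0'))
  return sign ? -value : value
}

console.log(parseToMinorUnits('0.10') + parseToMinorUnits('0.20') === 30n) // prints true
```

Such a function could then be passed as the `parseNumber` option shown above, for example `readXlsxFile(file, { parseNumber: parseToMinorUnits })`, provided every numeric cell in the file fits the assumed format.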
|
|
151
204
|
|
|
152
|
-
|
|
205
|
+
## Strings
|
|
153
206
|
|
|
154
|
-
|
|
207
|
+
By default, it automatically trims all string cell values. To disable this feature, pass `trim: false` option.
|
|
155
208
|
|
|
156
|
-
|
|
209
|
+
```js
|
|
210
|
+
readXlsxFile(file, { trim: false })
|
|
211
|
+
```
|
|
157
212
|
|
|
158
|
-
|
|
159
|
-
* `schemaPropertyValueForMissingColumn: null`
|
|
160
|
-
* `schemaPropertyValueForEmptyCell: null`
|
|
161
|
-
* `getEmptyObjectValue = () => null`
|
|
213
|
+
## Formulas
|
|
162
214
|
|
|
163
|
-
|
|
215
|
+
Dynamically calculated cells using formulas (`SUM`, etc) are not supported.
|
|
164
216
|
|
|
165
|
-
|
|
217
|
+
## Performance
|
|
218
|
+
|
|
219
|
+
There have been some [reports](https://github.com/catamphetamine/read-excel-file/issues/38#issuecomment-544286628) about performance issues when reading very large `*.xlsx` spreadsheets using this library. It's true that this library's main point has been usability and convenience, not performance when handling huge datasets. For example, the time of parsing a file with 2000 rows and 20 columns could be more than 2 seconds. So for reading huge datasets, perhaps use something like [`xlsx`](https://github.com/catamphetamine/read-excel-file/issues/38#issuecomment-544286628) package instead. There are no comparative benchmarks between the two packages, so if you make one, share it in the "Issues".
|
|
220
|
+
|
|
221
|
+
## Schema
|
|
222
|
+
|
|
223
|
+
To read spreadsheet data and then convert each row to a JSON object, pass a `schema` option to `readXlsxFile()`. When doing so, instead of returning an array of rows of cells, it will return an object of shape `{ rows, errors }` where `rows` is gonna be an array of JSON objects created from the spreadsheet rows according to the `schema`, and `errors` is gonna be an array of any errors encountered during the conversion.
|
|
224
|
+
|
|
225
|
+
The spreadsheet should adhere to a certain structure: first goes a header row with only column titles, rest are the data rows.
|
|
226
|
+
|
|
227
|
+
The `schema` should describe every property of the JSON object:
|
|
228
|
+
|
|
229
|
+
* what is the property name
|
|
230
|
+
* what column to read the value from
|
|
231
|
+
* how to validate the value
|
|
232
|
+
* how to parse the value
|
|
233
|
+
|
|
234
|
+
A key of a `schema` entry represents the name of the property. The value of the `schema` entry describes the rest:
|
|
235
|
+
|
|
236
|
+
* `column` — The title of the column to read the value from.
|
|
237
|
+
* If the column is missing from the spreadsheet, the property value will be `undefined`.
|
|
238
|
+
* This can be overridden by passing `schemaPropertyValueForMissingColumn` option. Is `undefined` by default.
|
|
239
|
+
* If the column is present in the spreadsheet but is empty, the property value will be `null`.
|
|
240
|
+
* This can be overridden by passing `schemaPropertyValueForMissingValue` option. Is `null` by default.
|
|
241
|
+
* `required` — (optional) Is the value required?
|
|
242
|
+
* Could be one of:
|
|
243
|
+
* `required: boolean`
|
|
244
|
+
* `true` — The column must not be missing from the spreadsheet and the cell value must not be empty.
|
|
245
|
+
* `false` — The column can be missing from the spreadsheet and the cell value can be empty.
|
|
246
|
+
* `required: (object) => boolean` — A function returning `true` or `false` depending on the other properties of the object.
|
|
247
|
+
* It could be configured to skip `required` validation for missing columns by passing `schemaPropertyShouldSkipRequiredValidationForMissingColumn` function as an option. By default it's `(column, { object }) => false` meaning that when `column` is missing from the spreadsheet, it will not skip `required` validation for it.
|
|
248
|
+
* `validate(value)` — (optional) Validates the value. Is only called for non-empty cells. If the value is invalid, this function should throw an error.
|
|
249
|
+
* `schema` — (optional) If the value is an object, `schema` should describe its properties.
|
|
250
|
+
* If all of its property values happen to be empty (`undefined` or `null`), the object itself will be `null` too.
|
|
251
|
+
* This can be overridden by passing `getEmptyObjectValue(object, { path? })` function as an option. By default, it returns `null`.
|
|
252
|
+
* `type` — (optional) If the value is not an object, `type` should describe the type of the value. It defines how the cell value will be converted to the property value. If no `type` is specified then the cell value is returned "as is": as a string, number, date or boolean.
|
|
253
|
+
* Valid `type`s:
|
|
254
|
+
* Standard types:
|
|
255
|
+
* `String`
|
|
256
|
+
* `Number`
|
|
257
|
+
* `Boolean`
|
|
258
|
+
* `Date`
|
|
259
|
+
* One of the "utility" types that're exported from this package:
|
|
260
|
+
* `Integer`
|
|
261
|
+
* `Email`
|
|
262
|
+
* `URL`
|
|
263
|
+
* Custom type:
|
|
264
|
+
* A function that receives a cell value and returns a parsed value. If the value is invalid, it should throw an error.
|
|
265
|
+
* If the cell value consists of comma-separated values (example: `"a, b, c"`) then `type` could be specified as `[type]` for any of the valid `type`s described above.
|
|
266
|
+
* Example: `{ type: [String] }` or `{ type: [(value) => parseValue(value)] }`
|
|
267
|
+
* If the cell value is empty, or if every element of the array is `null` or `undefined`, then the array property value is gonna be `null` by default.
|
|
268
|
+
* This can be overridden by passing `getEmptyArrayValue(array, { path })` function as an option. By default, it returns `null`.
|
|
269
|
+
|
|
270
|
+
If there're any errors during the conversion of spreadsheet data to JSON objects, the `errors` property returned from the function will be a non-empty array. Each `error` object has properties:
|
|
166
271
|
|
|
167
272
|
* `error: string` — The error code. Examples: `"required"`, `"invalid"`.
|
|
168
273
|
* If a custom `validate()` function is defined and it throws a `new Error(message)` then the `error` property will be the same as the `message` value.
|
|
169
274
|
* If a custom `type()` function is defined and it throws a `new Error(message)` then the `error` property will be the same as the `message` value.
|
|
170
|
-
* `reason?: string` — An optional secondary error code providing more details about the error. Currently, it's only returned for
|
|
275
|
+
* `reason?: string` — An optional secondary error code providing more details about the error: "`error.error` because `error.reason`". Currently, it's only returned for standard `type`s.
|
|
276
|
+
* Example: `{ error: "invalid", reason: "not_a_number" }` for `type: Number` means that "the cell value is _invalid_ **because** it's _not a number_".
|
|
171
277
|
* `row: number` — The row number in the original file. `1` means the first row, etc.
|
|
172
278
|
* `column: string` — The column title.
|
|
173
279
|
* `value?: any` — The cell value.
|
|
174
|
-
* `type?: any` — The
|
|
280
|
+
* `type?: any` — The `type` of the property, as defined in the `schema`.
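
Note (an illustration, not part of the package README): the `error` object properties listed above can be turned into a readable message with a small formatter. The function name `describeError` and the message wording are hypothetical. Only the property names (`error`, `reason`, `row`, `column`, `value`) come from the list above.

```js
// Hypothetical formatter for one entry of the `errors` array.
// Follows the "`error.error` because `error.reason`" reading described above.
function describeError({ error, reason, row, column, value }) {
  const cause = reason ? `${error} because ${reason}` : error
  return `Row ${row}, column "${column}": ${cause} (value: ${JSON.stringify(value)})`
}

console.log(describeError({
  error: 'invalid',
  reason: 'not_a_number',
  row: 2,
  column: 'NUMBER OF STUDENTS',
  value: 'abc'
}))
// prints: Row 2, column "NUMBER OF STUDENTS": invalid because not_a_number (value: "abc")
```

With the `{ rows, errors }` result of a schema read, `errors.map(describeError)` would give one line per problem cell.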
|
|
175
281
|
|
|
176
|
-
|
|
282
|
+
Below is an example of using a `schema`.
|
|
177
283
|
|
|
178
284
|
```js
|
|
179
285
|
// An example *.xlsx document:
|
|
@@ -184,39 +290,34 @@ If there were any errors while converting spreadsheet data to JSON objects, the
|
|
|
184
290
|
// -----------------------------------------------------------------------------------------
|
|
185
291
|
|
|
186
292
|
const schema = {
|
|
187
|
-
|
|
188
|
-
|
|
189
|
-
prop: 'date',
|
|
293
|
+
date: {
|
|
294
|
+
column: 'START DATE',
|
|
190
295
|
type: Date
|
|
191
296
|
},
|
|
192
|
-
|
|
193
|
-
|
|
297
|
+
numberOfStudents: {
|
|
298
|
+
column: 'NUMBER OF STUDENTS',
|
|
194
299
|
type: Number,
|
|
195
300
|
required: true
|
|
196
301
|
},
|
|
197
302
|
// Nested object example.
|
|
198
|
-
|
|
199
|
-
|
|
200
|
-
|
|
201
|
-
|
|
202
|
-
prop: 'course',
|
|
203
|
-
// Nested object schema:
|
|
204
|
-
type: {
|
|
205
|
-
'IS FREE': {
|
|
206
|
-
prop: 'isFree',
|
|
303
|
+
course: {
|
|
304
|
+
schema: {
|
|
305
|
+
isFree: {
|
|
306
|
+
column: 'IS FREE',
|
|
207
307
|
type: Boolean
|
|
208
308
|
},
|
|
209
|
-
|
|
210
|
-
|
|
309
|
+
title: {
|
|
310
|
+
column: 'COURSE TITLE',
|
|
211
311
|
type: String
|
|
212
312
|
}
|
|
213
313
|
}
|
|
314
|
+
// required: true/false
|
|
214
315
|
},
|
|
215
|
-
|
|
216
|
-
|
|
316
|
+
contact: {
|
|
317
|
+
column: 'CONTACT',
|
|
217
318
|
required: true,
|
|
218
|
-
// A custom `type` can be
|
|
219
|
-
//
|
|
319
|
+
// A custom `type` transformation function can be specified.
|
|
320
|
+
// It will transform the cell value if it's not empty.
|
|
220
321
|
type: (value) => {
|
|
221
322
|
const number = parsePhoneNumber(value)
|
|
222
323
|
if (!number) {
|
|
@@ -225,8 +326,8 @@ const schema = {
|
|
|
225
326
|
return number
|
|
226
327
|
}
|
|
227
328
|
},
|
|
228
|
-
|
|
229
|
-
|
|
329
|
+
status: {
|
|
330
|
+
column: 'STATUS',
|
|
230
331
|
type: String,
|
|
231
332
|
oneOf: [
|
|
232
333
|
'SCHEDULED',
|
|
@@ -253,34 +354,6 @@ readXlsxFile(file, { schema }).then(({ rows, errors }) => {
|
|
|
253
354
|
})
|
|
254
355
|
```
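
Note (an illustration, not part of the package README): the schema example in this diff changes shape between versions. The removed lines key entries by column title with a `prop` name and nest via `type: { ... }`, while the added lines key entries by property name with a `column` title and nest via `schema: { ... }`. The converter below is a hypothetical sketch of that mapping, inferred only from the lines visible in this diff. It is not an official migration tool, and the name `convertLegacySchema` is an assumption.

```js
// True for `{ ... }` style values, false for functions (`Date`, custom types) and arrays.
function isPlainObject(value) {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

// Hypothetical converter from the 5.x style schema (keyed by column title)
// to the 6.x style schema (keyed by property name).
function convertLegacySchema(legacySchema) {
  const schema = {}
  for (const [column, { prop, type, ...rest }] of Object.entries(legacySchema)) {
    schema[prop] = isPlainObject(type)
      // Nested object: the old nested `type` becomes the new `schema`.
      ? { ...rest, schema: convertLegacySchema(type) }
      // Regular property: the old key becomes `column`.
      : { column, ...(type === undefined ? {} : { type }), ...rest }
  }
  return schema
}

const converted = convertLegacySchema({
  'START DATE': { prop: 'date', type: Date },
  'COURSE': {
    prop: 'course',
    type: {
      'IS FREE': { prop: 'isFree', type: Boolean }
    }
  }
})

console.log(Object.keys(converted)) // prints [ 'date', 'course' ]
```

The old top-level key of a nested entry (`'COURSE'` here) has no counterpart in the new shape, so the sketch drops it.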
|
|
255
356
|
|
|
256
|
-
#### Separate use
|
|
257
|
-
|
|
258
|
-
The function for converting input data rows to JSON objects using a schema is exported independently as `read-excel-file/map`, if anyone's interested.
|
|
259
|
-
|
|
260
|
-
```js
|
|
261
|
-
import mapToObjects from "read-excel-file/map"
|
|
262
|
-
|
|
263
|
-
const { rows, errors } = mapToObjects(data, schema, options)
|
|
264
|
-
```
|
|
265
|
-
|
|
266
|
-
Maps a list of rows — `data` — into a list of objects — `rows` — using a `schema` as a mapping specification.
|
|
267
|
-
|
|
268
|
-
* `data` — An array of rows, each row being an array of cells. The first row should be the list of column headers and the rest of the rows should be the data.
|
|
269
|
-
* `schema` — A "to JSON" conversion schema (see above).
|
|
270
|
-
* `options` — (optional) Schema conversion parameters of `read-excel-file`:
|
|
271
|
-
* `schemaPropertyValueForMissingColumn` — By default, when some of the `schema` columns are missing in the input `data`, those properties are set to `undefined` in the output objects. Pass `schemaPropertyValueForMissingColumn: null` to set such "missing column" properties to `null` in the output objects.
|
|
272
|
-
* `schemaPropertyValueForNullCellValue` — By default, when it encounters a `null` value in a cell in input `data`, it sets it to `undefined` in the output object. Pass `schemaPropertyValueForNullCellValue: null` to make it set such values as `null`s in output objects.
|
|
273
|
-
* `schemaPropertyValueForUndefinedCellValue` — By default, when it encounters an `undefined` value in a cell in input `data`, it sets it to `undefined` in the output object. Pass `schemaPropertyValueForUndefinedCellValue: null` to make it set such values as `null`s in output objects.
|
|
274
|
-
* `schemaPropertyShouldSkipRequiredValidationForMissingColumn: (column: string, { object }) => boolean` — By default, it does apply `required` validation to `schema` properties for which columns are missing in the input `data`. One could pass a custom `schemaPropertyShouldSkipRequiredValidationForMissingColumn(column, { object })` to disable `required` validation for missing columns in some or all cases.
|
|
275
|
-
* `getEmptyObjectValue(object, { path? })` — By default, it returns `null` for an "empty" resulting object. One could override that value using `getEmptyObjectValue(object, { path })` parameter. The value applies to both top-level object and any nested sub-objects in case of a nested schema, hence the additional (optional) `path?: string` parameter.
|
|
276
|
-
* `getEmptyArrayValue(array, { path })` — By default, it returns `null` for an "empty" array value. One could override that value using `getEmptyArrayValue(array, { path })` parameter.
|
|
277
|
-
|
|
278
|
-
Returns a list of "mapped objects".
|
|
279
|
-
|
|
280
|
-
When parsing a schema property value, in case of an error, the value of that property is gonna be `undefined`.
|
|
281
|
-
|
|
282
|
-
When a "mapped object" is empty, i.e. when all property values of it are `null` or `undefined`, it is returned as `null` rather than an object.
|
|
283
|
-
|
|
284
357
|
#### Schema: Tips and Features
|
|
285
358
|
|
|
286
359
|
<!-- If no `type` is specified then the cell value is returned "as is": as a string, number, date or boolean. -->
|
|
@@ -288,14 +361,16 @@ When a "mapped object" is empty, i.e. when all property values of it are `null`
|
|
|
288
361
|
<!-- There are also some additional exported `type`s available: -->
|
|
289
362
|
|
|
290
363
|
<details>
|
|
291
|
-
<summary
|
|
364
|
+
<summary>How to transform cell value using a <strong>custom <code>type</code></strong> function.</summary>
|
|
292
365
|
|
|
293
366
|
#####
|
|
294
367
|
|
|
368
|
+
Here's an example of a custom `type` parsing function. It will only be called for a non-empty cell and will transform the cell value.
|
|
369
|
+
|
|
295
370
|
```js
|
|
296
371
|
{
|
|
297
|
-
|
|
298
|
-
|
|
372
|
+
property: {
|
|
373
|
+
column: 'COLUMN TITLE',
|
|
299
374
|
type: (value) => {
|
|
300
375
|
try {
|
|
301
376
|
return parseValue(value)
|
|
@@ -311,12 +386,13 @@ When a "mapped object" is empty, i.e. when all property values of it are `null`
|
|
|
311
386
|
|
|
312
387
|
<!-- A schema entry for a column may also define an optional `validate(value)` function for validating the parsed value: in that case, it must `throw` an `Error` if the `value` is invalid. The `validate(value)` function is only called when `value` is not empty (not `null` / `undefined`). -->
|
|
313
388
|
|
|
389
|
+
<!--
|
|
314
390
|
<details>
|
|
315
|
-
<summary
|
|
391
|
+
<summary>How to <strong>not skip empty rows</strong>.</summary>
|
|
316
392
|
|
|
317
393
|
#####
|
|
318
394
|
|
|
319
|
-
By default, it
|
|
395
|
+
By default, it skips any empty rows. To disable that behavior, pass `ignoreEmptyRows: false` option.
|
|
320
396
|
|
|
321
397
|
```js
|
|
322
398
|
readXlsxFile(file, {
|
|
@@ -325,26 +401,7 @@ readXlsxFile(file, {
|
|
|
325
401
|
})
|
|
326
402
|
```
|
|
327
403
|
</details>
|
|
328
|
-
|
|
329
|
-
<details>
|
|
330
|
-
<summary>How to fix spreadsheet data before <code>schema</code> parsing. For example, <strong>how to ignore irrelevant rows</strong>.</summary>
|
|
331
|
-
|
|
332
|
-
#####
|
|
333
|
-
|
|
334
|
-
Sometimes, a spreadsheet doesn't exactly have the structure required by this library's `schema` parsing feature: for example, it may be missing a header row, or contain some purely presentational / irrelevant / "garbage" rows that should be removed. To fix that, one could pass an optional `transformData(data)` function that would modify the spreadsheet contents as required.
|
|
335
|
-
|
|
336
|
-
```js
|
|
337
|
-
readXlsxFile(file, {
|
|
338
|
-
schema,
|
|
339
|
-
transformData(data) {
|
|
340
|
-
// Add a missing header row.
|
|
341
|
-
return [['ID', 'NAME', ...]].concat(data)
|
|
342
|
-
// Remove irrelevant rows.
|
|
343
|
-
return data.filter(row => row.filter(column => column !== null).length > 0)
|
|
344
|
-
}
|
|
345
|
-
})
|
|
346
|
-
```
|
|
347
|
-
</details>
|
|
404
|
+
-->
|
|
348
405
|
|
|
349
406
|
<details>
|
|
350
407
|
<summary>A <strong>React component for displaying errors</strong> that occurred during schema parsing/validation.</summary>
|
|
@@ -354,8 +411,20 @@ readXlsxFile(file, {
|
|
|
354
411
|
```js
|
|
355
412
|
import { parseExcelDate } from 'read-excel-file'
|
|
356
413
|
|
|
357
|
-
function
|
|
358
|
-
|
|
414
|
+
function ParseExcelFileErrors({ errors }) {
|
|
415
|
+
return (
|
|
416
|
+
<ul>
|
|
417
|
+
{errors.map((error, i) => (
|
|
418
|
+
<li key={i}>
|
|
419
|
+
<ParseExcelFileError error={error}/>
|
|
420
|
+
</li>
|
|
421
|
+
))}
|
|
422
|
+
</ul>
|
|
423
|
+
)
|
|
424
|
+
}
|
|
425
|
+
|
|
426
|
+
function ParseExcelFileError({ error: errorDetails }) {
|
|
427
|
+
const { type, value, error, reason, row, column } = errorDetails
|
|
359
428
|
|
|
360
429
|
// Error summary.
|
|
361
430
|
return (
|
|
@@ -384,135 +453,26 @@ function stringifyValue(value) {
|
|
|
384
453
|
```
|
|
385
454
|
</details>
|
|
386
455
|
|
|
387
|
-
##
|
|
388
|
-
|
|
389
|
-
Same as above, but simpler: without any parsing or validation.
|
|
390
|
-
|
|
391
|
-
Sometimes, a developer might want to use some other (more advanced) solution for schema parsing and validation (like [`yup`](https://github.com/jquense/yup)). If a developer passes a `map` option instead of a `schema` option to `readXlsxFile()`, then it would just map each data row to a JSON object without doing any parsing or validation. Cell values will remain "as is": as a string, number, date or boolean.
|
|
392
|
-
|
|
393
|
-
```js
|
|
394
|
-
// An example *.xlsx document:
|
|
395
|
-
// ------------------------------------------------------------
|
|
396
|
-
// | START DATE | NUMBER OF STUDENTS | IS FREE | COURSE TITLE |
|
|
397
|
-
// ------------------------------------------------------------
|
|
398
|
-
// | 03/24/2018 | 10 | true | Chemistry |
|
|
399
|
-
// ------------------------------------------------------------
|
|
400
|
-
|
|
401
|
-
const map = {
|
|
402
|
-
'START DATE': 'date',
|
|
403
|
-
'NUMBER OF STUDENTS': 'numberOfStudents',
|
|
404
|
-
'COURSE': {
|
|
405
|
-
'course': {
|
|
406
|
-
'IS FREE': 'isFree',
|
|
407
|
-
'COURSE TITLE': 'title'
|
|
408
|
-
}
|
|
409
|
-
}
|
|
410
|
-
}
|
|
411
|
-
|
|
412
|
-
readXlsxFile(file, { map }).then(({ rows }) => {
|
|
413
|
-
rows === [{
|
|
414
|
-
date: new Date(2018, 2, 24),
|
|
415
|
-
numberOfStudents: 10,
|
|
416
|
-
course: {
|
|
417
|
-
isFree: true,
|
|
418
|
-
title: 'Chemistry'
|
|
419
|
-
}
|
|
420
|
-
}]
|
|
421
|
-
})
|
|
422
|
-
```
|
|
423
|
-
|
|
424
|
-
## Multiple Sheets
|
|
425
|
-
|
|
426
|
-
By default, it reads the first sheet in the document. If you have multiple sheets in your spreadsheet then pass either a sheet number (starting from `1`) or a sheet name in the `options` argument.
|
|
427
|
-
|
|
428
|
-
```js
|
|
429
|
-
readXlsxFile(file, { sheet: 2 }).then((data) => {
|
|
430
|
-
...
|
|
431
|
-
})
|
|
432
|
-
```
|
|
433
|
-
|
|
434
|
-
```js
|
|
435
|
-
readXlsxFile(file, { sheet: 'Sheet1' }).then((data) => {
|
|
436
|
-
...
|
|
437
|
-
})
|
|
438
|
-
```
|
|
439
|
-
|
|
440
|
-
By default, `options.sheet` is `1`.
|
|
456
|
+
## Fix Spreadsheet Before Parsing With Schema
|
|
441
457
|
|
|
442
|
-
|
|
443
|
-
|
|
444
|
-
```js
|
|
445
|
-
readSheetNames(file).then((sheetNames) => {
|
|
446
|
-
// sheetNames === ['Sheet1', 'Sheet2']
|
|
447
|
-
})
|
|
448
|
-
```
|
|
449
|
-
|
|
450
|
-
## Dates
|
|
451
|
-
|
|
452
|
-
XLSX format originally had no dedicated "date" type, so dates are in almost all cases stored simply as numbers (the count of days since `01/01/1900`) along with a ["format"](https://xlsxwriter.readthedocs.io/format.html#format-set-num-format) description (like `"d mmm yyyy"`) that instructs the spreadsheet viewer software to format the date in the cell using that certain format.
|
|
453
|
-
|
|
454
|
-
When using `readXlsx()` with a `schema` parameter, all schema columns having type `Date` are automatically parsed as dates. When using `readXlsx()` without a `schema` parameter, this library attempts to guess whether a cell contains a date or just a number by examining the cell's "format" — if the "format" is one of the [built-in date formats](https://docs.microsoft.com/en-us/dotnet/api/documentformat.openxml.spreadsheet.numberingformat?view=openxml-2.8.1) then such cells' values are automatically parsed as dates. In other cases, when date cells use a non-built-in format (like `"mm/dd/yyyy"`), one can pass an explicit `dateFormat` parameter to instruct the library to parse numeric cells having such "format" as dates:
|
|
455
|
-
|
|
456
|
-
```js
|
|
457
|
-
readXlsxFile(file, { dateFormat: 'mm/dd/yyyy' })
|
|
458
|
-
```
|
|
459
|
-
|
|
460
|
-
## Trim
|
|
461
|
-
|
|
462
|
-
By default, it automatically trims all string values. To disable this feature, pass `trim: false` option.
|
|
463
|
-
|
|
464
|
-
```js
|
|
465
|
-
readXlsxFile(file, { trim: false })
|
|
466
|
-
```
|
|
467
|
-
|
|
468
|
-
## Parse Numbers
|
|
469
|
-
|
|
470
|
-
By default, it parses numeric cell values from strings. In some rare cases though, javascript's [inherently limited](https://www.youtube.com/watch?v=2gIxbTn7GSc) floating-point number precision might become an issue. An example might be finance and banking domain. To work around that, this library supports passing a custom `parseNumber(string)` function option.
|
|
471
|
-
|
|
472
|
-
```js
|
|
473
|
-
// Arbitrary-precision numbers in javascript.
|
|
474
|
-
import Decimal from 'decimal.js'
|
|
475
|
-
|
|
476
|
-
readXlsxFile(file, {
|
|
477
|
-
parseNumber: (string) => new Decimal(string)
|
|
478
|
-
})
|
|
479
|
-
```
|
|
480
|
-
|
|
481
|
-
## Transform
|
|
482
|
-
|
|
483
|
-
Sometimes, a spreadsheet doesn't exactly have the structure required by this library's `schema` parsing feature: for example, it may be missing a header row, or contain some purely presentational / empty / "garbage" rows that should be removed. To fix that, one could pass an optional `transformData(data)` function that would modify the spreadsheet contents as required.
|
|
458
|
+
Sometimes, a spreadsheet doesn't have the structure required for parsing it with `schema`. For example, the header row might be missing, or there could be some purely presentational / empty / "garbage" rows that should be removed before parsing. To fix that, pass a `transformData(data)` function as an option. It receives the spreadsheet data and returns the modified data, which is then parsed with `schema`.
|
|
484
459
|
|
|
485
460
|
```js
|
|
486
461
|
readXlsxFile(file, {
|
|
487
462
|
schema,
|
|
488
463
|
transformData(data) {
|
|
489
|
-
// Add a missing header row.
|
|
464
|
+
// Example 1: Add a missing header row.
|
|
490
465
|
return [['ID', 'NAME', ...]].concat(data)
|
|
491
|
-
// Remove empty rows.
|
|
492
|
-
return data.filter(row => row.
|
|
466
|
+
// Example 2: Remove empty rows.
|
|
467
|
+
return data.filter(row => row.some(cell => cell !== null))
|
|
493
468
|
}
|
|
494
469
|
})
|
|
495
470
|
```
|
|
496
471
|
</details>
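The two examples above are alternatives: in a single function body only the first `return` would ever run. When a spreadsheet needs both fixes, they can be combined. The following is a minimal sketch, assuming the header row is `['ID', 'NAME']` (a hypothetical placeholder; use the column titles your own `schema` expects) and that empty cells are represented as `null`, as in the example above.

```js
// Hypothetical header row: replace with the column titles used in your schema.
const HEADER_ROW = ['ID', 'NAME']

// Applies both fixes: removes empty rows, then prepends the missing header row.
function transformData(data) {
  // Keep only the rows that contain at least one non-null cell.
  const nonEmptyRows = data.filter(row => row.some(cell => cell !== null))
  // Prepend the header row that the spreadsheet is assumed to lack.
  return [HEADER_ROW].concat(nonEmptyRows)
}

// Usage (assumes `file` and `schema` are defined elsewhere):
// readXlsxFile(file, { schema, transformData })
```

Filtering before prepending keeps the header row out of the emptiness check, so the order of the two steps does not depend on what the header contains.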
|
|
497
472
|
|
|
498
|
-
|
|
499
|
-
## Limitations
|
|
500
|
-
|
|
501
|
-
### Performance
|
|
502
|
-
|
|
503
|
-
There have been some [reports](https://github.com/catamphetamine/read-excel-file/issues/38#issuecomment-544286628) about performance issues when reading very large `*.xlsx` spreadsheets using this library. It's true that this library's main focus has been usability and convenience, not performance when handling huge datasets. For example, parsing a file with 2000 rows / 20 columns takes about 3 seconds. So, for reading huge datasets, perhaps use something like the [`xlsx`](https://github.com/catamphetamine/read-excel-file/issues/38#issuecomment-544286628) package instead. There are no comparative benchmarks between the two, so if you make one, share it in the Issues.
|
|
504
|
-
|
|
505
|
-
### Formulas
|
|
506
|
-
|
|
507
|
-
Dynamically calculated cells using formulas (`SUM`, etc) are not supported.
|
|
508
|
-
|
|
509
|
-
## TypeScript
|
|
510
|
-
|
|
511
|
-
I'm not a TypeScript expert, so the community has to write the typings (and test those). See [example `index.d.ts`](https://github.com/catamphetamine/read-excel-file/issues/71#issuecomment-675140448).
|
|
512
|
-
|
|
513
473
|
## CDN
|
|
514
474
|
|
|
515
|
-
|
|
475
|
+
To include this library directly via a `<script/>` tag on a page, one can use any npm CDN service, e.g. [unpkg.com](https://unpkg.com) or [jsdelivr.net](https://jsdelivr.net).
|
|
516
476
|
|
|
517
477
|
```html
|
|
518
478
|
<script src="https://unpkg.com/read-excel-file@5.x/bundle/read-excel-file.min.js"></script>
|
|
@@ -528,14 +488,6 @@ One can use any npm CDN service, e.g. [unpkg.com](https://unpkg.com) or [jsdeliv
|
|
|
528
488
|
</script>
|
|
529
489
|
```
|
|
530
490
|
|
|
531
|
-
## TypeScript
|
|
532
|
-
|
|
533
|
-
This library comes with TypeScript "typings". If you happen to find any bugs in those, create an issue.
|
|
534
|
-
|
|
535
|
-
## References
|
|
536
|
-
|
|
537
|
-
Uses [`xmldom`](https://github.com/jindw/xmldom) for parsing XML.
|
|
538
|
-
|
|
539
491
|
## GitHub
|
|
540
492
|
|
|
541
493
|
On March 9th, 2020, GitHub, Inc. silently [banned](https://medium.com/@catamphetamine/how-github-blocked-me-and-all-my-libraries-c32c61f061d3) my account (erasing all my repos, issues and comments, even in my employer's private repos) without any notice or explanation. Because of that, all source codes had to be promptly moved to GitLab. The [GitHub repo](https://github.com/catamphetamine/read-excel-file) is now only used as a backup (you can star the repo there too), and the primary repo is now the [GitLab one](https://gitlab.com/catamphetamine/read-excel-file). Issues can be reported in any repo.
|