read-excel-file 4.0.5 → 4.1.0
- package/.gitlab-ci.yml +15 -0
- package/CHANGELOG.md +27 -0
- package/LICENSE +1 -1
- package/README.md +110 -17
- package/bundle/index.html +22 -16
- package/bundle/read-excel-file.min.js +2 -2
- package/bundle/read-excel-file.min.js.map +1 -1
- package/commonjs/convertMapToSchema.js +41 -0
- package/commonjs/convertMapToSchema.js.map +1 -0
- package/commonjs/convertMapToSchema.test.js.map +1 -0
- package/commonjs/convertToJson.js +19 -9
- package/commonjs/convertToJson.js.map +1 -1
- package/commonjs/convertToJson.test.js.map +1 -1
- package/commonjs/readXlsx.js +6 -3
- package/commonjs/readXlsx.js.map +1 -1
- package/commonjs/readXlsxFileContents.js +17 -4
- package/commonjs/readXlsxFileContents.js.map +1 -1
- package/commonjs/readXlsxFileNode.test.js.map +1 -1
- package/index.d.ts.test +20 -0
- package/modules/convertMapToSchema.js +34 -0
- package/modules/convertMapToSchema.js.map +1 -0
- package/modules/convertMapToSchema.test.js.map +1 -0
- package/modules/convertToJson.js +19 -9
- package/modules/convertToJson.js.map +1 -1
- package/modules/convertToJson.test.js.map +1 -1
- package/modules/readXlsx.js +6 -3
- package/modules/readXlsx.js.map +1 -1
- package/modules/readXlsxFileContents.js +14 -4
- package/modules/readXlsxFileContents.js.map +1 -1
- package/modules/readXlsxFileNode.test.js.map +1 -1
- package/node/index.commonjs.js +6 -0
- package/node/index.d.ts.test +23 -0
- package/node/index.js +5 -0
- package/node/package.json +9 -0
- package/package.json +7 -6
- package/schema/index.commonjs.js +2 -0
- package/schema/index.d.ts.test +6 -0
- package/schema/index.js +1 -0
- package/schema/package.json +9 -0
- package/types.d.ts +80 -0
- package/website/index.html +105 -0
- package/node.js +0 -6
package/.gitlab-ci.yml
ADDED
package/CHANGELOG.md
CHANGED
@@ -1,3 +1,30 @@
+<!--
+5.0.0 / 30.08.2020
+==================
+
+* Added [TypeScript](https://github.com/catamphetamine/read-excel-file/issues/71) definitions.
+
+* Removed deprecated `URL`, `Integer` and `Email` exports (use the string variants instead: `"URL"`, `"Integer"`, `"Email"`).
+
+* Removed undocumented `convertToJson()` export.
+-->
+
+4.1.0 / 09.11.2020
+==================
+
+* Renamed schema entry `parse()` function: now it's called `type`. This way, `type` could be both a built-in type and a custom type.
+
+* Changed the built-in `"Integer"`, `"URL"` and `"Email"` types: now they're exported functions again instead of strings. Strings still work.
+
+* Added `map` parameter: similar to `schema` but doesn't perform any parsing or validation. Can be used to map an Excel file to an array of objects that could be parsed/validated using [`yup`](https://github.com/jquense/yup).
+
+* `type` of a schema entry is no longer required: if no `type` is specified, then the cell value is returned "as is" (string, or number, or boolean, or `Date`).
+
+4.0.8 / 08.11.2020
+==================
+
+* Updated `JSZip` to the latest version. The [issue](https://gitlab.com/catamphetamine/read-excel-file/-/issues/8). The [original issue](https://github.com/catamphetamine/read-excel-file/issues/54).
+
 4.0.0 / 25.05.2019
 ==================
 
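The `map` parameter from the 4.1.0 entry is described as mapping rows to objects without parsing or validation. As a rough illustration of that idea only, here is a minimal sketch. The helper `mapRowsToObjects` and the sample data are hypothetical, not the library's implementation, and it assumes the first row holds the column titles and that a nested entry has the shape `{ propertyName: nestedMap }`.

```javascript
// Hypothetical helper (not part of read-excel-file): applies a `map`-style
// object to raw rows, copying cell values "as is" with no parsing or validation.
function mapRowsToObjects(rows, mapping) {
  const [header, ...dataRows] = rows
  // Column title -> column index, taken from the header row.
  const columnIndex = new Map(header.map((title, index) => [title, index]))

  function mapRow(row, currentMapping) {
    const result = {}
    for (const [column, target] of Object.entries(currentMapping)) {
      if (typeof target === 'string') {
        // Flat entry: 'COLUMN TITLE': 'propertyName'.
        const index = columnIndex.get(column)
        result[target] = index === undefined ? null : row[index]
      } else {
        // Nested entry: 'ANY LABEL': { propertyName: nestedMapping }.
        for (const [prop, nestedMapping] of Object.entries(target)) {
          result[prop] = mapRow(row, nestedMapping)
        }
      }
    }
    return result
  }

  return dataRows.map((row) => mapRow(row, mapping))
}

// Sample data (assumed for illustration).
const exampleRows = [
  ['START DATE', 'NUMBER OF STUDENTS', 'IS FREE', 'COURSE TITLE', 'CONTACT', 'STATUS'],
  ['03/24/2018', 123, true, 'Chemistry', '(123) 456-7890', 'SCHEDULED']
]

const exampleMap = {
  'START DATE': 'date',
  'NUMBER OF STUDENTS': 'numberOfStudents',
  'COURSE': {
    'course': {
      'IS FREE': 'isFree',
      'COURSE TITLE': 'title'
    }
  },
  'CONTACT': 'contact',
  'STATUS': 'status'
}

const objects = mapRowsToObjects(exampleRows, exampleMap)
console.log(JSON.stringify(objects, null, 2))
```

The resulting plain objects could then be handed to a separate validation library, which is the use case the changelog entry mentions.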
package/LICENSE
CHANGED
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2018
+Copyright (c) 2018 gitlab.com/catamphetamine
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
package/README.md
CHANGED
@@ -1,8 +1,16 @@
 # `read-excel-file`
 
-Read `*.xlsx` files in a browser or Node.js. Parse to JSON with a strict schema.
+Read small to medium `*.xlsx` files in a browser or Node.js. Parse to JSON with a strict schema.
 
-[Demo](https://catamphetamine.
+[Demo](https://catamphetamine.gitlab.io/read-excel-file/)
+
+## Restrictions
+
+There have been some [complaints](https://github.com/catamphetamine/read-excel-file/issues/38#issuecomment-544286628) about this library not being able to handle large `*.xlsx` spreadsheets. It's true that this library's main point have been usability and convenience, and not performance or the ability to handle huge datasets. For example, the time of parsing a 2000 rows / 20 columns file is about 3 seconds, and when parsing a 30k+ rows file, it may throw a `RangeError: Maximum call stack size exceeded`. So, for handling huge datasets, use something like [`xlsx`](https://github.com/catamphetamine/read-excel-file/issues/38#issuecomment-544286628) package instead. This library is suitable for handling small to medium `*.xlsx` files.
+
+## GitHub
+
+On March 9th, 2020, GitHub, Inc. silently [banned](https://medium.com/@catamphetamine/how-github-blocked-me-and-all-my-libraries-c32c61f061d3) my account (and all my libraries) without any notice. I opened a support ticked but they didn't answer. Because of that, I had to move all my libraries to [GitLab](https://gitlab.com/catamphetamine).
 
 ## Install
 
@@ -10,6 +18,8 @@ Read `*.xlsx` files in a browser or Node.js. Parse to JSON with a strict schema.
 npm install read-excel-file --save
 ```
 
+If you're not using a bundler then use a [standalone version from a CDN](#cdn).
+
 ## Browser
 
 ```html
@@ -48,7 +58,11 @@ readXlsxFile(fs.createReadStream('/path/to/file')).then((rows) => {
 
 ## Dates
 
-XLSX format has no dedicated "date" type so dates are stored internally as simply numbers along with a "format" (e.g. `"MM/DD/YY"`). When using `readXlsx()` with `schema` parameter all dates get parsed correctly in any case. But if using `readXlsx()` without `schema` parameter (to get "raw" data) then this library attempts to guess whether a cell value is a date or not by examining the cell "format" (e.g. `"MM/DD/YY"`), so in most cases dates are detected and parsed automatically. For exotic cases one can pass an explicit `dateFormat` parameter (e.g. `"MM/DD/YY"`) to instruct the library to parse numbers with such "format" as dates
+XLSX format has no dedicated "date" type so dates are stored internally as simply numbers along with a "format" (e.g. `"MM/DD/YY"`). When using `readXlsx()` with `schema` parameter all dates get parsed correctly in any case. But if using `readXlsx()` without `schema` parameter (to get "raw" data) then this library attempts to guess whether a cell value is a date or not by examining the cell "format" (e.g. `"MM/DD/YY"`), so in most cases dates are detected and parsed automatically. For exotic cases one can pass an explicit `dateFormat` parameter (e.g. `"MM/DD/YY"`) to instruct the library to parse numbers with such "format" as dates:
+
+```js
+readXlsxFile(file, { dateFormat: 'MM/DD/YY' })
+```
 
 ## JSON
 
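The "Dates" paragraph in the hunk above says that XLSX stores dates as plain numbers. To make that concrete, here is a minimal sketch of converting such a serial number to a JavaScript `Date`. The helper name `excelSerialToDate` is hypothetical and this is not the library's own code. It assumes the default 1900 date system, where serial `25569` corresponds to 1970-01-01, and it ignores the 1904 date system and the quirks of serials before March 1900.

```javascript
// Serial number of 1970-01-01 in the Excel 1900 date system.
const EXCEL_SERIAL_OF_UNIX_EPOCH = 25569
const MILLISECONDS_PER_DAY = 24 * 60 * 60 * 1000

// Hypothetical helper: converts an Excel serial day number to a UTC `Date`.
// The fractional part of the serial (time of day) is preserved.
function excelSerialToDate(serial) {
  const daysSinceUnixEpoch = serial - EXCEL_SERIAL_OF_UNIX_EPOCH
  return new Date(Math.round(daysSinceUnixEpoch * MILLISECONDS_PER_DAY))
}

// 43183 is the serial number for March 24th, 2018.
const startDate = excelSerialToDate(43183)
console.log(startDate.toISOString()) // prints "2018-03-24T00:00:00.000Z"
```

Whether a given number should be read as a date at all still depends on the cell "format", which is why the README mentions format detection and the `dateFormat` parameter.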
@@ -56,11 +70,11 @@ To convert rows to JSON pass `schema` option to `readXlsxFile()`. It will return
 
 ```js
 // An example *.xlsx document:
-//
-// | START DATE | NUMBER OF STUDENTS | IS FREE | COURSE TITLE | CONTACT |
-//
-// | 03/24/2018 | 123 | true | Chemistry | (123) 456-7890 |
-//
+// -----------------------------------------------------------------------------------------
+// | START DATE | NUMBER OF STUDENTS | IS FREE | COURSE TITLE | CONTACT | STATUS |
+// -----------------------------------------------------------------------------------------
+// | 03/24/2018 | 123 | true | Chemistry | (123) 456-7890 | SCHEDULED |
+// -----------------------------------------------------------------------------------------
 
 const schema = {
 'START DATE': {
@@ -75,6 +89,8 @@ const schema = {
 type: Number,
 required: true
 },
+// 'COURSE' is not a real Excel file column name,
+// it can be any string — it's just for code readability.
 'COURSE': {
 prop: 'course',
 type: {
@@ -94,13 +110,22 @@ const schema = {
 'CONTACT': {
 prop: 'contact',
 required: true,
-
+type: (value) => {
 const number = parsePhoneNumber(value)
 if (!number) {
 throw new Error('invalid')
 }
 return number
 }
+},
+'STATUS': {
+prop: 'status',
+type: String,
+oneOf: [
+'SCHEDULED',
+'STARTED',
+'FINISHED'
+]
 }
 }
 
@@ -116,30 +141,74 @@ readXlsxFile(file, { schema }).then(({ rows, errors }) => {
 title: 'Chemistry'
 },
 contact: '+11234567890',
+status: 'SCHEDULED'
 }]
 })
 ```
 
+If no `type` is specified then the cell value is returned "as is".
+
 There are also some additional exported `type`s:
 
-* `
-* `
-* `
+* `Integer` for parsing integer `Number`s.
+* `URL` for parsing URLs.
+* `Email` for parsing email addresses.
+
+A schema entry for a column may also define an optional `validate(value)` function for validating the parsed value: in that case, it must `throw` an `Error` if the `value` is invalid.
+
+#### Map
+
+Sometimes, a developer might want to use some other (more advanced) solution for schema parsing and validation (like [`yup`](https://github.com/jquense/yup)). If a developer passes a `map` instead of a `schema` to `readXlsxFile()`, then it would just map each data row to a JSON object without doing any parsing or validation.
+
+```js
+// An example *.xlsx document:
+// -----------------------------------------------------------------------------------------
+// | START DATE | NUMBER OF STUDENTS | IS FREE | COURSE TITLE | CONTACT | STATUS |
+// -----------------------------------------------------------------------------------------
+// | 03/24/2018 | 123 | true | Chemistry | (123) 456-7890 | SCHEDULED |
+// -----------------------------------------------------------------------------------------
+
+const map = {
+'START DATE': 'date',
+'NUMBER OF STUDENTS': 'numberOfStudents',
+'COURSE': {
+'course': {
+'IS FREE': 'isFree',
+'COURSE TITLE': 'title'
+}
+},
+'CONTACT': 'contact',
+'STATUS': 'status'
+}
+
+readXlsxFile(file, { map }).then(({ rows }) => {
+rows === [{
+date: new Date(2018, 2, 24),
+numberOfStudents: 123,
+course: {
+isFree: true,
+title: 'Chemistry'
+},
+contact: '(123) 456-7890',
+status: 'SCHEDULED'
+}]
+})
+```
 
-
+#### Displaying schema errors
 
-A React component for displaying
+A React component for displaying schema parsing/validation errors could look like this:
 
 ```js
 import { parseExcelDate } from 'read-excel-file'
 
 function ParseExcelError({ children: error }) {
-//
+// Get a human-readable value.
 let value = error.value
 if (error.type === Date) {
 value = parseExcelDate(value).toString()
 }
-//
+// Render error summary.
 return (
 <div>
 <code>"{error.error}"</code>
@@ -156,6 +225,8 @@ function ParseExcelError({ children: error }) {
 }
 ```
 
+#### Transforming rows/columns before schema is applied
+
 When using a `schema` there's also an optional `transformData(data)` parameter which can be used for the cases when the spreadsheet rows/columns aren't in the correct format. For example, the heading row may be missing, or there may be some purely presentational or empty rows. Example:
 
 ```js
@@ -163,13 +234,17 @@ readXlsxFile(file, {
 schema,
 transformData(data) {
 // Adds header row to the data.
-return ['ID', 'NAME', ...].concat(data)
+return [['ID', 'NAME', ...]].concat(data)
 // Removes empty rows.
 return data.filter(row => row.filter(column => column !== null).length > 0)
 }
 })
 ```
 
+## TypeScript
+
+See [testing `index.d.ts`](https://github.com/catamphetamine/read-excel-file/issues/71#issuecomment-675140448).
+
 ## Browser compatibility
 
 Node.js `*.xlxs` parser uses `xpath` and `xmldom` packages for XML parsing. The same packages could be used in a browser because [all modern browsers](https://caniuse.com/#search=domparser) (except IE 11) have native `DOMParser` built-in which could is used instead (meaning smaller footprint and better performance) but since Internet Explorer 11 support is still required the browser version doesn't use the native `DOMParser` and instead uses `xpath` and `xmldom` packages for XML parsing just like the Node.js version.
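The one-line change to the `transformData()` example in the hunk above wraps the header in an extra array. The reason is that `Array.prototype.concat()` spreads a top-level array argument one level, so an unwrapped header would be merged into the list of rows as separate string entries. A minimal sketch with made-up sample data (the column names and values are assumptions for illustration):

```javascript
// Sample rows without a header (assumed data).
const dataWithoutHeader = [
  [1, 'Alice'],
  [2, 'Bob']
]

// Old README form: the header cells become two separate top-level entries.
const flattenedHeader = ['ID', 'NAME'].concat(dataWithoutHeader)
// flattenedHeader is ['ID', 'NAME', [1, 'Alice'], [2, 'Bob']]

// Corrected form: the header stays a single row.
const withHeaderRow = [['ID', 'NAME']].concat(dataWithoutHeader)
// withHeaderRow is [['ID', 'NAME'], [1, 'Alice'], [2, 'Bob']]

// Dropping rows where every cell is `null`, as in the second README example.
const withoutEmptyRows = [[null, null], [3, 'Carol']]
  .filter((row) => row.some((cell) => cell !== null))

console.log(flattenedHeader.length, withHeaderRow.length, withoutEmptyRows.length)
```

Note that the README snippet shows two alternative `return` statements in one function body. In real code only one of them would be used, or the two steps would be chained.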
@@ -204,6 +279,24 @@ readXlsxFile(file, { getSheets: true }).then((sheets) => {
 })
 ```
 
+## CDN
+
+One can use any npm CDN service, e.g. [unpkg.com](https://unpkg.com) or [jsdelivr.net](https://jsdelivr.net)
+
+```html
+<script src="https://unpkg.com/read-excel-file@4.x/bundle/read-excel-file.min.js"></script>
+
+<script>
+var input = document.getElementById('input')
+input.addEventListener('change', function() {
+readXlsxFile(input.files[0]).then(function() {
+// `rows` is an array of rows
+// each row being an array of cells.
+})
+})
+</script>
+```
+
 ## References
 
 For XML parsing [`xmldom`](https://github.com/jindw/xmldom) and [`xpath`](https://github.com/goto100/xpath) are used.
package/bundle/index.html
CHANGED
@@ -52,14 +52,14 @@
 </head>
 
 <body>
-<a id="main-link" href="https://
+<a id="main-link" href="https://gitlab.com/catamphetamine/read-excel-file">
 read-excel-file
 </a>
 
 <input type="file" id="input" />
 
 <div style="font-size: 12px">
-* Parsing to JSON with a strict schema is supported. <a target="_blank" href="https://
+* Parsing to JSON with a strict schema is supported. <a target="_blank" href="https://gitlab.com/catamphetamine/read-excel-file#json" style="color: #0093C4; text-decoration: none">Read more</a>.
 </div>
 
 <div id="result-table"></div>
@@ -75,20 +75,26 @@
 // each row being an array of cells.
 document.getElementById('result').innerText = JSON.stringify(data, null, 2)
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+// Applying `innerHTML` hangs the browser when there're a lot of rows/columns.
+// For example, for a file having 2000 rows and 20 columns on a modern
+// mid-tier CPU it parses the file (using a "schema") for 3 seconds
+// (blocking) with 100% single CPU core usage.
+// Then applying `innerHTML` hangs the browser.
+
+// document.getElementById('result-table').innerHTML =
+// '<table>' +
+// '<tbody>' +
+// data.map(function (row) {
+// return '<tr>' +
+// row.map(function (cell) {
+// return '<td>' +
+// (cell === null ? '' : cell) +
+// '</td>'
+// }).join('') +
+// '</tr>'
+// }).join('') +
+// '</tbody>' +
+// '</table>'
 }, function (error) {
 console.error(error)
 alert("Error while parsing Excel file. See console output for the error stack trace.")