read-excel-file 8.0.1 → 8.0.3

package/CHANGELOG.md CHANGED
@@ -26,11 +26,12 @@
26
26
  * `getEmptyArrayValue` → `transformEmptyArray`
27
27
  * The leading `.` character is now removed from the `path` parameter.
28
28
  * Previously, when parsing comma-separated values, it used to ignore any commas that're surrounded by quotes, similar to how it's done in `.csv` files. Now it no longer does that.
29
+ * Previously, when parsing comma-separated values, it used to allow empty-string elements. Now it no longer does that and such empty-string elements will now result in an error with properties: `{ error: "invalid", reason: "syntax" }`.
29
30
  * Previously, when parsing using a schema, it used to force-convert all `type: Date` schema properties from any numeric cell value to a `Date` with a given timestamp. Now it demands the cell values for all such `type: Date` schema properties to already be correctly recognized as `Date`s when they're returned from `readSheet()` or `readExcelFile()` function. And I'd personally assume that in any sane (non-contrived) real-world usage scenario that would be the case, so it doesn't really seem like a "breaking change". And if, for some strange reason, that happens not to be the case, `parseData()` function will throw an error: `not_a_date`.
30
31
  * Previously, when parsing using a schema, it used to skip `required` validation for completely-empty rows. It no longer does that.
31
32
  * Removed exported function `parseExcelDate()` because there seems to be no need to have it exported.
32
33
  * (TypeScript) Renamed exported types:
33
- * `Type` → `ParseDataValueType`
34
+ * `Type` → `ParseDataCustomType`
34
35
  * `Error` or `SchemaParseCellValueError` → `ParseDataError`
35
36
  * `CellValueRequiredError` → `ParseDataValueRequiredError`
36
37
  * `ParsedObjectsResult` → `ParseDataResult`
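To make the two comma-separated changes above concrete, here is a minimal sketch of the described behavior. It is not the package's implementation: `splitCommaSeparated()` is a hypothetical helper, and trimming whitespace around each element is an assumption. Only the error shape `{ error: "invalid", reason: "syntax" }` is taken from the changelog entry.

```js
// Hypothetical helper illustrating the behavior described in the changelog.
// It is NOT exported by `read-excel-file`.
function splitCommaSeparated(value) {
  const elements = []
  // Every comma is a separator: quotes no longer protect commas.
  for (const part of value.split(',')) {
    // Assumption: whitespace around each element is ignored.
    const element = part.trim()
    // An empty element means there was an out-of-place separator.
    if (!element) {
      return { error: 'invalid', reason: 'syntax' }
    }
    elements.push(element)
  }
  return { value: elements }
}

console.log(splitCommaSeparated('Math, Arithmetic'))
console.log(splitCommaSeparated('Math,,Arithmetic'))
console.log(splitCommaSeparated('"a,b",c'))
```

Under these assumptions, the first call yields two elements, the second yields the `invalid` / `syntax` error because of the empty element, and the third is split at every comma, including the one inside the quotes.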
package/README.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # `read-excel-file`
2
2
 
3
- Read `.xlsx` files in a browser or in Node.js.
3
+ Read `.xlsx` files in a browser or Node.js.
4
4
 
5
5
  It also supports parsing spreadsheet rows into JSON objects using a [schema](#schema).
6
6
 
@@ -49,52 +49,67 @@ Also check out [`write-excel-file`](https://www.npmjs.com/package/write-excel-fi
49
49
  * `getEmptyArrayValue` → `transformEmptyArray`
50
50
  * The leading `.` character is now removed from the `path` parameter.
51
51
  * Previously, when parsing comma-separated values, it used to ignore any commas that're surrounded by quotes, similar to how it's done in `.csv` files. Now it no longer does that.
52
+ * Previously, when parsing comma-separated values, it used to allow empty-string elements. Now it no longer does that and such empty-string elements will now result in an error with properties: `{ error: "invalid", reason: "syntax" }`.
52
53
  * Previously, when parsing using a schema, it used to force-convert all `type: Date` schema properties from any numeric cell value to a `Date` with a given timestamp. Now it demands the cell values for all such `type: Date` schema properties to already be correctly recognized as `Date`s when they're returned from `readSheet()` or `readExcelFile()` function. And I'd personally assume that in any sane (non-contrived) real-world usage scenario that would be the case, so it doesn't really seem like a "breaking change". And if, for some strange reason, that happens not to be the case, `parseData()` function will throw an error: `not_a_date`.
53
54
  * Previously, when parsing using a schema, it used to skip `required` validation for completely-empty rows. It no longer does that.
54
55
  * Removed exported function `parseExcelDate()` because there seems to be no need to have it exported.
55
56
  * (TypeScript) Renamed exported types:
56
- * `Type` → `ParseDataValueType`
57
+ * `Type` → `ParseDataCustomType`
57
58
  * `Error` or `SchemaParseCellValueError` → `ParseDataError`
58
59
  * `CellValueRequiredError` → `ParseDataValueRequiredError`
59
60
  * `ParsedObjectsResult` → `ParseDataResult`
60
61
  </details>
61
62
 
62
- ## Performance
63
-
64
- Here're the results of reading [sample `.xlsx` files](https://examplefile.com/document/xlsx) of different size:
65
-
66
- |File Size| Browser | Node.js |
67
- |---------|---------|-----------|
68
- | 1 MB | 0.2 sec.| 0.25 sec. |
69
- | 10 MB | 1.5 sec.| 2 sec. |
70
- | 50 MB | 8.5 sec.| 14 sec. |
71
-
72
63
  ## Install
73
64
 
74
65
  ```js
75
66
  npm install read-excel-file --save
76
67
  ```
77
68
 
78
- Alternatively, one could include it on a web page [directly](#cdn) via a `<script/>` tag.
69
+ Alternatively, it could be included on a web page [directly](#cdn) via a `<script/>` tag.
79
70
 
80
71
  ## Use
81
72
 
82
- The default exported function (let's call it `readExcelFile()`) reads an `.xlsx` file and returns a `Promise` that resolves to an array of "sheets". At least one "sheet" always exists. Each "sheet" is an object with properties:
83
- * `sheet` — Sheet name.
84
- * Example: `"Sheet1"`
85
- * `data` Sheet data. An array of rows. Each row is an array of values — `string`, `number`, `boolean` or `Date`.
86
- * Example: `[ ['John Smith',35,true,...], ['Kate Brown',28,false,...], ... ]`
73
+ If your `.xlsx` file only has a single "sheet", or if you only care for a single "sheet", or if you don't know or care what a "sheet" is, use `readSheet()` function.
74
+
75
+ | Name | Date of Birth | Married | Kids |
76
+ | ---------- | ------------- | ------- | ---- |
77
+ | John Smith | 1/1/1995 | TRUE | 3 |
78
+ | Kate Brown | 3/1/2010 | FALSE | 0 |
87
79
 
88
80
  ```js
81
+ import { readSheet } from 'read-excel-file/node'
82
+
83
+ await readSheet(file)
84
+
85
+ // Returns
86
+ [
87
+ ['Name', 'Date of Birth', 'Married', 'Kids'],
88
+ ['John Smith', 1995-01-01T00:00:00.000Z, true, 3],
89
+ ['Kate Brown', 2010-03-01T00:00:00.000Z, false, 0]
90
+ ]
91
+ ```
92
+
93
+ It resolves to an array of rows. Each row is an array of values — `string`, `number`, `boolean` or `Date`.
94
+
95
+ <!-- It's same as the default exported function shown above with the only difference that it returns just `data` instead of `[{ name: 'Sheet1', data }]`, so it's just a bit simpler to use. It has an optional second argument — `sheet` — which could be a sheet number (starting from `1`) or a sheet name. By default, it reads the first sheet. -->
96
+
97
+ And it has an optional second argument — `sheet` — which could be a sheet number (starting from `1`) or a sheet name. By default, it reads the first sheet.
98
+
99
+ But if you need to read all "sheets" for some reason, use the default exported function which resolves to an array of "sheets".
100
+
101
+ ```js
102
+ import readExcelFile from 'read-excel-file/node'
103
+
89
104
  await readExcelFile(file)
90
105
 
91
106
  // Returns
92
107
  [{
93
108
  sheet: 'Sheet1',
94
109
  data: [
95
- ['John Smith',35,true,...],
96
- ['Kate Brown',28,false,...],
97
- ...
110
+ ['Name', 'Age'],
111
+ ['John Smith', 30],
112
+ ['Kate Brown', 15]
98
113
  ]
99
114
  }, {
100
115
  sheet: 'Sheet2',
@@ -102,20 +117,15 @@ await readExcelFile(file)
102
117
  }]
103
118
  ```
104
119
 
105
- In simple cases when there're no multiple sheets in an `.xlsx` file, or if only one sheet in an `.xlsx` file is of any interest, use a named exported function `readSheet()`. It's same as the default exported function shown above with the only difference that it returns just `data` instead of `[{ name: 'Sheet1', data }]`, so it's just a bit simpler to use. It has an optional second argument — `sheet` — which could be a sheet number (starting from `1`) or a sheet name. By default, it reads the first sheet.
106
-
107
- ```js
108
- await readSheet(file)
120
+ At least one "sheet" always exists. Each "sheet" is an object with properties:
121
+ * `sheet` — Sheet name.
122
+ * Example: `"Sheet1"`
123
+ * `data` — Sheet data. An array of rows. Each row is an array of values — `string`, `number`, `boolean` or `Date`.
124
+ * Example: `[ ['Name','Age'], ['John Smith',30], ['Kate Brown',15] ]`
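Given that shape, picking one sheet's rows by name is a plain array lookup. The sketch below uses a hand-written array in the documented `[{ sheet, data }]` shape instead of reading a real file. The contents of `Sheet2` are made up for illustration, and `getSheetData()` is a hypothetical helper, not a function exported by this package.

```js
// Sample value in the shape that `readExcelFile()` is documented to resolve to.
// The `Sheet2` rows are invented for this example.
const sheets = [
  { sheet: 'Sheet1', data: [['Name', 'Age'], ['John Smith', 30], ['Kate Brown', 15]] },
  { sheet: 'Sheet2', data: [['City'], ['Paris']] }
]

// Hypothetical helper: returns the rows of the sheet with the given name,
// or `undefined` when there is no such sheet.
function getSheetData(sheets, name) {
  const found = sheets.find((entry) => entry.sheet === name)
  return found ? found.data : undefined
}

// Separate the header row from the data rows.
const [header, ...rows] = getSheetData(sheets, 'Sheet1')
console.log(header)
console.log(rows.length)
```

When only one sheet is needed, calling `readSheet(file, sheet)` with a sheet name or number avoids this lookup entirely.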
109
125
 
110
- // Returns
111
- [
112
- ['John Smith',35,true,...],
113
- ['Kate Brown',28,false,...],
114
- ...
115
- ]
116
- ```
126
+ ## API
117
127
 
118
- As for where to `import` those two functions from, the package provides a separate `import` path for each different environment, as described below.
128
+ This package provides a separate `import` path for each different environment, as described below.
119
129
 
120
130
  ### Browser
121
131
 
@@ -259,6 +269,16 @@ readExcelFile(file, {
259
269
 
260
270
  This package doesn't support reading cells that use formulas to calculate the value: `SUM`, `AVERAGE`, etc.
261
271
 
272
+ ## Performance
273
+
274
+ Here're the results of reading [sample `.xlsx` files](https://examplefile.com/document/xlsx) of different size:
275
+
276
+ |File Size| Browser | Node.js |
277
+ |---------|---------|-----------|
278
+ | 1 MB | 0.2 sec.| 0.25 sec. |
279
+ | 10 MB | 1.5 sec.| 2 sec. |
280
+ | 50 MB | 8.5 sec.| 14 sec. |
281
+
262
282
  ## Schema
263
283
 
264
284
  Oftentimes, the task is not just to read the "raw" spreadsheet data but also to convert each row of that data to a JSON object having a certain structure. Because it's such a common task, this package exports a named function `parseData(data, schema)` which does exactly that. It parses sheet data into an array of JSON objects according to a pre-defined `schema` which describes how should a row of data be converted to a JSON object.
@@ -339,105 +359,222 @@ Example:
339
359
 
340
360
  ```js
341
361
  // An example .xlsx document:
342
- // -----------------------------------------------------------------------------------------
343
- // | START DATE | NUMBER OF STUDENTS | IS FREE | COURSE TITLE | CONTACT | STATUS |
344
- // -----------------------------------------------------------------------------------------
345
- // | 03/24/2018 | 10 | true | Chemistry | (123) 456-7890 | SCHEDULED |
346
- // -----------------------------------------------------------------------------------------
362
+ // --------------------------------------------------------------------------------------------------------
363
+ // | START DATE | SEATS | STATUS | CONTACT | COURSE TITLE | COURSE CATEGORY | COURSE IS FREE |
364
+ // --------------------------------------------------------------------------------------------------------
365
+ // | 03/24/2018 | 10 | SCHEDULED | (123) 456-7890 | Basic Algebra | Math, Arithmetic | TRUE |
366
+ // --------------------------------------------------------------------------------------------------------
347
367
 
348
368
  const schema = {
349
- date: {
369
+ startDate: {
350
370
  column: 'START DATE',
351
371
  type: Date
352
372
  },
353
- numberOfStudents: {
354
- column: 'NUMBER OF STUDENTS',
373
+ seats: {
374
+ column: 'SEATS',
355
375
  type: Number,
356
376
  required: true
357
377
  },
358
- // Nested object example.
359
- course: {
360
- schema: {
361
- isFree: {
362
- column: 'IS FREE',
363
- type: Boolean
364
- },
365
- title: {
366
- column: 'COURSE TITLE',
367
- type: String
368
- }
369
- }
370
- // required: true/false
371
- },
372
- contact: {
373
- column: 'CONTACT',
374
- required: true,
375
- // A custom `type` transformation function can be specified.
376
- // It will transform the cell value if it's not empty.
377
- type: (value) => {
378
- const number = parsePhoneNumber(value)
379
- if (!number) {
380
- throw new Error('invalid')
381
- }
382
- return number
383
- }
384
- },
385
378
  status: {
386
379
  column: 'STATUS',
387
380
  type: String,
381
+ // An example of using `oneOf`
388
382
  oneOf: [
389
383
  'SCHEDULED',
390
384
  'STARTED',
391
385
  'FINISHED'
392
386
  ]
387
+ },
388
+ contact: {
389
+ column: 'CONTACT',
390
+ required: true,
391
+ // An example of using a custom `type`
392
+ type: PhoneNumber
393
+ },
394
+ // Nested object example
395
+ course: {
396
+ // required: true/false,
397
+ schema: {
398
+ title: {
399
+ column: 'COURSE TITLE',
400
+ type: String
401
+ },
402
+ categories: {
403
+ column: 'COURSE CATEGORY',
404
+ // An example of parsing comma-separated values
405
+ type: [String]
406
+ },
407
+ isFree: {
408
+ column: 'COURSE IS FREE',
409
+ type: Boolean
410
+ }
411
+ }
393
412
  }
394
413
  }
395
414
 
415
+ // If this code was written in TypeScript, `schema` would've been declared as:
416
+ // const schema: Schema<Object, ColumnTitle> = { ... }
417
+
418
+ // Read `data` from an `.xlsx` file
396
419
  const data = await readSheet(file)
397
420
 
398
- const { rows, errors } = parseData(data, schema)
421
+ // Parse `data` using the `schema`
422
+ const results = parseData(data, schema)
399
423
 
400
- // `errors` list items have shape: `{ row, column, error, reason?, value?, type? }`.
401
- errors.length === 0
424
+ // There's one data row in the `.xlsx` file.
425
+ results.length === 1
402
426
 
403
- rows === [{
404
- date: new Date(2018, 3 - 1, 24),
405
- numberOfStudents: 10,
406
- course: {
407
- isFree: true,
408
- title: 'Chemistry'
409
- },
427
+ // There have been no errors when parsing the first data row, so `errors` is `undefined`.
428
+ // Should there have been any errors when parsing the row, `errors` would've been an array
429
+ // with items having shape: `{ column, error, reason?, value?, type? }`.
430
+ results[0].errors === undefined
431
+
432
+ results[0].object === {
433
+ startDate: new Date(Date.UTC(2018, 3 - 1, 24)),
434
+ seats: 10,
435
+ status: 'SCHEDULED',
410
436
  contact: '+11234567890',
411
- status: 'SCHEDULED'
412
- }]
437
+ course: {
438
+ title: 'Basic Algebra',
439
+ categories: ['Math', 'Arithmetic'],
440
+ isFree: true
441
+ }
442
+ }
443
+
444
+ // An example of a custom `type` parser function.
445
+ // It will parse the cell value when it's not empty.
446
+ function PhoneNumber(value) {
447
+ const number = parsePhoneNumber(value)
448
+ if (!number) {
449
+ throw new Error('invalid')
450
+ }
451
+ return number
452
+ }
413
453
  ```
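The expected `startDate` in the example above is built with `Date.UTC()`, so it denotes midnight UTC rather than midnight in the machine's local time zone. A short sketch of the difference, using only standard JavaScript `Date` APIs (the variable names are illustrative):

```js
// Midnight UTC on March 24, 2018 (months are zero-based, hence `3 - 1`).
const utcDate = new Date(Date.UTC(2018, 3 - 1, 24))

// Midnight on the same calendar day in the machine's local time zone.
const localDate = new Date(2018, 3 - 1, 24)

console.log(utcDate.toISOString()) // "2018-03-24T00:00:00.000Z"

// The gap between the two timestamps depends on the local time zone:
// it is zero only when the machine runs in UTC.
const differenceInMinutes = (localDate.getTime() - utcDate.getTime()) / 60000
console.log(differenceInMinutes)
```

Comparing against a UTC-based value keeps such a check independent of where the code runs.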
414
454
 
415
- <!-- #### Schema: Tips and Features -->
455
+ An example of how an application could handle the `results`:
416
456
 
417
- <!-- If no `type` is specified then the cell value is returned "as is": as a string, number, date or boolean. -->
457
+ ```js
458
+ const errors = []
459
+ const objects = []
460
+
461
+ // If this code was written in TypeScript, `errors` and `objects` would've been declared as:
462
+ // const errors: { error: ParseDataError, row: number }[] = []
463
+ // const objects: Object[] = []
464
+
465
+ let row = 1
466
+ for (const { errors: errorsInRow, object } of results) {
467
+ if (errorsInRow) {
468
+ for (const error of errorsInRow) {
469
+ errors.push({ error, row })
470
+ }
471
+ } else {
472
+ objects.push(object)
473
+ }
474
+ row++
475
+ }
418
476
 
419
- <!-- There are also some additional exported `type`s available: -->
477
+ if (errors.length > 0) {
478
+ for (const { error, row } of errors) {
479
+ console.error('Error in data row', row, 'column', error.column, ':', error.error, error.reason || '')
480
+ }
481
+ } else {
482
+ console.log('Objects', objects)
483
+ }
484
+ ```
420
485
 
421
486
  <details>
422
- <summary>An example of a <strong>custom <code>type</code></strong></summary>
487
+ <summary>An example of defining a <strong>custom <code>type</code></strong> in <strong>TypeScript</strong></summary>
423
488
 
424
489
  #####
425
490
 
426
- Here's an example of a basic custom `type`. It calls a custom `parseValue()` function to parse a cell value, and produces an `"invalid"` error if the value couldn't be parsed. If a cell is empty, it will not be parsed.
491
+ ```ts
492
+ import type {
493
+ Schema,
494
+ CellValue,
495
+ ParseDataError,
496
+ ParseDataCustomType,
497
+ ParseDataCustomTypeErrorMessage
498
+ } from 'read-excel-file/node'
427
499
 
428
- ```js
429
- {
430
- property: {
431
- column: 'COLUMN TITLE',
432
- type: (value) => {
433
- try {
434
- return parseValue(value)
435
- } catch (error) {
436
- console.error(error)
437
- throw new Error('invalid')
438
- }
500
+ type ColumnTitle = 'COLUMN TITLE 1' | 'COLUMN TITLE 2'
501
+
502
+ type CustomTypeValue = string
503
+
504
+ function CustomType(value: CellValue): CustomTypeValue {
505
+ if (typeof value !== 'string') {
506
+ throw new Error('not_a_string')
507
+ }
508
+ return '~' + value + '~'
509
+ }
510
+
511
+ type CustomTypeErrorMessage<Type extends ParseDataCustomType<unknown>> =
512
+ Type extends typeof CustomType
513
+ ? 'not_a_string'
514
+ : never
515
+
516
+ // type CustomTypeErrorReason<
517
+ // Type extends ParseDataCustomType<unknown>,
518
+ // ErrorMessage extends ParseDataCustomTypeErrorMessage<Type>
519
+ // > =
520
+ // Type extends typeof CustomType
521
+ // ? (ErrorMessage extends 'not_a_string' ? undefined : never)
522
+ // : never
523
+
524
+ type PossibleError = ParseDataError<
525
+ ColumnTitle,
526
+ typeof CustomType,
527
+ CustomTypeErrorMessage<typeof CustomType>
528
+ // CustomTypeErrorReason<typeof CustomType, CustomTypeErrorMessage<typeof CustomType>>
529
+ >
530
+
531
+ interface Object {
532
+ property1: CustomTypeValue;
533
+ property2?: string;
534
+ }
535
+
536
+ const schema: Schema<Object, ColumnTitle> = {
537
+ property1: {
538
+ column: 'COLUMN TITLE 1',
539
+ type: CustomType,
540
+ required: true
541
+ },
542
+ property2: {
543
+ column: 'COLUMN TITLE 2',
544
+ type: String
545
+ }
546
+ }
547
+
548
+ const results = parseData<Object, ColumnTitle, PossibleError>([
549
+ ['COLUMN TITLE 1', 'COLUMN TITLE 2'],
550
+ ['Value 1', 'Value 2']
551
+ ], schema)
552
+
553
+ const errors: {
554
+ error: PossibleError,
555
+ row: number
556
+ }[] = []
557
+
558
+ const objects: Object[] = []
559
+
560
+ let row = 1
561
+ for (const { errors: errorsInRow, object } of results) {
562
+ if (errorsInRow) {
563
+ for (const error of errorsInRow) {
564
+ errors.push({ error, row })
439
565
  }
566
+ } else {
567
+ objects.push(object)
568
+ }
569
+ row++
570
+ }
571
+
572
+ if (errors.length > 0) {
573
+ for (const { error, row } of errors) {
574
+ console.error('Error in data row', row, 'column', error.column, ':', error.error, error.reason || '')
440
575
  }
576
+ } else {
577
+ console.log('Objects', objects)
441
578
  }
442
579
  ```
443
580
  </details>
@@ -18,6 +18,10 @@ import {
18
18
  Schema
19
19
  } from '../types/parseData/parseDataSchema.d.js';
20
20
 
21
+ import {
22
+ ParseDataError
23
+ } from '../types/parseData/parseDataError.d.js';
24
+
21
25
  export {
22
26
  CellValue,
23
27
  Row,
@@ -25,7 +29,12 @@ export {
25
29
  } from '../types/types.d.js';
26
30
 
27
31
  export {
28
- ParseDataValueCustomType as ParseDataValueType,
32
+ ParseDataCustomType,
33
+ // Base `type`s when parsing data.
34
+ StringType as String,
35
+ DateType as Date,
36
+ NumberType as Number,
37
+ BooleanType as Boolean,
29
38
  // Additional built-in `type`s when parsing data.
30
39
  Integer,
31
40
  Email,
@@ -33,6 +42,8 @@ export {
33
42
  } from '../types/parseData/parseDataValueType.d.js';
34
43
 
35
44
  export {
45
+ ParseDataCustomTypeErrorMessage,
46
+ ParseDataCustomTypeErrorReason,
36
47
  ParseDataError,
37
48
  ParseDataValueRequiredError
38
49
  } from '../types/parseData/parseDataError.d.js';
@@ -63,9 +74,10 @@ export function readSheet<ParsedNumber = number>(
63
74
 
64
75
  export function parseData<
65
76
  Object extends object,
66
- ColumnTitle extends string
77
+ ColumnTitle extends string,
78
+ Error extends ParseDataError
67
79
  >(
68
80
  data: SheetData,
69
81
  schema: Schema<Object, ColumnTitle>,
70
82
  options?: ParseDataOptions
71
- ): ParseDataResult<Object>;
83
+ ): ParseDataResult<Object, Error>;
@@ -253,6 +253,12 @@ function parseDataCellValue_(cellValue, schemaEntry, propertyPath, options) {
253
253
  if (errors.length > 0) {
254
254
  return;
255
255
  }
256
+ // If an empty substring was extracted, it means that there was an out-of-place separator.
257
+ if (!substring) {
258
+ errors.push('invalid');
259
+ reasons.push('syntax');
260
+ return;
261
+ }
256
262
  var _parseValue = parseValue(substring, schemaEntry, options),
257
263
  value = _parseValue.value,
258
264
  error = _parseValue.error,