@louis.jln/extract-date 3.0.0
- package/.flowconfig +3 -0
- package/LICENSE +24 -0
- package/README.md +184 -0
- package/bundle/extract-date.js +61276 -0
- package/dist/calculateSpecificity.js +23 -0
- package/dist/calculateSpecificity.js.flow +21 -0
- package/dist/calculateSpecificity.js.map +1 -0
- package/dist/createFormats.js +165 -0
- package/dist/createFormats.js.flow +373 -0
- package/dist/createFormats.js.map +1 -0
- package/dist/createMovingChunks.js +20 -0
- package/dist/createMovingChunks.js.flow +19 -0
- package/dist/createMovingChunks.js.map +1 -0
- package/dist/dictionary.json +3792 -0
- package/dist/extractDate.js +148 -0
- package/dist/extractDate.js.flow +214 -0
- package/dist/extractDate.js.map +1 -0
- package/dist/extractRelativeDate.js +32 -0
- package/dist/extractRelativeDate.js.flow +34 -0
- package/dist/extractRelativeDate.js.map +1 -0
- package/dist/index.js +11 -0
- package/dist/index.js.flow +6 -0
- package/dist/index.js.map +1 -0
- package/dist/normalizeInput.js +22 -0
- package/dist/normalizeInput.js.flow +26 -0
- package/dist/normalizeInput.js.map +1 -0
- package/dist/types.js +2 -0
- package/dist/types.js.flow +23 -0
- package/dist/types.js.map +1 -0
- package/package.json +96 -0
- package/src/calculateSpecificity.js +21 -0
- package/src/createFormats.js +373 -0
- package/src/createMovingChunks.js +19 -0
- package/src/dictionary.json +3792 -0
- package/src/extractDate.js +214 -0
- package/src/extractRelativeDate.js +34 -0
- package/src/index.js +6 -0
- package/src/normalizeInput.js +26 -0
- package/src/types.js +23 -0
package/.flowconfig
ADDED
package/LICENSE
ADDED
@@ -0,0 +1,24 @@
Copyright (c) 2018, Gajus Kuizinas (http://gajus.com/)
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the Gajus Kuizinas (http://gajus.com/) nor the
names of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL ANUARY BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
package/README.md
ADDED
@@ -0,0 +1,184 @@
<a name="extract-date"></a>
# extract-date 📅

Extracts date from an arbitrary text input.

* [extract-date 📅](#extract-date)
    * [Features](#extract-date-features)
    * [Motivation](#extract-date-motivation)
    * [Use case](#extract-date-use-case)
    * [Usage](#extract-date-usage)
        * [Configuration](#extract-date-usage-configuration)
    * [Resolution of ambiguous dates](#extract-date-resolution-of-ambiguous-dates)
        * [Date resolution without year](#extract-date-resolution-of-ambiguous-dates-date-resolution-without-year)
    * [Implementation](#extract-date-implementation)
        * [Input tokenisation](#extract-date-implementation-input-tokenisation)
        * [Format specification](#extract-date-implementation-format-specification)
    * [Related projects](#extract-date-related-projects)


<a name="extract-date-features"></a>
## Features

* Deterministic and unambiguous date parsing (input must include year; see [Date resolution without year](#date-resolution-without-year))
* No date format configuration.
* Recognises relative dates (yesterday, today, tomorrow).
* Recognises weekdays (Monday, Tuesday, etc.).
* Supports timezones (for relative date resolution) and locales.

<a name="extract-date-motivation"></a>
## Motivation

I am creating a large-scale data aggregation platform (https://applaudience.com/). I have observed that date-matching patterns and site-specific date validation logic are repetitive and could be abstracted into a universal function, as long as minimal information about the expected pattern is provided (such as the `direction` configuration). My motivation for creating such an abstraction is to reduce the amount of repetitive logic that we use to extract dates from multiple sources.

<a name="extract-date-use-case"></a>
## Use case

The intended use case is extracting dates of future events from blobs of text that may contain auxiliary information, e.g. 'Event at 14:00 2019-01-01 (2D)'.

The emphasis on _future_ events is because resolving dates such as 'today' (relative dates) and 'Wednesday' (weekday dates) requires knowing the offset date. If your input sources refer predominantly to future events, then the ambiguity can be resolved using the present date.

<a name="extract-date-usage"></a>
## Usage

```js
import extractDate from 'extract-date';

extractDate('extracts date from anywhere within the input 2000-01-02');
// [{date: '2000-01-02'}]

extractDate('extracts multiple dates located anywhere within the input: 2000-01-02, 2000-01-03');
// [{date: '2000-01-02'}, {date: '2000-01-03'}]

extractDate('ignores ambiguous dates 02/01/2000');
// []

extractDate('uses `direction` to resolve ambiguous dates 02/01/2000', {direction: 'DMY'});
// [{date: '2000-01-02'}]

extractDate('uses `timezone` to resolve relative dates such as today or tomorrow', {timezone: 'Europe/London'});
// [{date: '2000-01-02'}, {date: '2000-01-03'}] (assuming that today is 2000-01-02)

extractDate('extracts dates using locales May 1, 2017', {locale: 'en'});
// [{date: '2017-05-01'}]

```

<a name="extract-date-usage-configuration"></a>
### Configuration

|Name|Description|Default|
|---|---|---|
|`direction`|Token identifying the order of numeric date attributes within the string. Possible values: DM, DMY, DYM, MD, YDM, YMD. Used to resolve ambiguous dates, e.g. DD/MM/YYYY and MM/DD/YYYY.|N/A|
|`locale`|Required when date includes localized names (e.g. month names)|N/A|
|`maximumAge`|See [Date resolution without year](#date-resolution-without-year).|`Infinity`|
|`minimumAge`|See [Date resolution without year](#date-resolution-without-year).|`Infinity`|
|`timezone`|[TZ database name](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones). Used to resolve relative dates ("Today", "Tomorrow").|N/A|

<a name="extract-date-resolution-of-ambiguous-dates"></a>
## Resolution of ambiguous dates

<a name="extract-date-resolution-of-ambiguous-dates-date-resolution-without-year"></a>
### Date resolution without year

When the year is not part of the input (e.g. March 2nd), the `minimumAge` and `maximumAge` configuration determines the year value. The month difference is the parsed month minus the current month, as in the example below.

* If the parsed month precedes the current month and the absolute difference is greater than or equal to `minimumAge`, then the year value is equal to the current year +1.
* If the difference is greater than or equal to `maximumAge`, then the year value is equal to the current year -1.
* Otherwise, the current year value is used.

Example:

* If the current date is 2000-12-01 and the parsed date is 10-01, then the month difference is -2.
    * If `minimumAge` is 2, then the final date is 2001-10-01.
    * If `minimumAge` is 3, then the final date is 2000-10-01.

* If the current date is 2000-01-01 and the input date is 10-01, then the month difference is 9.
    * If `maximumAge` is 10, then the final date is 2000-10-01.
    * If `maximumAge` is 9, then the final date is 1999-10-01.

Note: The `minimumAge` comparison is done using the absolute difference value.
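
The year resolution described in this section can be sketched as a small function. This is a minimal illustration that follows the worked example (month difference = parsed month minus current month); `resolveYear` and its parameters are hypothetical names, not part of the `extract-date` API, and the `difference < 0` guard is an assumption drawn from the example rather than from the package source.

```js
// Hypothetical helper, not part of extract-date.
// currentMonth and parsedMonth are month numbers in the range 1-12.
function resolveYear(currentYear, currentMonth, parsedMonth, configuration = {}) {
  const {minimumAge = Infinity, maximumAge = Infinity} = configuration;
  const difference = parsedMonth - currentMonth;

  // Parsed month is far enough ahead of the current month: use the previous year.
  if (difference >= maximumAge) {
    return currentYear - 1;
  }

  // Parsed month is far enough behind the current month (assumed guard): use the next year.
  if (difference < 0 && Math.abs(difference) >= minimumAge) {
    return currentYear + 1;
  }

  return currentYear;
}

console.log(resolveYear(2000, 12, 10, {minimumAge: 2})); // 2001
console.log(resolveYear(2000, 12, 10, {minimumAge: 3})); // 2000
console.log(resolveYear(2000, 1, 10, {maximumAge: 10})); // 2000
console.log(resolveYear(2000, 1, 10, {maximumAge: 9})); // 1999
```

With both options left at their `Infinity` defaults, neither branch is taken and the current year is returned.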

<a name="extract-date-implementation"></a>
## Implementation

Note: This section of the documentation is included for contributors.

* `extract-date` includes a collection of formats ([`./src/createFormats.js`](./src/createFormats.js)).
* Individual formats define their expectations (see [Format specification](#format-specification)).
* Formats are attempted in the order of their specificity, i.e. "YYYY-MM-DD" is attempted before "MM-DD".
* Formats are attempted against a tokenised version of the input (see [Input tokenisation](#input-tokenisation)).
* A matching date format advances further search past the matching date string.
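
The specificity ordering mentioned in the list can be illustrated roughly as follows. The scoring function here is a stand-in (it simply uses the length of the format string) and is an assumption, not the calculation the package performs; `scoreSpecificity` and `orderedFormats` are hypothetical names.

```js
// Hypothetical format list; only the fields needed for ordering are shown.
const formats = [
  {dateFnsFormat: 'MM-DD', yearIsExplicit: false},
  {dateFnsFormat: 'YYYY-MM-DD', yearIsExplicit: true},
];

// Stand-in scoring (an assumption): longer format strings count as more specific.
function scoreSpecificity(format) {
  return format.dateFnsFormat.length;
}

// Copy before sorting so that the original list is left untouched.
const orderedFormats = [...formats].sort((a, b) => scoreSpecificity(b) - scoreSpecificity(a));

console.log(orderedFormats.map((format) => format.dateFnsFormat));
// ['YYYY-MM-DD', 'MM-DD']
```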

<a name="extract-date-implementation-input-tokenisation"></a>
### Input tokenisation

* Individual formats define how many words make up the date.
* `extract-date` splits input string into a collection of slices pairing words into phrases of the required length.
* Format is attempted against each resulting phrase.

Example:

Given input "foo bar baz qux" and format:

```js
{
  direction: 'YMD',
  localised: false,
  dateFnsFormat: 'YYYY MM.DD',
  wordCount: 2,
  yearIsExplicit: true
}

```

Input is broken down into:

* "foo bar"
* "bar baz"
* "baz qux"

The format is then attempted against each phrase in this collection until a match is found.
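
A minimal sketch of this slicing step is shown here. The function name mirrors the package's `createMovingChunks.js` file, but the signature and behaviour are assumptions made for illustration, not the package's actual implementation.

```js
// Assumed signature: returns every run of `chunkSize` consecutive words.
function createMovingChunks(words, chunkSize) {
  const chunks = [];

  // Slide a window of `chunkSize` words across the input, one word at a time.
  for (let index = 0; index + chunkSize <= words.length; index++) {
    chunks.push(words.slice(index, index + chunkSize));
  }

  return chunks;
}

const phrases = createMovingChunks('foo bar baz qux'.split(' '), 2)
  .map((chunk) => chunk.join(' '));

console.log(phrases);
// ['foo bar', 'bar baz', 'baz qux']
```

In this sketch an input with fewer words than `chunkSize` yields no phrases, so a format with a larger `wordCount` is never attempted against it.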

<a name="extract-date-implementation-format-specification"></a>
### Format specification

|Field|Description|
|---|---|
|`direction`|Identifies the order of numeric date attributes within the string. Possible values: DMY, DYM, YDM, YMD. Used to resolve ambiguous dates, e.g. DD/MM/YYYY and MM/DD/YYYY.|
|`localised`|Identifies if the date is localised, i.e. includes names of the week day or month. A format that is localised is used only when `locale` configuration is provided.|
|`dateFnsFormat`|Identifies [`date-fns`](https://www.npmjs.org/package/date-fns) format used to attempt date extraction.|
|`wordCount`|Identifies how many words make up the date format.|
|`yearIsExplicit`|Identifies whether the date format includes year.|

Example formats:

```js
{
  direction: 'YMD',
  localised: false,
  dateFnsFormat: 'YYYY.MM.DD',
  wordCount: 1,
  yearIsExplicit: true
},
{
  direction: 'DD MMMM',
  localised: true,
  dateFnsFormat: 'DD MMMM',
  wordCount: 2,
  yearIsExplicit: false
},

```
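
The `localised` rule from the table (a localised format is used only when `locale` is configured) could be applied with a filter like the one sketched here. `selectFormats` is a hypothetical helper, not a function exported by the package, and it deliberately ignores `direction` handling.

```js
// Hypothetical helper: drops localised formats when no locale is configured.
function selectFormats(formats, configuration = {}) {
  return formats.filter((format) => {
    return !format.localised || Boolean(configuration.locale);
  });
}

// Only the fields relevant to this rule are shown.
const exampleFormats = [
  {dateFnsFormat: 'YYYY.MM.DD', localised: false},
  {dateFnsFormat: 'DD MMMM', localised: true},
];

console.log(selectFormats(exampleFormats).length); // 1
console.log(selectFormats(exampleFormats, {locale: 'en'}).length); // 2
```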

<a name="extract-date-related-projects"></a>
## Related projects

* [`extract-price`](https://github.com/gajus/extract-price) – Extracts price from an arbitrary text input.
* [`extract-time`](https://github.com/gajus/extract-time) – Extracts time from an arbitrary text input.