date_parser 0.1.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 83fcce6dfc6821ee2701b10b33bb9cead5aa05e7
4
+ data.tar.gz: 47922dd354451a884c45e02f0fd818f359aef78c
5
+ SHA512:
6
+ metadata.gz: 405f093a43e736aef635c60ca99655f891a42c9308d4da053e3a7aaa9cd9a8edabb93f2c1fa590141f87d289e80a348ab620468a35ef1433eeb16ef7819cd130
7
+ data.tar.gz: 4ce37b863c5901b1f4ddaa0bd68792f40f64cd3649e715d64dd7af49a6e6884088348487a3b76763b80f22380d2473a3eb1c2340105b553a77702e830604ef6d
data/NEWS.md ADDED
@@ -0,0 +1,13 @@
1
+ # DateParser 0.1.1
2
+ * Critical bug fix - Helper files now sent with the gem.
3
+
4
+ # DateParser 0.1.0
5
+
6
+ * Improved parsing algorithm.
7
+ * No longer allow "sub-solutions" inside stricter solutions.
8
+ * Parser can now correctly parse sentences like "meet me on the 10th"
9
+ * Parser can now correctly parse XX/XX dates relative to a release date.
10
+ * Now parses simple 'relative words' like "yesterday" and "today" correctly.
11
+ * Many more functions are now sensitive to the release date!
12
+ * Option to parse single years (Such as "1302" alone.)
13
+ * Various smaller bug fixes.
data/README.md ADDED
@@ -0,0 +1,126 @@
1
+ # DateParser
2
+
3
+ DateParser is a simple, fast, effective way of parsing dates from natural language
4
+ text in a flexible way.
5
+
6
+ # Installation
7
+ ```
8
+ $ gem install date_parser
9
+ ```
10
+
11
+ # Usage
12
+ ```ruby
13
+ require 'date_parser'
14
+
15
+ # DateParser::parse(txt, creation_date = nil, opts)
16
+
17
+ text = "Newsflash: things happen on 02/12!"
18
+ creation_date = Date.parse("January 1st, 1994")
19
+
20
+ DateParser::parse(text, creation_date).to_s
21
+ #=> [#<Date: 1994-02-12 ((2449396j,0s,0n),+0s,2299161j)>]
22
+
23
+
24
+ text = "We should go on the 4th if we can."
25
+ creation_date = Date.parse("July 1st 2016")
26
+
27
+ DateParser::parse(text, creation_date).to_s
28
+ #=> [#<Date: 2016-07-04 ((2457574j,0s,0n),+0s,2299161j)>]
29
+
30
+
31
+ text = "Yesterday was certainly a day."
32
+ creation_date = Date.parse("January 12, 1994")
33
+
34
+ DateParser::parse(text, creation_date).to_s
35
+ #=> [#<Date: 1994-01-11 ((2449364j,0s,0n),+0s,2299161j)>]
36
+
37
+
38
+ text = "The first time I asked someone on a date, I was in Iowa, it was a " +
39
+ "winterly mid-month January, we were to go to finding nemo, I was 9, and " +
40
+ "although she was interested, her parents said no."
41
+ creation_date = Date.parse("July 6, 2016")
42
+
43
+ DateParser::parse(text, creation_date).to_s
44
+ #=> [#<Date: 2016-01-01 ((2457389j,0s,0n),+0s,2299161j)>]
45
+
46
+
47
+ text = "7-24-2015"
48
+ DateParser::parse(text).to_s
49
+ #=> [#<Date: 2015-07-24 ((2457228j,0s,0n),+0s,2299161j)>]
50
+
51
+
52
+ text = "2012-02-12"
53
+ DateParser::parse(text).to_s
54
+ #=> [#<Date: 2012-02-12 ((2455970j,0s,0n),+0s,2299161j)>]
55
+
56
+
57
+ text = "24-07-2015"
58
+ DateParser::parse(text).to_s
59
+ #=> [#<Date: 2015-07-24 ((2457228j,0s,0n),+0s,2299161j)>]
60
+
61
+
62
+ text = "In 1492 Columbus sailed the ocean blue."
63
+ creation_date = nil
64
+
65
+ DateParser::parse(text, creation_date, parse_single_years: true).to_s
66
+ #=> [#<Date: 1492-01-01 ((2266011j,0s,0n),+0s,2299161j)>]
67
+
68
+
69
+ text = "Charlie Chaplin and Jason Earles (Hannah Montana's brother) were " +
70
+ "alive at the same time for eight months in 1977"
71
+ creation_date = Date.parse("July 6, 2016")
72
+
73
+ DateParser::parse(text, creation_date, parse_single_years: true).to_s
74
+ #=> [#<Date: 1977-01-01 ((2443145j,0s,0n),+0s,2299161j)>]
75
+
76
+
77
+ text = "Sunday, Sunday, Sunday!"
78
+ DateParser::parse(text, nil, unique: false).to_s
79
+ #=> [#<Date: 2016-07-24 ((2457594j,0s,0n),+0s,2299161j)>,
80
+ #<Date: 2016-07-24 ((2457594j,0s,0n),+0s,2299161j)>,
81
+ #<Date: 2016-07-24 ((2457594j,0s,0n),+0s,2299161j)>]
82
+
83
+
84
+ text = "Sunday, Sunday, Sunday!"
85
+ DateParser::parse(text, nil, unique: true).to_s
86
+ #=> [#<Date: 2016-07-24 ((2457594j,0s,0n),+0s,2299161j)>]
87
+
88
+
89
+ DateParser::parse("No dates here", nil).to_s
90
+ #=> []
91
+
92
+
93
+ DateParser::parse("No dates here",
94
+ nil,
95
+ nil_date: Date.parse("Jan 1, 2016")
96
+ ).to_s
97
+ #=> [#<Date: 2016-01-01 ((2457389j,0s,0n),+0s,2299161j)>]
98
+ ```
99
+
100
+ # Examples
101
+
102
+ DateParser has just one function: `parse(txt, creation_date, opts)`, which
103
+ always returns an array with Date elements parsed from the text.
104
+
105
+ `parse` is case-insensitive, robust to crazy punctuation and spacing, and will
106
+ try to interpret dates in the strictest possible way before trying to find
107
+ looser interpretations. Additionally, no word can be used in more than one
108
+ Date.
109
+
110
+ For example: `DateParser::parse("Jan 12, 2013", nil, parse_single_years: true)`
111
+ will return `["Jan 12, 2013"]`, and not `["Jan 12, 2013", "Jan 1, 2013"]`
112
+
113
+ ## What is creation_date?
114
+ It's meant to make the parser smarter! `creation_date` is the date the text was
115
+ written. If provided, the parser will try to interpret dates like "Monday" relative
116
+ to the `creation_date`.
117
+
118
+ ## Options!
119
+ * `unique`: (boolean) Return only unique dates in the output array.
120
+ * `nil_date`: (Date) If no dates are found, instead of returning an empty array,
121
+ return an array containing only `nil_date`.
122
+ * `parse_single_years`: (boolean) Should the parser interpret integers as years?
123
+
124
+ # Requests or Bugs?
125
+ Leave an issue on this GitHub page! I'll most likely get back to you within 24
126
+ hours.
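The README above states that `parse` tries the strictest interpretation first and that no word is used in more than one date. The following is a minimal Ruby sketch of that idea, assuming simplified matchers (full month names, numeric days, four-digit years). The helpers `month_names`, `build_date`, `match_window` and `strictest_first` are hypothetical names chosen for this illustration and are not part of the gem.

```ruby
require 'date'

# Illustrative only: these helpers are hypothetical, not the gem's API.
def month_names
  Date::MONTHNAMES.compact.map(&:downcase)
end

# Build a Date from a year, a full month name and a day, or nil if invalid.
def build_date(year, month_name, day)
  month = month_names.index(month_name) + 1
  Date.valid_date?(year, month, day) ? Date.new(year, month, day) : nil
end

# Try to read a window of 3, 2 or 1 words as a date.
def match_window(window, default_year)
  case window.length
  when 3 # MONTH DAY YEAR
    month, day, year = window
    return nil unless month_names.include?(month) && day =~ /\A\d{1,2}\z/ && year =~ /\A\d{4}\z/
    build_date(year.to_i, month, day.to_i)
  when 2 # MONTH DAY, with the year taken from default_year
    month, day = window
    return nil unless month_names.include?(month) && day =~ /\A\d{1,2}\z/
    build_date(default_year, month, day.to_i)
  when 1 # bare four-digit YEAR
    window[0] =~ /\A\d{4}\z/ ? Date.new(window[0].to_i, 1, 1) : nil
  end
end

# Scan with the widest window first; matched words are removed so that a
# looser pass cannot reuse them.
def strictest_first(text, default_year)
  words = text.downcase.gsub(/[^a-z0-9\s]/, '').split
  found = []
  [3, 2, 1].each do |size|
    i = 0
    while i <= words.length - size
      date = match_window(words[i, size], default_year)
      if date
        found << date
        words.slice!(i, size)
      else
        i += 1
      end
    end
  end
  found
end
```

In this sketch, `strictest_first("January 12, 2013", 2016)` yields a single date, because the three-word match consumes `2013` before the bare-year pass runs.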
@@ -0,0 +1,82 @@
1
+ require 'date'
2
+
3
+ require_relative 'date_parser/natural_date_parsing'
4
+
5
+ # DateParser is the main interface between the user and the parser
6
+ #
7
+ # == Methods
8
+ #
9
+ # <b>parse(txt, creation_date, opts)</b>: Parse a block of text and return an array of the parsed
10
+ # dates as Date objects.
11
+ #
12
+
13
+ module DateParser
14
+
15
+ # Parses a text object and returns an array of parsed dates.
16
+ #
17
+ # ==== Attributes
18
+ #
19
+ # * +txt+ - The text to parse.
20
+ # * +creation_date+ - A Date object of when the text was created or released.
21
+ # Defaults to nil, but if provided can make returned dates more accurate.
22
+ #
23
+ # ==== Options
24
+ #
25
+ # * +:unique+ - Return only unique Date objects. Defaults to false.
26
+ # * +:nil_date+ - A date to return if no dates are parsed. Defaults to nil.
27
+ # * +:parse_single_years+ - Should we parse single ints as years? Defaults to false.
28
+ #
29
+ # ==== Examples
30
+ #
31
+ # text = "Henry and Hanke created a calendar that causes each day to fall " +
32
+ # "on the same day of the week every year. They recommend its " +
33
+ # "implementation on January 1, 2018, a Monday."
34
+ # creation_date = Date.parse("July 6, 2016")
35
+ #
36
+ # DateParser::parse(text, creation_date)
37
+ # #=> [#<Date: 2018-01-01 ((2458120j,0s,0n),+0s,2299161j)>,
38
+ # #<Date: 2016-07-11 ((2457581j,0s,0n),+0s,2299161j)>]
39
+ #
40
+ # ######################################
41
+ # ##
42
+ # ## Option Examples:
43
+ # ##
44
+ #
45
+ # text = "Sunday, Sunday, Sunday!"
46
+ # DateParser::parse(text, nil, unique: false)
47
+ # #=> [#<Date: 2016-07-24 ((2457594j,0s,0n),+0s,2299161j)>,
48
+ # #<Date: 2016-07-24 ((2457594j,0s,0n),+0s,2299161j)>,
49
+ # #<Date: 2016-07-24 ((2457594j,0s,0n),+0s,2299161j)>]
50
+ #
51
+ # DateParser::parse(text, nil, unique: true)
52
+ # #=> [#<Date: 2016-07-24 ((2457594j,0s,0n),+0s,2299161j)>]
53
+ #
54
+ # DateParser::parse("No dates here", nil)
55
+ # #=> []
56
+ #
57
+ # DateParser::parse("No dates here", nil, nil_date: Date.parse("Jan 1 2012"))
58
+ # #=> [#<Date: 2012-01-01 ((2455928j,0s,0n),+0s,2299161j)>]
59
+ #
60
+ def DateParser.parse(txt, creation_date = nil, opts = {})
61
+ unique = opts[:unique] || false
62
+ nil_date = opts[:nil_date] || nil
63
+ parse_single_years = opts[:parse_single_years] || false
64
+
65
+ interpreted_dates = NaturalDateParsing::interpret_date(txt,
66
+ creation_date,
67
+ parse_single_years)
68
+
69
+ if unique
70
+ interpreted_dates.uniq!
71
+ end
72
+
73
+ if interpreted_dates.empty?
74
+ interpreted_dates = [nil_date].flatten
75
+ end
76
+
77
+ # Possibility of getting [nil]
78
+ interpreted_dates.delete(nil)
79
+
80
+ return interpreted_dates
81
+ end
82
+ end
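The post-processing in `DateParser.parse` above (apply `unique`, then fall back to `nil_date` when nothing was found) can be sketched on its own. This is a minimal illustration under the assumption that the parsed dates are already available as an array. The method name `apply_options` is hypothetical and does not exist in the gem.

```ruby
require 'date'

# Hypothetical stand-alone version of the option handling; not the gem's API.
# dates:    array of Date objects produced by a parser
# unique:   drop duplicate dates when true
# nil_date: fallback Date returned (wrapped in an array) if dates is empty
def apply_options(dates, unique: false, nil_date: nil)
  result = unique ? dates.uniq : dates.dup
  # compact drops the fallback when it is nil, so an empty input with no
  # nil_date still yields [] rather than [nil].
  result = [nil_date].compact if result.empty?
  result
end
```

Unlike the original, this sketch does not mutate its input, which may be preferable if the caller keeps a reference to the parsed array.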
File without changes
@@ -0,0 +1,455 @@
1
+ require_relative 'date_utils'
2
+ require_relative "utils"
3
+
4
+ # Handles the mechanics of natural language processing.
5
+ #
6
+ # == Methods
7
+ #
8
+ # <b>interpret_date(txt, creation_date, parse_single_years)</b>:
9
+ # Return an array of dates from the set of parameters.
10
+ #
11
+ # We parse in order of decreasing
12
+ # strictness. I.e., a very specific phrase like "January 1st, 2013" will be parsed
13
+ # before "January 1st," which will be parsed before just "2013". Whenever we
14
+ # determine a phrase is part of a date, we remove the phrase after parsing.
15
+ # So in the example "January 1st, 2013" we'll return only one date.
16
+ #
17
+ # If no dates are found, returns an empty array.
18
+ #
19
+ # <b>parse_one_word(word, creation_date, parse_single_years)</b>: Given a single word,
20
+ # a string, tries to return a Date object.
21
+ #
22
+
23
+ module NaturalDateParsing
24
+
25
+ ###############################################
26
+ ##
27
+ ## Constants
28
+ ##
29
+
30
+ # Names of days as well as common shortened versions.
31
+ SINGLE_DAYS = [
32
+ 'mon', 'tue', 'wed', 'thur', 'fri', 'sat', 'sun',
33
+ 'monday', 'tuesday', 'wednesday', 'thursday', 'friday',
34
+ 'saturday', 'sunday', 'tues'
35
+ ]
36
+
37
+ # Phrases that denote a date relative to today (here often
38
+ # called the creation_date)
39
+ RELATIVE_DAYS = ['today', 'tomorrow', 'tonight', 'yesterday']
40
+
41
+ # Names of months as well as common shortened versions
42
+ MONTH = [
43
+ 'jan', 'feb', 'mar', 'apr', 'may', 'june', 'july', 'aug', 'sept', 'oct',
44
+ 'nov', 'dec',
45
+ 'january', 'february', 'march', 'april', 'august', 'september',
46
+ 'october', 'november', 'december'
47
+ ]
48
+
49
+ # A list of numbers from [1, 31]
50
+ NUMERIC_DAY = ('1'..'31').to_a
51
+
52
+ # Numbers from [1, 31] with their ordinal suffixes (1st, 2nd, etc.)
53
+ SUFFIXED_NUMERIC_DAY = [
54
+ '1st', '2nd', '3rd', '4th', '5th', '6th', '7th',
55
+ '8th', '9th', '10th', '11th', '12th', '13th', '14th',
56
+ '15th', '16th', '17th', '18th', '19th', '20th',
57
+ '21st', '22nd', '23rd','24th', '25th',
58
+ '26th', '27th', '28th', '29th', '30th', '31st'
59
+ ]
60
+
61
+ ###############################################
62
+ ##
63
+ ## Main Parsing/Processing Function
64
+ ##
65
+
66
+ # Processes a given text and returns an array of probable dates contained within.
67
+ #
68
+ # ==== Description
69
+ #
70
+ # Tries to interpret dates from the given text, in order from strictest
71
+ # interpretation to looser interpretations. No word can be part of two
72
+ # different dates.
73
+ #
74
+ # Works by calling parse_three_words, parse_two_words, and parse_one_word
75
+ # on the text.
76
+ #
77
+ # ==== Attributes
78
+ #
79
+ # * +txt+ - The text to parse.
80
+ # * +creation_date+ - A Date object of when the text was created or released.
81
+ # Defaults to nil, but if provided can make returned dates more accurate.
82
+ # * +parse_single_years+ - A boolean. If true, we interpret single numbers as
83
+ # years. This is a very broad assumption, and so defaults to false.
84
+ #
85
+ # ==== Examples
86
+ #
87
+ # text = "Henry and Hanke created a calendar that causes each day to fall " +
88
+ # "on the same day of the week every year. They recommend its " +
89
+ # "implementation on January 1, 2018, a Monday."
90
+ # creation_date = Date.parse("July 6, 2016")
91
+ #
92
+ # NaturalDateParsing.interpret_date(text, creation_date)
93
+ # #=> [#<Date: 2018-01-01 ((2458120j,0s,0n),+0s,2299161j)>,
94
+ # #<Date: 2016-07-11 ((2457581j,0s,0n),+0s,2299161j)>]
95
+ #
96
+ # NaturalDateParsing.interpret_date("No dates here!")
97
+ # #=> []
98
+ #
99
+ # NaturalDateParsing.interpret_date("2012", nil, true)
100
+ # #=> [#<Date: 2012-01-01 ((2455928j,0s,0n),+0s,2299161j)>]
101
+ #
102
+ def NaturalDateParsing.interpret_date(
103
+ txt,
104
+ creation_date = nil,
105
+ parse_single_years = false
106
+ )
107
+ possible_dates = []
108
+ txt = Utils::clean_str(txt)
109
+ words = txt.split(" ").map{|x| x.strip}
110
+
111
+ # We use the while loop, as apparently there are cases where we try to subset
112
+ # words despite the value of i being >= words.length - 3
113
+ # TODO: Figure out why the above happens. Preferably return to for loop.
114
+ # TODO: Cleaner way of structuring the below? I could break up the loops
115
+ # into single functions. Consider.
116
+ i = 0
117
+
118
+ while (i <= words.length - 3) do
119
+ subset_words = words[i..(i+2)]
120
+
121
+ proposed_date = parse_three_words(subset_words, creation_date)
122
+
123
+ if(! proposed_date.nil?)
124
+ possible_dates << proposed_date
125
+ words = Utils::delete_at_indices(words, i..(i+2))
126
+ i -= 1
127
+ end
128
+
129
+ i += 1
130
+ end
131
+
132
+ i = 0
133
+
134
+ while (i <= words.length - 2) do
135
+ subset_words = words[i..(i+1)]
136
+ proposed_date = parse_two_words(subset_words, creation_date)
137
+
138
+ if(! proposed_date.nil?)
139
+ possible_dates << proposed_date
140
+ words = Utils::delete_at_indices(words, i..(i+1))
141
+ i -= 1
142
+ end
143
+
144
+ i += 1
145
+ end
146
+
147
+ i = 0
148
+
149
+ while (i <= words.length - 1) do
150
+ subset_words = words[i]
151
+
152
+ proposed_date = parse_one_word(subset_words, creation_date, parse_single_years)
153
+
154
+ if(! proposed_date.nil?)
155
+ possible_dates << proposed_date
156
+ words.delete_at(i)
157
+ i -= 1
158
+ end
159
+
160
+ i += 1
161
+ end
162
+
163
+ return possible_dates
164
+ end
165
+
166
+
167
+ ###############################################
168
+ ##
169
+ ## Parse Cases (1 word, 2 words, 3 words)
170
+ ##
171
+
172
+ # Takes a single word and tries to return a date.
173
+ #
174
+ # If no date can be interpreted from the word, returns nil. We consider these
175
+ # cases:
176
+ # * DAY (mon, tuesday, etc.)
177
+ # * A relative day (today, tomorrow, tonight, yesterday)
178
+ # * Dates of the form MM/DD
179
+ # * Numbers such as [1st, 31st]
180
+ # * MONTH (jan, february, etc.)
181
+ # * YYYY (2012, 102; only when parse_single_years is true)
182
+ # * YYYY-MM-DD, DD-MM-YYYY, MM-DD-YYYY
183
+ #
184
+ # ==== Attributes
185
+ #
186
+ # * +word+ - A String, preferably consisting of a single word.
187
+ # * +creation_date+ - A Date object of when the text was created or released.
188
+ # Defaults to nil, but if provided can make returned dates more accurate.
189
+ # * +parse_single_years+ - A boolean. If true, we interpret single numbers as
190
+ # years. This is a very broad assumption, and so defaults to false.
191
+ #
192
+ def NaturalDateParsing.parse_one_word(
193
+ word,
194
+ creation_date = nil,
195
+ parse_single_years = false
196
+ )
197
+
198
+ if SINGLE_DAYS.include? word
199
+ proposed_date = Date.parse(word)
200
+
201
+ # If we have the creation_date date, we can try to be a little smarter
202
+ if(! creation_date.nil?)
203
+ weeks_to_shift = difference_in_weeks(Date.today, creation_date)
204
+
205
+ proposed_date = proposed_date - (weeks_to_shift * 7)
206
+
207
+ # Right now though, it should be within 1 week of accuracy, and either one
208
+ # week ahead or one week behind.
209
+ # The solution is pretty simple. If the proposed date
210
+ # is more than a week ahead of the creation date, then go back one week.
211
+ if proposed_date - creation_date > 7
212
+ proposed_date = proposed_date - 7
213
+ elsif proposed_date - creation_date < 0
214
+ proposed_date = proposed_date + 7
215
+ end
216
+ end
217
+
218
+ return proposed_date
219
+ end
220
+
221
+ # Parsing phrases like "yesterday", "today", "tonight"
222
+ if RELATIVE_DAYS.include? word
223
+ if word == 'today' || word == 'tonight'
224
+ if creation_date.nil?
225
+ return Date.today
226
+ else
227
+ return creation_date
228
+ end
229
+ elsif word == 'yesterday'
230
+ if creation_date.nil?
231
+ return Date.today - 1
232
+ else
233
+ return creation_date - 1
234
+ end
235
+ elsif word == "tomorrow"
236
+ return (creation_date || Date.today) + 1
237
+ end
238
+ end
239
+
240
+ # Parsing strings of the form XX/XX
241
+ if word.include? '/'
242
+ return slash_date(word, creation_date)
243
+ end
244
+
245
+ # Parsing strings like "23rd"
246
+ if SUFFIXED_NUMERIC_DAY.include? word
247
+ return numeric_single_day(word, creation_date)
248
+ end
249
+
250
+ # Parsing month names
251
+ if MONTH.include? word
252
+ return default_month(word, creation_date)
253
+ end
254
+
255
+ # In this case, we assume it's a year!
256
+ if parse_single_years && (Utils::is_int? word)
257
+ return default_year(word)
258
+ end
259
+
260
+ # Parsing XX-XX-XXXX or XXXX-XX-XX
261
+ if full_numeric_date?(word)
262
+ return full_numeric_date(word)
263
+ end
264
+
265
+ end
266
+
267
+
268
+ # Takes two words and tries to return a date.
269
+ #
270
+ # If no date can be interpreted from the words, returns nil. In this case,
271
+ # we look for dates of this form:
272
+ # * MONTH DAY
273
+ #
274
+ # ==== Attributes
275
+ #
276
+ # * +words+ - An array of two words, downcased and stripped.
277
+ # * +creation_date+ - A Date object of when the text was created or released.
278
+ # Defaults to nil, but if provided can make returned dates more accurate.
279
+ #
280
+ def NaturalDateParsing.parse_two_words(words, creation_date = nil)
281
+
282
+ if MONTH.include?(words[0]) && _weak_day?(words[1])
283
+ return month_day(words, creation_date)
284
+ end
285
+
286
+ end
287
+
288
+
289
+ # Takes three words and tries to return a date.
290
+ #
291
+ # If no date can be interpreted from the words, returns nil. In this case,
292
+ # assumes the words take this form:
293
+ # * MONTH DAY YEAR
294
+ #
295
+ # ==== Attributes
296
+ #
297
+ # * +words+ - An array of three words, downcased and stripped.
298
+ # * +creation_date+ - A Date object of when the text was created or released.
299
+ # Defaults to nil, but if provided can make returned dates more accurate.
300
+ #
301
+ def NaturalDateParsing.parse_three_words(words, creation_date = nil)
302
+
303
+ if MONTH.include?(words[0]) && _weak_day?(words[1]) && Utils::is_int?(words[2])
304
+ return Date.parse(words.join(" "))
305
+ end
306
+
307
+ end
308
+
309
+ ###############################################
310
+ ##
311
+ ## Parse Functions
312
+ ##
313
+
314
+ # Given a single word, assumes the word is of the form XX/XX and returns
315
+ # the appropriate Date object. If not possible, returns nil.
316
+ def NaturalDateParsing.slash_date(word, creation_date = nil)
317
+ samp = word.split('/')
318
+ month = samp[0].to_i
319
+ day = samp[1].to_i
320
+
321
+ if month > 0 && month <= 12 && day > 0 && day <= 31
322
+ # TODO: IMPROVE EXCEPTION HANDLING.
323
+ begin
324
+ proposed_date = Date.parse(word)
325
+ if(! creation_date.nil?) ## We're sensitive to only years here.
326
+ years_diff = Date.today.year - creation_date.year
327
+ proposed_date = proposed_date << (12 * years_diff)
328
+ end
329
+ return proposed_date
330
+ rescue ArgumentError
331
+ return nil
332
+ end
333
+ end
334
+ end
335
+
336
+ # Parses an array containing two elements (single words) on the assumption
337
+ # that the array is of the form ["MONTH", "DAY"]
338
+ def NaturalDateParsing.month_day(words, creation_date = nil)
339
+ begin
340
+ proposed_date = Date.parse(words.join(" "))
341
+
342
+ diff_in_years = creation_date.nil? ? 0 : (creation_date.year - Date.today.year)
343
+
344
+ return proposed_date >> diff_in_years * 12
345
+ rescue ArgumentError
346
+ return nil
347
+ end
348
+ end
349
+
350
+ # Parses a single suffixed day number (1st, 2nd, 3rd, etc.).
351
+ def NaturalDateParsing.numeric_single_day(word, creation_date = nil)
352
+ diff_in_months = creation_date.nil? ? 0 : (creation_date.year * 12 + creation_date.month) -
353
+ (Date.today.year * 12 + Date.today.month)
354
+
355
+ begin
356
+ return Date.parse(word) >> diff_in_months
357
+ rescue ArgumentError
358
+ ## If an ArgumentError arises, Date is not convinced it's a date.
359
+ return nil
360
+ end
361
+ end
362
+
363
+ # Parses a single word of the form XXXX-XX-XX, DD-MM-YYYY or MM-DD-YYYY
364
+ def NaturalDateParsing.full_numeric_date(word)
365
+ subparts = word.split("-")
366
+
367
+ # This is a weak check to see where the year is
368
+ year_index = (subparts[0].to_i).abs > 31 ? 0 : 2
369
+
370
+ # Then we assume it's of the form YYYY-MM-DD
371
+ if year_index == 0
372
+ return Date.parse(word)
373
+ else
374
+ # We check the subparts to try to see which part is DD.
375
+ # If we can't determine it, we assume it's International Standard Format,
376
+ # or DD-MM-YYYY
377
+
378
+ if subparts[1].to_i > 12
379
+ # American Standard (MM-DD-YYYY)
380
+ subparts[0] = numeric_month_to_string(subparts[0].to_i)
381
+ return Date.parse(subparts.join(" "))
382
+
383
+ else
384
+ # International Standard (DD-MM-YYYY)
385
+ return Date.parse(word)
386
+ end
387
+ end
388
+
389
+ return Date.parse(word)
390
+ end
391
+
392
+
393
+ private
394
+
395
+ ##############################################
396
+ ##
397
+ ## Private Functions
398
+ ##
399
+
400
+ def NaturalDateParsing._weak_day?(word)
401
+ return (NUMERIC_DAY.include? word) || (SUFFIXED_NUMERIC_DAY.include? word)
402
+ end
403
+
404
+ def NaturalDateParsing.default_year(year)
405
+ return Date.parse("Jan 1 " + year)
406
+ end
407
+
408
+ ## TODO. NOT SENSITIVE TO YEAR.
409
+ def NaturalDateParsing.default_month(month, released = nil)
410
+ this_year = released.nil? ? Date.today.year : released.year
411
+ return Date.parse(month + " " + this_year.to_s)
412
+ end
413
+
414
+ def NaturalDateParsing.suffix(number)
415
+ int = number.to_i
416
+
417
+ ## Check to see if the least significant digit is 1 (11 takes "th", not "st").
418
+ if int % 10 == 1 && int % 100 != 11
419
+ return int.to_s + "st"
420
+ else
421
+ return int.to_s + "th"
422
+ end
423
+ end
424
+
425
+ ## Be careful with this.
426
+ ## date1 is the later date.
427
+ def NaturalDateParsing.difference_in_weeks(date1, date2)
428
+ return ((date1 - date2) / 7).to_i
429
+ end
430
+
431
+ # Is it of the form XXXX-XX-XX?
432
+ def NaturalDateParsing.full_numeric_date?(word)
433
+ output = true
434
+
435
+ if word.include? "-"
436
+ substrings = word.split("-")
437
+ for substring in substrings do
438
+ output = output && Utils.is_int?(substring)
439
+ end
440
+ else
441
+ output = false
442
+ end
443
+
444
+ return output
445
+ end
446
+
447
+ def NaturalDateParsing.numeric_month_to_string(numeric)
448
+ months = ["january", "february", "march", "april", "may", "june",
449
+ "july", "august", "september", "october", "november",
450
+ "december"]
451
+
452
+ return months[numeric - 1]
453
+ end
454
+
455
+ end
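The weekday branch of `parse_one_word` above starts from `Date.parse(word)`, which depends on today's date, and then shifts the result toward `creation_date`. As a hedged alternative, the same kind of answer can be computed from the reference date alone. The sketch below assumes the desired result is the first matching weekday on or after the reference date, which may differ from the gem's behaviour at the boundaries. The name `weekday_on_or_after` is hypothetical.

```ruby
require 'date'

# Hypothetical helper, not part of the gem: returns the first date on or
# after `reference` that falls on the named weekday (full English name),
# or nil if the name is not a weekday.
def weekday_on_or_after(name, reference)
  target = Date::DAYNAMES.map(&:downcase).index(name.downcase)
  return nil if target.nil?
  # Date#wday is 0 for Sunday; the modulo keeps the offset within 0..6.
  reference + ((target - reference.wday) % 7)
end
```

For the documented example, a reference date of July 6, 2016 and the word "Monday" give July 11, 2016, matching the output shown in the comments above. Because the calculation never calls `Date.today`, the result is reproducible in tests.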
@@ -0,0 +1,42 @@
1
+ # Extra utility functions for broader use in the gem.
2
+
3
+ module Utils
4
+
5
+ # Determine whether or not a String is a base 10 integer.
6
+ def Utils.is_int?(str)
7
+ str.to_i.to_s == str || strong_is_int?(str)
8
+ end
9
+
10
+ # A more rigorous check to see if the String is an int.
11
+ def Utils.strong_is_int?(str)
12
+ nums = ("0".."9").to_a
13
+
14
+ for char in str.each_char do
15
+ if ! nums.include? char
16
+ return false
17
+ end
18
+ end
19
+
20
+ return true
21
+ end
22
+
23
+ # Removes punctuation.
24
+ def Utils.clean_out_punctuation(str)
25
+ str.gsub(/[^a-z0-9\s\/-]/i, '')
26
+ end
27
+
28
+ # Removes punctuation and downcases the str.
29
+ def Utils.clean_str(str)
30
+ clean_out_punctuation(str).downcase
31
+ end
32
+
33
+ # Performs delete_at for a range of integers
34
+ def Utils.delete_at_indices(array, range)
35
+ first_val = range.first
36
+ for _ in range do
37
+ array.delete_at(first_val)
38
+ end
39
+
40
+ return array
41
+ end
42
+ end
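The two `Utils` helpers above, the integer check and the ranged delete, could also be written with built-in Ruby methods. This is a sketch only. The names `digits_only?` and `delete_range` are hypothetical, and `digits_only?` deliberately differs from `Utils.is_int?` in two ways: it rejects the empty string, and it rejects a leading minus sign.

```ruby
# Hypothetical alternatives for illustration; not part of the gem.

# True when the string consists solely of one or more ASCII digits.
def digits_only?(str)
  str.match?(/\A\d+\z/)
end

# Removes the elements at the given index range in place and returns the
# array, similar in spirit to Utils.delete_at_indices.
def delete_range(array, range)
  array.slice!(range)
  array
end
```

`String#match?` requires Ruby 2.4 or later. `Array#slice!` leaves the array untouched when the range starts beyond its end.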
@@ -0,0 +1,301 @@
1
+ require_relative "../date_parser"
2
+
3
+ ## Run this file with this command:
4
+ ## rspec lib/spec/date_parser_spec.rb
5
+
6
+ describe NaturalDateParsing do
7
+
8
+ #########################################################
9
+ ##
10
+ ## Basic Mechanics Testing
11
+ ##
12
+
13
+ before do
14
+
15
+ @date = "April 6th, 2014"
16
+ @text = "Remember to meet me on April 6th, 2014, alright?"
17
+ @paragraph = "April 6th, 2014 isn't good for me. We should meet instead on\n" +
18
+ "February 4th, 2013. Or even March 31st, 2017. No rush."
19
+
20
+ @parsed_date = [Date.parse("April 6th, 2014")]
21
+ @parsed_date_paragraph = [
22
+ Date.parse("April 6th, 2014"),
23
+ Date.parse("February 4th, 2013"),
24
+ Date.parse("March 31st, 2017")
25
+ ]
26
+ end
27
+
28
+ describe ".interpret_date" do
29
+ context "given 'April 6th, 2014'" do
30
+ it "returns Sun, 06 Apr 2014 as a date object" do
31
+ expect(NaturalDateParsing::interpret_date(@date)).to eql(@parsed_date)
32
+ end
33
+ end
34
+
35
+ context "given a sentence containing April 6th, 2014" do
36
+ it "returns Sun, 06 Apr 2014 as a date object" do
37
+ expect(NaturalDateParsing::interpret_date(@text)).to eql(@parsed_date)
38
+ end
39
+ end
40
+
41
+ context "given a paragraph containing several dates" do
42
+ it "returns a list of all dates mentioned in the paragraph" do
43
+ expect(NaturalDateParsing::interpret_date(@paragraph)).to eql(@parsed_date_paragraph)
44
+ end
45
+ end
46
+ end
47
+ end
48
+
49
+
50
+ describe NaturalDateParsing do
51
+
52
+ #########################################################
53
+ ##
54
+ ## Edge Cases. More Colloquial Language. More complicated phrases.
55
+ ##
56
+
57
+ describe ".interpret_date" do
58
+
59
+ context "Given a longer sentence containing a date" do
60
+ text = "La Puerta del Conde (The Count's Gate) is the site in Santo " +
61
+ "Domingo, Dominican Republic where Francisco del Rosario Sánchez, one of the " +
62
+ "Dominican Founding Fathers, proclaimed Dominican independence and raised the " +
63
+ "first Dominican Flag, on February 27, 1844."
64
+ creation_date = nil
65
+ answer = [Date.parse("February 27, 1844")]
66
+
67
+ it "captures the single date" do
68
+ expect(NaturalDateParsing::interpret_date(text, creation_date)).to eql(answer)
69
+ end
70
+ end
71
+
72
+ context "Colloquial use of date 1" do
73
+ text = "We should go on the 4th if we can."
74
+ creation_date = Date.parse("July 1st 2016")
75
+ answer = [Date.parse("July 4th, 2016")]
76
+
77
+ it "correctly uses the creation_date parameter" do
78
+ expect(NaturalDateParsing::interpret_date(text, creation_date)).to eql(answer)
79
+ end
80
+ end
81
+
82
+ context "Correct use of XX/XX format 1" do
83
+ text = "Newsflash: things happen on 02/12!"
84
+ creation_date = Date.parse("January 1st, 1994")
85
+ answer = [Date.parse("February 12, 1994")]
86
+
87
+ it "correctly grabs the date" do
88
+ expect(NaturalDateParsing::interpret_date(text, creation_date)).to eql(answer)
89
+ end
90
+ end
91
+
92
+ context "Given yesterday" do
93
+ text = "Yesterday was certainly a day."
94
+ creation_date = nil
95
+ answer = [Date.today - 1]
96
+
97
+ it "correctly grabs the date" do
98
+ expect(NaturalDateParsing::interpret_date(text, creation_date)).to eql(answer)
99
+ end
100
+ end
101
+
102
+ context "Given yesterday with a creation_date date" do
103
+ text = "Yesterday was certainly a day."
104
+ creation_date = Date.parse("January 12, 1994")
105
+ answer = [Date.parse("January 11, 1994")]
106
+
107
+ it "correctly grabs the date" do
108
+ expect(NaturalDateParsing::interpret_date(text, creation_date)).to eql(answer)
109
+ end
110
+ end
111
+
112
+ context "Single year is correctly parsed" do
113
+ text = "In 1492 Columbus sailed the ocean blue."
114
+ creation_date = nil
115
+ answer = [Date.parse("January 1, 1492")]
116
+ parse_single_years = true
117
+
118
+ it "correctly grabs the date" do
119
+ expect(NaturalDateParsing::interpret_date(text, creation_date, parse_single_years)).to eql(answer)
120
+ end
121
+ end
122
+
123
+ context "Correctly parses month and day in middle of sentence" do
124
+ text = "Something something something march 4 something something"
125
+ creation_date = Date.parse("Jan 1, 2004")
126
+ answer = [Date.parse("March 4, 2004")]
127
+
128
+ it "correctly grabs the date" do
129
+ expect(NaturalDateParsing::interpret_date(text, creation_date)).to eql(answer)
130
+ end
131
+ end
132
+
133
+ end
134
+ end
135
+
136
+
137
+ describe NaturalDateParsing do
138
+
139
+ #########################################################
140
+ ##
141
+ ## Colloquial Language samples from volunteers
142
+ ##
143
+
144
+ describe ".interpret_date" do
145
+
146
+ context "Colloquial example 1" do
147
+ text = "Charlie Chaplin and Jason Earles (Hannah Montana's brother) were " +
148
+ "alive at the same time for eight months in 1977"
149
+ creation_date = Date.parse("July 6, 2016")
150
+ answer = [Date.parse("January 1, 1977")]
151
+ parse_single_years = true
152
+
153
+ it "correctly grabs the date" do
154
+ expect(NaturalDateParsing::interpret_date(text, creation_date, parse_single_years)).to eql(answer)
155
+ end
156
+ end
157
+
158
+ ## Stopping this test for right now. Hard to tell which April.
159
+ ## Ambiguous in the general case.
160
+ #context "Colloquial example 2" do
161
+ # text = "For a job I started this April, I had to parse dates of various " +
162
+ # "formats, such as MM-DD-YY, MM/YYYY, and YY-MM-DD. It was infuriating, " +
163
+ # "and what I assume you to be doing reminds me of this."
164
+ # creation_date = Date.parse("July 6, 2016")
165
+ # answer = [Date.parse("April 1, 2016")] ## Reconsider
166
+ #
167
+ # it "correctly grabs the date" do
168
+ # expect(NaturalDateParsing::interpret_date(text, creation_date)).to eql(answer)
169
+ # end
170
+ #end
171
+
172
+ context "Colloquial example 3" do
173
+ text = "August 25, 2013, I met So-and-So"
174
+ creation_date = Date.parse("July 6, 2016")
175
+ answer = [Date.parse("August 25, 2013")]
176
+
177
+ it "correctly grabs the date" do
178
+ expect(NaturalDateParsing::interpret_date(text, creation_date)).to eql(answer)
179
+ end
180
+ end
181
+
182
+ context "Colloquial example 4" do
183
+ text = "Quincy Jones (producer of Thriller) and Michael Caine (veteran " +
184
+ "British actor) were both born on the same day and the same hour on March " +
185
+ "14, 1933. They are still friends to this day."
186
+ creation_date = Date.parse("July 6, 2016")
187
+ answer = [Date.parse("March 14, 1933")]
188
+
189
+ it "correctly grabs the date" do
190
+ expect(NaturalDateParsing::interpret_date(text, creation_date)).to eql(answer)
191
+ end
192
+ end
193
+
194
+ context "Colloquial example 5" do
195
+ text = "Two days ago (July 4, 2016) was the 190th death anniversary of " +
196
+ "the second and third US Presidents: John Adams and Thomas Jefferson, who died 5 hours apart."
197
+ creation_date = Date.parse("July 6, 2016")
198
+ answer = [Date.parse("July 4, 2016")]
199
+
200
+ it "correctly grabs the date" do
201
+ expect(NaturalDateParsing::interpret_date(text, creation_date)).to eql(answer)
202
+ end
203
+ end
204
+
205
+ context "Colloquial example 6" do
206
+ text = "On October 3rd So-and-So asked me what day it was"
207
+ creation_date = Date.parse("July 6, 2016")
208
+ answer = [Date.parse("October 3, 2016")]
209
+
210
+ it "correctly grabs the date" do
211
+ expect(NaturalDateParsing::interpret_date(text, creation_date)).to eql(answer)
212
+ end
213
+ end
214
+
215
+ context "Colloquial example 7" do
216
+ text = "Henry and Hanke created a calendar that causes each day to fall " +
217
+ "on the same day of the week every year. They recommend its " +
218
+ "implementation on January 1, 2018, a Monday."
219
+ creation_date = Date.parse("July 6, 2016")
220
+ answer = [Date.parse("January 1, 2018"),
221
+ Date.parse("July 11, 2016")] # Reconsider
222
+
223
+ it "correctly grabs the date" do
224
+ expect(NaturalDateParsing::interpret_date(text, creation_date)).to eql(answer)
225
+ end
226
+ end
227
+
228
+ context "Colloquial example 8" do
229
+ text = "Beyoncé Giselle Knowles-Carter was born on September 4, 1981."
230
+ creation_date = Date.parse("July 6, 2016")
231
+ answer = [Date.parse("September 4, 1981")]
232
+
233
+ it "correctly grabs the date" do
234
+ expect(NaturalDateParsing::interpret_date(text, creation_date)).to eql(answer)
235
+ end
236
+ end
237
+
238
+ context "Colloquial example 9" do
239
+ text = "The first time I asked someone on a date, I was in Iowa, it was a " +
240
+ "winterly mid-month January, we were to go to finding nemo, I was 9, and " +
241
+ "although she was interested, her parents said no."
242
+ creation_date = Date.parse("July 6, 2016")
243
+ answer = [Date.parse("January 1st, 2016")]
244
+
245
+ it "correctly grabs the date" do
246
+ expect(NaturalDateParsing::interpret_date(text, creation_date)).to eql(answer)
247
+ end
248
+ end
249
+
250
+ context "Colloquial example 10" do
251
+ text = "You ate 10 dates from a crate of dates that were picked on June " +
252
+ "8th from a date tree that was cultivated one score and three years ago. " +
253
+ "Good luck parsing through the antics of my semantics."
254
+ creation_date = Date.parse("July 6, 2016")
255
+ answer = [Date.parse("June 8, 2016")]
256
+
257
+ it "correctly grabs the date" do
258
+ expect(NaturalDateParsing::interpret_date(text, creation_date)).to eql(answer)
259
+ end
260
+ end
261
+
262
+ end
263
+ end
264
+
265
+
266
+ describe DateParser do
267
+
268
+ #########################################################
269
+ ##
270
+ ## More Edge Cases looking at specific features.
271
+ ##
272
+
273
+ describe ".parse" do
274
+ context "Parse fully numeric date" do
275
+ text = "2012-02-12"
276
+ answer = [Date.parse("2012-02-12")]
277
+
278
+ it "correctly grabs the date" do
279
+ expect(DateParser::parse(text)).to eql(answer)
280
+ end
281
+ end
282
+
283
+ context "Parse American fully numeric date" do
284
+ text = "7-24-2015"
285
+ answer = [Date.parse("July 24, 2015")]
286
+
287
+ it "correctly grabs the date" do
288
+ expect(DateParser::parse(text)).to eql(answer)
289
+ end
290
+ end
291
+
292
+ context "Parse day-first (DD-MM-YYYY) fully numeric date" do
293
+ text = "24-07-2015"
294
+ answer = [Date.parse("24-07-2015")]
295
+
296
+ it "correctly grabs the date" do
297
+ expect(DateParser::parse(text)).to eql(answer)
298
+ end
299
+ end
300
+ end
301
+ end
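The numeric-date contexts in the spec above build their expected values in two ways: `Date.parse("24-07-2015")` for the day-first string, but `Date.parse("July 24, 2015")` for the month-first string "7-24-2015". The sketch below shows one way to read a month-first string unambiguously with Ruby's standard `date` library alone. `parse_month_first` is a hypothetical helper for illustration only. It is not part of date_parser and says nothing about how the gem resolves the ambiguity internally.

```ruby
require 'date'

# Hypothetical helper (not part of date_parser): read a month-first
# numeric date such as "7-24-2015" using an explicit format string,
# so the field order does not depend on Date.parse heuristics.
def parse_month_first(text)
  Date.strptime(text, "%m-%d-%Y")
end

month_first = parse_month_first("7-24-2015")
day_first   = Date.parse("24-07-2015") # stdlib reads this as day-month-year

# Both strings describe 24 July 2015.
puts month_first == day_first
```

An explicit format string removes the guesswork when the field order is known in advance. A natural-language parser cannot assume that, which is presumably why the spec covers both orders.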
metadata ADDED
@@ -0,0 +1,54 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: date_parser
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.1
5
+ platform: ruby
6
+ authors:
7
+ - Ryan Kwon
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2016-07-26 00:00:00.000000000 Z
12
+ dependencies: []
13
+ description: DateParser is a simple, fast, and effective way to parse dates from natural
14
+ language text in a robust way.
15
+ email: rynkwn@gmail.com
16
+ executables: []
17
+ extensions: []
18
+ extra_rdoc_files:
19
+ - README.md
20
+ - NEWS.md
21
+ files:
22
+ - NEWS.md
23
+ - README.md
24
+ - lib/date_parser.rb
25
+ - lib/date_parser/date_utils.rb
26
+ - lib/date_parser/natural_date_parsing.rb
27
+ - lib/date_parser/utils.rb
28
+ - lib/spec/date_parser_spec.rb
29
+ homepage: https://github.com/rynkwn/DateParser
30
+ licenses:
31
+ - MIT
32
+ metadata: {}
33
+ post_install_message:
34
+ rdoc_options: []
35
+ require_paths:
36
+ - lib
37
+ required_ruby_version: !ruby/object:Gem::Requirement
38
+ requirements:
39
+ - - ">="
40
+ - !ruby/object:Gem::Version
41
+ version: '0'
42
+ required_rubygems_version: !ruby/object:Gem::Requirement
43
+ requirements:
44
+ - - ">="
45
+ - !ruby/object:Gem::Version
46
+ version: '0'
47
+ requirements: []
48
+ rubyforge_project:
49
+ rubygems_version: 2.5.1
50
+ signing_key:
51
+ specification_version: 4
52
+ summary: Quickly parse natural language into dates robustly.
53
+ test_files:
54
+ - lib/spec/date_parser_spec.rb