pangel-chronic 0.3.0.2

Files changed (47)
  1. data/README.rdoc +182 -0
  2. data/lib/chronic/chronic.rb +303 -0
  3. data/lib/chronic/grabber.rb +26 -0
  4. data/lib/chronic/handlers.rb +560 -0
  5. data/lib/chronic/ordinal.rb +39 -0
  6. data/lib/chronic/pointer.rb +29 -0
  7. data/lib/chronic/repeater.rb +139 -0
  8. data/lib/chronic/repeaters/repeater_day.rb +52 -0
  9. data/lib/chronic/repeaters/repeater_day_name.rb +53 -0
  10. data/lib/chronic/repeaters/repeater_day_portion.rb +94 -0
  11. data/lib/chronic/repeaters/repeater_fortnight.rb +70 -0
  12. data/lib/chronic/repeaters/repeater_hour.rb +58 -0
  13. data/lib/chronic/repeaters/repeater_minute.rb +57 -0
  14. data/lib/chronic/repeaters/repeater_month.rb +66 -0
  15. data/lib/chronic/repeaters/repeater_month_name.rb +98 -0
  16. data/lib/chronic/repeaters/repeater_season.rb +150 -0
  17. data/lib/chronic/repeaters/repeater_season_name.rb +45 -0
  18. data/lib/chronic/repeaters/repeater_second.rb +41 -0
  19. data/lib/chronic/repeaters/repeater_time.rb +124 -0
  20. data/lib/chronic/repeaters/repeater_week.rb +73 -0
  21. data/lib/chronic/repeaters/repeater_weekday.rb +77 -0
  22. data/lib/chronic/repeaters/repeater_weekend.rb +65 -0
  23. data/lib/chronic/repeaters/repeater_year.rb +64 -0
  24. data/lib/chronic/scalar.rb +76 -0
  25. data/lib/chronic/separator.rb +91 -0
  26. data/lib/chronic/time_zone.rb +23 -0
  27. data/lib/chronic.rb +57 -0
  28. data/lib/numerizer/numerizer.rb +98 -0
  29. data/test/suite.rb +9 -0
  30. data/test/test_Chronic.rb +50 -0
  31. data/test/test_Handler.rb +110 -0
  32. data/test/test_Numerizer.rb +54 -0
  33. data/test/test_RepeaterDayName.rb +52 -0
  34. data/test/test_RepeaterFortnight.rb +63 -0
  35. data/test/test_RepeaterHour.rb +68 -0
  36. data/test/test_RepeaterMonth.rb +47 -0
  37. data/test/test_RepeaterMonthName.rb +57 -0
  38. data/test/test_RepeaterTime.rb +72 -0
  39. data/test/test_RepeaterWeek.rb +63 -0
  40. data/test/test_RepeaterWeekday.rb +56 -0
  41. data/test/test_RepeaterWeekend.rb +75 -0
  42. data/test/test_RepeaterYear.rb +63 -0
  43. data/test/test_Span.rb +33 -0
  44. data/test/test_Time.rb +50 -0
  45. data/test/test_Token.rb +26 -0
  46. data/test/test_parsing.rb +797 -0
  47. metadata +111 -0
data/README.rdoc ADDED
@@ -0,0 +1,182 @@
1
+ = Chronic
2
+
3
+ http://chronic.rubyforge.org/
4
+
5
+ by Tom Preston-Werner
6
+
7
+ = WARNING:
8
+
9
+ If you haven't noticed already, this is a fork of mojombo's (Tom's) stable
10
+ chronic. I decided of my own volition that the 40-some forks in the network
11
+ (as reported by GitHub) should be merged together. I got it to run, but quite haphazardly. There
12
+ are a lot of new features (mostly undocumented except in the git logs), so be a
13
+ little flexible in the language you pass to Chronic.
14
+
15
+ Given that, if there is a bug, more than likely it's my own fault, not
16
+ mojombo's and therefore bug reports should be sent to my fork, not his.
17
+
18
+ Enjoy Chronic!
19
+
20
+ == DESCRIPTION:
21
+
22
+ Chronic is a natural language date/time parser written in pure Ruby. See below for the wide variety of formats Chronic will parse.
23
+
24
+ == INSTALLATION:
25
+
26
+ Chronic can be installed via RubyGems:
27
+
28
+ $ sudo gem install evaryont-chronic
29
+
30
+ == CODE:
31
+
32
+ Browse the code and get an RSS feed of the commit log at:
33
+
34
+ http://github.com/evaryont/chronic.git
35
+
36
+ You can grab the code (and help with development) via git:
37
+
38
+ $ git clone git://github.com/evaryont/chronic.git
39
+
40
+ == USAGE:
41
+
42
+ You can parse strings containing a natural language date using the Chronic.parse method.
43
+
44
+ require 'rubygems'
45
+ require 'chronic'
46
+
47
+ Time.now #=> Sun Aug 27 23:18:25 PDT 2006
48
+
49
+ #---
50
+
51
+ Chronic.parse('tomorrow')
52
+ #=> Mon Aug 28 12:00:00 PDT 2006
53
+
54
+ Chronic.parse('monday', :context => :past)
55
+ #=> Mon Aug 21 12:00:00 PDT 2006
56
+
57
+ Chronic.parse('this tuesday 5:00')
58
+ #=> Tue Aug 29 17:00:00 PDT 2006
59
+
60
+ Chronic.parse('this tuesday 5:00', :ambiguous_time_range => :none)
61
+ #=> Tue Aug 29 05:00:00 PDT 2006
62
+
63
+ Chronic.parse('may 27th', :now => Time.local(2000, 1, 1))
64
+ #=> Sat May 27 12:00:00 PDT 2000
65
+
66
+ Chronic.parse('may 27th', :guess => false)
67
+ #=> Sun May 27 00:00:00 PDT 2007..Mon May 28 00:00:00 PDT 2007
68
+
69
+ See Chronic.parse for detailed usage instructions.
70
+
71
+ == EXAMPLES:
72
+
73
+ Chronic can parse a huge variety of date and time formats. Following is a small sample of strings that will be properly parsed. Parsing is case insensitive and will handle common abbreviations and misspellings.
74
+
75
+ Simple
76
+
77
+ thursday
78
+ november
79
+ summer
80
+ friday 13:00
81
+ mon 2:35
82
+ 4pm
83
+ 6 in the morning
84
+ friday 1pm
85
+ sat 7 in the evening
86
+ yesterday
87
+ today
88
+ tomorrow
89
+ this tuesday
90
+ next month
91
+ last winter
92
+ this morning
93
+ last night
94
+ this second
95
+ yesterday at 4:00
96
+ last friday at 20:00
97
+ last week tuesday
98
+ tomorrow at 6:45pm
99
+ afternoon yesterday
100
+ thursday last week
101
+
102
+ Complex
103
+
104
+ 3 years ago
105
+ 5 months before now
106
+ 7 hours ago
107
+ 7 days from now
108
+ 1 week hence
109
+ in 3 hours
110
+ 1 year ago tomorrow
111
+ 3 months ago saturday at 5:00 pm
112
+ 7 hours before tomorrow at noon
113
+ 3rd wednesday in november
114
+ 3rd month next year
115
+ 3rd thursday this september
116
+ 4th day last week
117
+
118
+ Specific Dates
119
+
120
+ January 5
121
+ dec 25
122
+ may 27th
123
+ October 2006
124
+ oct 06
125
+ jan 3 2010
126
+ february 14, 2004
127
+ 3 jan 2000
128
+ 17 april 85
129
+ 5/27/1979
130
+ 27/5/1979
131
+ 05/06
132
+ 1979-05-27
133
+ Friday
134
+ 5
135
+ 4:00
136
+ 17:00
137
+ 0800
138
+
139
+ Specific Times (many of the above with an added time)
140
+
141
+ January 5 at 7pm
142
+ 1979-05-27 05:00:00
143
+ etc
144
+
145
+ == TIME ZONES:
146
+
147
+ Chronic allows you to set which Time class to use when constructing times. By default, Ruby's built-in Time class creates times in your system's
148
+ local time zone. You can set this to something like ActiveSupport's TimeZone class to get full time zone support.
149
+
150
+ >> Time.zone = "UTC"
151
+ >> Chronic.time_class = Time.zone
152
+ >> Chronic.parse("June 15 2006 at 5:45 AM")
153
+ => Thu, 15 Jun 2006 05:45:00 UTC +00:00
154
+
155
+ == LIMITATIONS:
156
+
157
+ Chronic uses Ruby's built-in Time class for all time storage and computation. Because of this, only times that the Time class can handle will be properly parsed. Parsing for times outside of this range will simply return nil. Support for a wider range of times is planned for a future release.
158
+
159
+ == LICENSE:
160
+
161
+ (The MIT License)
162
+
163
+ Copyright (c) 2008 Tom Preston-Werner
164
+
165
+ Permission is hereby granted, free of charge, to any person obtaining
166
+ a copy of this software and associated documentation files (the
167
+ "Software"), to deal in the Software without restriction, including
168
+ without limitation the rights to use, copy, modify, merge, publish,
169
+ distribute, sublicense, and/or sell copies of the Software, and to
170
+ permit persons to whom the Software is furnished to do so, subject to
171
+ the following conditions:
172
+
173
+ The above copyright notice and this permission notice shall be
174
+ included in all copies or substantial portions of the Software.
175
+
176
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
177
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
178
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
179
+ IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
180
+ CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
181
+ TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
182
+ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
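Note on the README's `:ambiguous_time_range` examples: the two `'this tuesday 5:00'` calls in the USAGE section differ only in how a bare hour is resolved. The sketch below is a minimal, standalone illustration of that rule as the README describes it. `resolve_ambiguous_hour` is a hypothetical helper written for this note, not a Chronic method, and the exact boundary behaviour is an assumption.

```ruby
# Hypothetical helper, not part of Chronic: resolve an hour with no am/pm
# marker in the way the README's :ambiguous_time_range examples suggest.
def resolve_ambiguous_hour(hour, ambiguous_time_range = 6)
  # :none means no assumption is made, so the first matching instance (AM) is kept.
  return hour if ambiguous_time_range == :none
  # Assumption: only hours 1..11 are treated as ambiguous in this sketch.
  return hour unless (1..11).cover?(hour)
  # Hours before the start of the window are shifted into the PM half.
  hour < ambiguous_time_range ? hour + 12 : hour
end

puts resolve_ambiguous_hour(5)         # => 17, as in 'this tuesday 5:00'
puts resolve_ambiguous_hour(5, :none)  # => 5
puts resolve_ambiguous_hour(7, 7)      # => 7
```

With the default of 6, a bare "5:00" falls outside 6am-6pm as a morning time and is therefore read as 17:00, which matches the README output above.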
data/lib/chronic/chronic.rb ADDED
@@ -0,0 +1,303 @@
1
+ module Chronic
2
+ class << self
3
+
4
+ # Parses a string containing a natural language date or time. If the parser
5
+ # can find a date or time, either a Time or Chronic::Span will be returned
6
+ # (depending on the value of <tt>:guess</tt>). If no date or time can be found,
7
+ # +nil+ will be returned.
8
+ #
9
+ # Options are:
10
+ #
11
+ # [<tt>:context</tt>]
12
+ # <tt>:past</tt> or <tt>:future</tt> (defaults to <tt>:future</tt>)
13
+ #
14
+ # If your string represents a birthday, you can set <tt>:context</tt> to <tt>:past</tt>
15
+ # and if an ambiguous string is given, it will assume it is in the
16
+ # past. Specify <tt>:future</tt> or omit to set a future context.
17
+ #
18
+ # [<tt>:now</tt>]
19
+ # Time (defaults to Time.now)
20
+ #
21
+ # By setting <tt>:now</tt> to a Time, all computations will be based off
22
+ # of that time instead of Time.now. If set to nil, Chronic will use Time.now.
23
+ #
24
+ # [<tt>:guess</tt>]
25
+ # +true+, +false+, +"start"+, +"middle"+, +"end"+, or a number between 0 and 1 (defaults to +true+)
26
+ #
27
+ # By default, the parser will guess a single point in time for the
28
+ # given date or time. +:guess+ => +true+ or +"middle"+ will return the middle
29
+ # of the span. If +"start"+ is specified, the beginning of the span is
30
+ # returned. If +"end"+ is specified, the end of the span is returned. A number
31
+ # between 0 and 1 returns the point that fraction of the way through the span. If you'd rather have the entire time span returned,
32
+ # set <tt>:guess</tt> to +false+ and a Chronic::Span will be returned.
33
+ #
34
+ # [<tt>:ambiguous_time_range</tt>]
35
+ # Integer or <tt>:none</tt> (defaults to <tt>6</tt> (6am-6pm))
36
+ #
37
+ # If an Integer is given, ambiguous times (like 5:00) will be
38
+ # assumed to be within the range of that time in the AM to that time
39
+ # in the PM. For example, if you set it to <tt>7</tt>, then the parser will
40
+ # look for the time between 7am and 7pm. In the case of 5:00, it would
41
+ # assume that means 5:00pm. If <tt>:none</tt> is given, no assumption
42
+ # will be made, and the first matching instance of that time will
43
+ # be used.
44
+ def parse(text, specified_options = {})
45
+ # strip any non-tagged tokens
46
+ @tokens = tokenize(text, specified_options).select { |token| token.tagged? }
47
+
48
+ if Chronic.debug
49
+ puts "+---------------------------------------------------"
50
+ puts "| " + @tokens.to_s
51
+ puts "+---------------------------------------------------"
52
+ end
53
+
54
+ options = default_options(specified_options)
55
+
56
+ # do the heavy lifting
57
+ begin
58
+ span = self.tokens_to_span(@tokens, options)
59
+ rescue
60
+ raise
61
+ return nil
62
+ end
63
+
64
+ # guess a time within a span if required
65
+ if options[:guess]
66
+ return self.guess(span, options[:guess])
67
+ else
68
+ return span
69
+ end
70
+ end
71
+
72
+ def strip_tokens(text, specified_options = {})
73
+ # strip any tagged tokens
74
+ return tokenize(text, specified_options).select { |token| !token.tagged? }.map { |token| token.word }.join(' ')
75
+ end
76
+
77
+ # Returns an array with text tokenized by the respective classes
78
+ def tokenize(text, specified_options = {})
79
+ @text = text
80
+
81
+ options = default_options(specified_options)
82
+ # store now for later =)
83
+ @now = options[:now]
84
+
85
+ # put the text into a normal format to ease scanning
86
+ puts "+++ text = #{text}" if Chronic.debug
87
+ text = self.pre_normalize(text)
88
+ puts "--- text = #{text}" if Chronic.debug
89
+
90
+ # get base tokens for each word
91
+ @tokens = self.base_tokenize(text)
92
+ puts @tokens if Chronic.debug
93
+
94
+ # scan the tokens with each token scanner
95
+ [Repeater].each do |tokenizer|
96
+ @tokens = tokenizer.scan(@tokens, options)
97
+ end
98
+
99
+ [Grabber, Pointer, Scalar, Ordinal, Separator, TimeZone].each do |tokenizer|
100
+ @tokens = tokenizer.scan(@tokens)
101
+ end
102
+
103
+ return @tokens
104
+ end
105
+
106
+ def default_options(specified_options)
107
+ # get options and set defaults if necessary
108
+ default_options = {:context => :future,
109
+ :now => Chronic.time_class.now,
110
+ :guess => true,
111
+ :guess_how => :middle,
112
+ :ambiguous_time_range => 6,
113
+ :endian_precedence => nil}
114
+ options = default_options.merge specified_options
115
+
116
+ # handle options that were set to nil
117
+ options[:context] = :future unless options[:context]
118
+ options[:now] = Chronic.time_class.now unless options[:now]
119
+ options[:ambiguous_time_range] = 6 unless options[:ambiguous_time_range]
120
+
121
+ # ensure the specified options are valid
122
+ specified_options.keys.each do |key|
123
+ default_options.keys.include?(key) || raise(InvalidArgumentException, "#{key} is not a valid option key.")
124
+ end
125
+ [:past, :future, :none].include?(options[:context]) || raise(InvalidArgumentException, "Invalid value ':#{options[:context]}' for :context specified. Valid values are :past, :future, and :none.")
126
+ ["start", "middle", "end", true, false].include?(options[:guess]) || validate_percentness_of(options[:guess]) || raise(InvalidArgumentException, "Invalid value '#{options[:guess]}' for :guess specified. Valid values are true, false, \"start\", \"middle\", and \"end\". true will default to \"middle\". :guess can also be a fraction between 0 and 1 (e.g. 0.60)")
127
+
128
+ return options
129
+ end
130
+
131
+ # Clean up the specified input text by stripping unwanted characters,
132
+ # converting idioms to their canonical form, converting number words
133
+ # to numbers (three => 3), and converting ordinal words to numeric
134
+ # ordinals (third => 3rd)
135
+ def pre_normalize(text) #:nodoc:
136
+ normalized_text = text.to_s
137
+ normalized_text = numericize_numbers(normalized_text)
138
+ normalized_text.gsub!(/['",]/, '')
139
+ normalized_text.gsub!(/(\d+\:\d+)\.(\d+)/, '\1\2')
140
+ normalized_text.gsub!(/ \-(\d{4})\b/, ' tzminus\1')
141
+ normalized_text.gsub!(/([\/\-\,\@])/) { ' ' + $1 + ' ' }
142
+ normalized_text.gsub!(/\btoday\b/i, 'this day')
143
+ normalized_text.gsub!(/\btomm?orr?ow\b/i, 'next day')
144
+ normalized_text.gsub!(/\byesterday\b/i, 'last day')
145
+ normalized_text.gsub!(/\bnoon\b/i, '12:00')
146
+ normalized_text.gsub!(/\bmidnight\b/i, '24:00')
147
+ normalized_text.gsub!(/\bbefore now\b/i, 'past')
148
+ normalized_text.gsub!(/\bnow\b/i, 'this second')
149
+ normalized_text.gsub!(/\b(ago|before)\b/i, 'past')
150
+ normalized_text.gsub!(/\bthis past\b/i, 'last')
151
+ normalized_text.gsub!(/\bthis last\b/i, 'last')
152
+ normalized_text.gsub!(/\b(?:in|during) the (morning)\b/i, '\1')
153
+ normalized_text.gsub!(/\b(?:in the|during the|at) (afternoon|evening|night)\b/i, '\1')
154
+ normalized_text.gsub!(/\btonight\b/i, 'this night')
155
+ normalized_text.gsub!(/\b\d+:?\d*[ap]\b/i,'\0m')
156
+ normalized_text.gsub!(/(\d)([ap]m|oclock)\b/i, '\1 \2')
157
+ normalized_text.gsub!(/\b(hence|after|from)\b/i, 'future')
158
+ normalized_text.gsub!(/\bh[ou]{0,2}rs?\b/i, 'hour')
159
+ #not needed - see test_parse_before_now (test_parsing.rb +726)
160
+ #normalized_text.gsub!(/\bbefore now\b/, 'past')
161
+
162
+ normalized_text = numericize_ordinals(normalized_text)
163
+ end
164
+
165
+ # Convert number words to numbers (three => 3)
166
+ def numericize_numbers(text) #:nodoc:
167
+ Numerizer.numerize(text)
168
+ end
169
+
170
+ # Convert ordinal words to numeric ordinals (third => 3rd)
171
+ def numericize_ordinals(text) #:nodoc:
172
+ text
173
+ end
174
+
175
+ # Split the text on spaces and convert each word into
176
+ # a Token
177
+ def base_tokenize(text) #:nodoc:
178
+ text.split(' ').map { |word| Token.new(word) }
179
+ end
180
+
181
+ # Guess a specific time within the given span
182
+ def guess(span, guess=true) #:nodoc:
183
+ return nil if span.nil?
184
+ if span.width > 1
185
+ # Account for a timezone difference between the start and end of the range.
186
+ # This most likely will happen when dealing with a Daylight Saving Time start
187
+ # or end day.
188
+ gmt_offset_diff = span.begin.gmt_offset - span.end.gmt_offset
189
+ case guess
190
+ when "start"
191
+ span.begin
192
+ when true, "middle"
193
+ span.begin + ((span.width - gmt_offset_diff) / 2)
194
+ when "end"
195
+ span.begin + (span.width - gmt_offset_diff)
196
+ else
197
+ span.begin + ((span.width - gmt_offset_diff) * guess)
198
+ end
199
+ else
200
+ span.begin
201
+ end
202
+ end
203
+
204
+ # Returns true if the given value is a number between 0 and 1 (inclusive)
205
+ def validate_percentness_of(number) #:nodoc:
206
+ number.to_s.to_f == number && number >= 0 && number <= 1
207
+ end
208
+ end
209
+
210
+ class Token #:nodoc:
211
+ attr_accessor :word, :tags
212
+
213
+ def initialize(word)
214
+ @word = word
215
+ @tags = []
216
+ end
217
+
218
+ # Tag this token with the specified tag
219
+ def tag(new_tag)
220
+ @tags << new_tag
221
+ end
222
+
223
+ # Remove all tags of the given class
224
+ def untag(tag_class)
225
+ @tags = @tags.select { |m| !m.kind_of? tag_class }
226
+ end
227
+
228
+ # Return true if this token has any tags
229
+ def tagged?
230
+ @tags.size > 0
231
+ end
232
+
233
+ # Return the Tag that matches the given class
234
+ def get_tag(tag_class)
235
+ matches = @tags.select { |m| m.kind_of? tag_class }
236
+ #matches.size < 2 || raise("Multiple identical tags found")
237
+ return matches.first
238
+ end
239
+
240
+ # Print this Token in a pretty way
241
+ def to_s
242
+ "#{@word}(#{@tags.join(', ')})"
243
+ end
244
+ end
245
+
246
+ # A Span represents a range of time. Since this class extends
247
+ # Range, you can use #begin and #end to get the beginning and
248
+ # ending times of the span (they will be of class Time)
249
+ class Span < Range
250
+
251
+ def initialize(range_begin, range_end)
252
+ # Use exclusive range.
253
+ super(range_begin, range_end, true)
254
+ end
255
+
256
+ # Returns the width of this span in seconds
257
+ def width
258
+ (self.end - self.begin).to_i
259
+ end
260
+
261
+ # Add a number of seconds to this span, returning the
262
+ # resulting Span
263
+ def +(seconds)
264
+ Span.new(self.begin + seconds, self.end + seconds)
265
+ end
266
+
267
+ # Subtract a number of seconds from this span, returning the
268
+ # resulting Span
269
+ def -(seconds)
270
+ self + -seconds
271
+ end
272
+
273
+ # Prints this span in a nice fashion
274
+ def to_s
275
+ '(' << self.begin.to_s << '...' << self.end.to_s << ')'
276
+ end
277
+ end
278
+
279
+ # Tokens are tagged with subclassed instances of this class when
280
+ # they match specific criteria
281
+ class Tag #:nodoc:
282
+ attr_accessor :type
283
+
284
+ def initialize(type)
285
+ @type = type
286
+ end
287
+
288
+ def start=(s)
289
+ @now = s
290
+ end
291
+ end
292
+
293
+ # Internal exception
294
+ class ChronicPain < Exception #:nodoc:
295
+
296
+ end
297
+
298
+ # This exception is raised if an invalid argument is provided to
299
+ # any of Chronic's methods
300
+ class InvalidArgumentException < Exception
301
+
302
+ end
303
+ end
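Note on `guess` in chronic.rb: the method collapses a span to one point in time, and in this fork it also accepts a fraction. The sketch below is a simplified, standalone restatement of that logic. `guess_point` is a hypothetical name, a pair of plain `Time` values stands in for `Chronic::Span`, and the `gmt_offset` correction for Daylight Saving Time is deliberately left out, so treat it as an illustration rather than the gem's code.

```ruby
# Hypothetical stand-in for Chronic.guess, without the DST offset correction.
# span_begin and span_end are Time objects; guess is true, "start", "middle",
# "end", or a number between 0 and 1.
def guess_point(span_begin, span_end, guess = true)
  width = (span_end - span_begin).to_i   # span width in whole seconds
  return span_begin if width <= 1        # nothing to guess inside a 1-second span

  case guess
  when "start"        then span_begin
  when true, "middle" then span_begin + width / 2
  when "end"          then span_begin + width
  else                     span_begin + width * guess  # fractional position
  end
end

day_start = Time.utc(2007, 5, 27)
day_end   = Time.utc(2007, 5, 28)

puts guess_point(day_start, day_end)          # midpoint: 12:00 on May 27
puts guess_point(day_start, day_end, "end")   # 00:00 on May 28
puts guess_point(day_start, day_end, 0.25)    # a quarter of the way in: 06:00
```

One consequence worth noting: because `Span` is built as an exclusive range, `"end"` yields the endpoint that the range itself excludes (here midnight on May 28), at least when no offset change occurs inside the span.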
data/lib/chronic/grabber.rb ADDED
@@ -0,0 +1,26 @@
1
+ #module Chronic
2
+
3
+ class Chronic::Grabber < Chronic::Tag #:nodoc:
4
+ def self.scan(tokens)
5
+ tokens.each_index do |i|
6
+ if t = self.scan_for_all(tokens[i]) then tokens[i].tag(t); next end
7
+ end
8
+ tokens
9
+ end
10
+
11
+ def self.scan_for_all(token)
12
+ scanner = {/last/i => :last,
13
+ /this/i => :this,
14
+ /next/i => :next}
15
+ scanner.keys.each do |scanner_item|
16
+ return self.new(scanner[scanner_item]) if scanner_item =~ token.word
17
+ end
18
+ return nil
19
+ end
20
+
21
+ def to_s
22
+ 'grabber-' << @type.to_s
23
+ end
24
+ end
25
+
26
+ #end
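Note on `Grabber.scan_for_all`: the scanner maps a word to `:last`, `:this`, or `:next` using unanchored, case-insensitive patterns. The sketch below reproduces only that lookup in isolation. `GRABBER_PATTERNS` and `grabber_type` are hypothetical names for this note, and the `Token`/`Tag` plumbing is omitted.

```ruby
# The same three patterns as in Grabber.scan_for_all, kept in a constant.
GRABBER_PATTERNS = {
  /last/i => :last,
  /this/i => :this,
  /next/i => :next
}.freeze

# Hypothetical helper: return the grabber type for a word, or nil if none matches.
def grabber_type(word)
  GRABBER_PATTERNS.each do |pattern, type|
    return type if pattern =~ word
  end
  nil
end

"this tuesday".split(' ').each do |word|
  puts "#{word} -> #{grabber_type(word).inspect}"
end
# this -> :this
# tuesday -> nil

puts grabber_type("Next").inspect   # => :next (matching is case-insensitive)
puts grabber_type("blast").inspect  # => :last (patterns are not anchored)
```

The last call shows that a word merely containing "last" is also tagged, since the patterns carry no `\b` or `\A...\z` anchors. Whether that matters in practice depends on the rest of the tokenizer, which is not reproduced here.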