pseudo_date 0.1.5 → 0.3.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 194f7ad213e4723e13dadd0bb7e4808b0e7d078d96302f27e67c926a471740a7
4
+ data.tar.gz: 5924349cb3530554acc4778d3983351afecb4f22a413906555259b3a70a008b8
5
+ SHA512:
6
+ metadata.gz: 6af99fbd3a7e1a14c4e0966e145e60abd25242b9bc127e24304134eee90a35c66224878da002fb8f7e168d61065cdf22b7ed99c4ab56d3e134d13e7ef56e9a1f
7
+ data.tar.gz: f18fde6cdade727c95c9768a713a664dc4842a80afc8e4d464c2c44d02d78d30b4cf83930a0359b22f03772f2f9a464ed79227844995412ee8ee27024277161c
data/Gemfile.lock CHANGED
@@ -1,20 +1,24 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- pseudo_date (0.1.5)
4
+ pseudo_date (0.2.0)
5
5
 
6
6
  GEM
7
7
  remote: https://rubygems.org/
8
8
  specs:
9
- diff-lcs (1.2.4)
10
- rspec (2.13.0)
11
- rspec-core (~> 2.13.0)
12
- rspec-expectations (~> 2.13.0)
13
- rspec-mocks (~> 2.13.0)
14
- rspec-core (2.13.1)
15
- rspec-expectations (2.13.0)
16
- diff-lcs (>= 1.1.3, < 2.0)
17
- rspec-mocks (2.13.1)
9
+ diff-lcs (1.2.5)
10
+ rspec (3.0.0)
11
+ rspec-core (~> 3.0.0)
12
+ rspec-expectations (~> 3.0.0)
13
+ rspec-mocks (~> 3.0.0)
14
+ rspec-core (3.0.1)
15
+ rspec-support (~> 3.0.0)
16
+ rspec-expectations (3.0.1)
17
+ diff-lcs (>= 1.2.0, < 2.0)
18
+ rspec-support (~> 3.0.0)
19
+ rspec-mocks (3.0.1)
20
+ rspec-support (~> 3.0.0)
21
+ rspec-support (3.0.0)
18
22
 
19
23
  PLATFORMS
20
24
  ruby
@@ -22,3 +26,6 @@ PLATFORMS
22
26
  DEPENDENCIES
23
27
  pseudo_date!
24
28
  rspec
29
+
30
+ BUNDLED WITH
31
+ 2.1.4
data/README.mdown CHANGED
@@ -6,15 +6,17 @@ It's a date but not really. A PseudoDate object has a day, month, and year but
6
6
 
7
7
  ## What Is This For?
8
8
 
9
- PseudoDate was created to parse odd dates in odd formats and attempt to extract as much information from them as possible. It's especially handy when you're trying to convert a date string that has come from an OCR'd source.
9
+ PseudoDate was created to parse odd dates in odd formats and attempt to extract as much information from them as possible. It's especially handy when you're trying to convert a date string that has come from an OCR'd source. It was written primarily to parse dates in American public record data, so that records would share a common date format during record matching.
10
10
 
11
11
  ## Assumptions
12
12
 
13
- As with all parsing, one needs to make assumptions. The main assumption made here is that all dates will be in the past.
13
+ As with all parsing, one needs to make assumptions. The main assumption made here is that all dates are in the past. Dates that appear to be in the far future are generally labeled "invalid."
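The past-dates assumption is what makes two-digit years resolvable. The parser code in this release puts a two-digit year in the 1900s when it is greater than the last two digits of the current year, and in the 2000s otherwise. Below is a minimal standalone sketch of that rule. `expand_two_digit_year` is a hypothetical name, and `today` is a parameter only so the output is reproducible:

```ruby
require 'date'

# Hypothetical helper mirroring the parser's two-digit-year rule. Because all
# dates are assumed to lie in the past, a two-digit year greater than the
# current year's last two digits must belong to the previous century.
def expand_two_digit_year(two_digits, today = Date.today)
  current = today.year % 100
  two_digits.to_i > current ? "19#{two_digits}" : "20#{two_digits}"
end

# With "today" fixed to 9 June 2022:
reference = Date.new(2022, 6, 9)
expand_two_digit_year('85', reference) # => "1985"
expand_two_digit_year('05', reference) # => "2005"
expand_two_digit_year('22', reference) # => "2022"
```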
14
+
15
+ Since this gem was built to wrangle OCR'd dates, it has to make some assumptions about date formats. As of `0.2.0`, the gem assumes that dates separated by "/" are American (month/day/year, `%m/%d/%Y`) and that dates separated by "-" are European (parsed as year-month-day, `%Y-%m-%d`). Future versions may make this configurable depending on your usage, but in my experience there has not been a need for it. This mimics the date parsing behavior of Ruby 1.8.7, which changed in Ruby 1.9+.
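The Ruby 1.9+ change is easiest to see on an ambiguous slash date. `Date.parse` reads it day-first. An explicit `Date.strptime` format pins the reading the gem assumes. This plain-Ruby snippet uses only the standard `date` library and no gem code:

```ruby
require 'date'

ambiguous = '06/07/1985'

# Ruby 1.9+ Date.parse treats slash-separated dates as day/month/year.
day_first = Date.parse(ambiguous)                    # 6 July 1985

# An explicit format pins the American month/day/year reading used for "/".
american = Date.strptime(ambiguous, '%m/%d/%Y')      # 7 June 1985

# The hyphenated format the gem calls "European" is year-month-day.
hyphenated = Date.strptime('1985-06-07', '%Y-%m-%d') # 7 June 1985
```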
14
16
 
15
17
  ## Other Notes
16
18
 
17
- PseudoDate stores date attributes in strings instead of integers to avoid losing the preceding '0' on various attributes. This was a decision made when first creating the class because of the way things were being output in the project it was created for.
19
+ PseudoDate stores date attributes as strings instead of integers to avoid losing the leading '0' on single-digit values. This decision was made when the class was first created, because of how values were being output in the project it was written for. There has been some discussion about switching to integers to save memory, but no decision has been made either way.
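The string-versus-integer trade-off is related to the comparison changes later in this diff, where `year` and `month` values are converted with `to_i` before being compared. This short illustration uses plain Ruby and no gem code. It shows why padded strings preserve the leading zero and why comparing after `to_i` is safer:

```ruby
# Integer#to_s drops the leading zero; a padded string keeps it.
padded_day = 7.to_s.rjust(2, '0')   # "07" (plain 7.to_s gives "7")

# Strings compare character by character, so unpadded values misorder.
lexicographic = '9' < '10'          # false

# A String cannot be compared with an Integer at all.
mixed_error =
  begin
    '1985' < 1996
    nil
  rescue ArgumentError => e
    e                               # Comparable#< raises ArgumentError here
  end

# Converting both sides with to_i avoids both problems.
numeric = '1985'.to_i < 1996.to_i   # true
```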
18
20
 
19
21
  ## Compatibility
20
22
 
@@ -1,11 +1,11 @@
1
1
  class String
2
2
 
3
3
  def to_date_hash
4
- Parser.parse(self)
4
+ PseudoDate::Parser.parse(self)
5
5
  end
6
6
 
7
7
  def to_pseudo_date
8
8
  PseudoDate.new(self)
9
9
  end
10
10
 
11
- end
11
+ end
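With `Parser` now nested inside `PseudoDate`, the `String` extension has to call it `PseudoDate::Parser`. Constant lookup inside `class String` searches String's lexical scope and its ancestors, ending at `Object` (the top-level constants). It never searches `PseudoDate`'s namespace. The sketch below shows the same pattern. It uses hypothetical names (`DemoDate`, `to_demo_hash`) and a placeholder parse in place of the real logic:

```ruby
# Hypothetical stand-in: DemoDate plays the role of PseudoDate.
class DemoDate
  class Parser
    # Placeholder for the real parsing logic.
    def self.parse(input)
      { raw: input.strip }
    end
  end
end

class String
  # A bare `Parser` here would raise NameError, because lookup from inside
  # String never reaches DemoDate's namespace. The qualified name is required.
  def to_demo_hash
    DemoDate::Parser.parse(self)
  end
end

' 1985 '.to_demo_hash # => {:raw=>"1985"}
```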
@@ -1,90 +1,115 @@
1
1
  require 'date'
2
- class Parser
2
+ require_relative 'pseudo_date'
3
+
4
+ class PseudoDate
5
+ class Parser
3
6
 
4
- def self.parse(input)
5
- date_hash = {}
6
- # Minor Pre Cleanup
7
- input.strip!; input.gsub!('~','')
7
+ AMERICAN_DATE_FORMAT = '%m/%d/%Y'
8
+ EUROPEAN_DATE_FORMAT = '%Y-%m-%d'
8
9
 
9
- date = input.split(/\/|-/).length < 3 ? nil : Date.parse(input) rescue nil
10
- if date
11
- date_hash = { :year => date.year.to_s, :month => date.month.to_s, :day => date.day.to_s }
12
- else
13
- year, month, day = parse_string(input)
14
- date_hash = { :year => year, :month => month, :day => day }
15
- end
16
-
17
- # Post parsing cleanup
18
- date_hash.each do |key, value|
19
- date_hash[key] = if value.nil?
20
- key.to_s == 'year' ? '0000' : '00'
10
+ def self.parse(input)
11
+ date_hash = {}
12
+ # Minor Pre Cleanup
13
+ input.strip!; input.gsub!('~','')
14
+
15
+ date = parse_with_poro_date(input)
16
+
17
+ if date
18
+ date_hash = { :year => date.year.to_s, :month => date.month.to_s, :day => date.day.to_s }
21
19
  else
22
- date_hash[key] = value.to_s.strip
20
+ year, month, day = parse_string(input)
21
+ date_hash = { :year => year, :month => month, :day => day }
23
22
  end
24
- end
25
-
26
- # Cleanup the single digit values
27
- unless date_hash.empty?
28
- date_hash.each do |key,value|
29
- date_hash[key] = "0#{value}" if value.to_s.length == 1
23
+
24
+ # Post parsing cleanup
25
+ date_hash.each do |key, value|
26
+ date_hash[key] = if value.nil?
27
+ key.to_s == 'year' ? '0000' : '00'
28
+ else
29
+ date_hash[key] = value.to_s.strip
30
+ end
30
31
  end
32
+
33
+ # Cleanup the single digit values
34
+ unless date_hash.empty?
35
+ date_hash.each do |key,value|
36
+ date_hash[key] = "0#{value}" if value.to_s.length == 1
37
+ end
38
+ end
39
+
40
+ # Two character years
41
+ if date_hash[:year].length == 2
42
+ date_hash[:year] = date_hash[:year].to_i > Date.today.year.to_s.slice(2..4).to_i ? "19#{date_hash[:year]}" : "20#{date_hash[:year]}"
43
+ end
44
+
45
+ # Attempt to correct some known OCR issues
46
+ if date_hash[:year].to_s.match('00') && date_hash[:year] != '0000'
47
+ date_hash[:year] = "2#{date_hash[:year].slice(1..3)}"
48
+ end
49
+
50
+ return date_hash.empty? ? nil : date_hash
31
51
  end
32
52
 
33
- # Two character years
34
- if date_hash[:year].length == 2
35
- date_hash[:year] = date_hash[:year].to_i > Date.today.year.to_s.slice(2..4).to_i ? "19#{date_hash[:year]}" : "20#{date_hash[:year]}"
36
- end
53
+ private
37
54
 
38
- # Attempt to correct some known OCR issues
39
- if date_hash[:year].to_s.match('00') && date_hash[:year] != '0000'
40
- date_hash[:year] = "2#{date_hash[:year].slice(1..3)}"
55
+ def self.parse_with_poro_date(string)
56
+ # If our date has 3 parts then let's try to parse it with Date::strptime
57
+ if string.split(/\/|-/).length >= 3
58
+ case string
59
+ when /-/ # Europeans generally use hyphens to separate date pieces
60
+ Date.strptime(string, EUROPEAN_DATE_FORMAT)
61
+ when /\// # Americans usually use a / to separate date pieces
62
+ Date.strptime(string, AMERICAN_DATE_FORMAT)
63
+ end
64
+ else
65
+ nil # Not enough parts so just return nil
66
+ end
67
+ rescue
68
+ nil # We don't actually care why Date is complaining. We'll fall back to slower parsing later.
41
69
  end
42
70
 
43
- return date_hash.empty? ? nil : date_hash
44
- end
45
-
46
- private
47
-
48
- def self.parse_string(input)
49
- day, month, year = "00", "00", "0000"
50
- if input.match('/') # 02/25/2008
51
- date_array = input.split('/')
52
- if date_array.length == 3
53
- begin
54
- parsed_date = Date.parse(self)
55
- month, day, year = parsed_date.month, parsed_date.day, parsed_date.year
56
- rescue
57
- month, day, year = date_array
71
+ def self.parse_string(input)
72
+ day, month, year = "00", "00", "0000"
73
+ if input.match('/') # 02/25/2008
74
+ date_array = input.split('/')
75
+ if date_array.length == 3
76
+ begin
77
+ parsed_date = Date.strptime(input, AMERICAN_DATE_FORMAT)
78
+ month, day, year = parsed_date.month, parsed_date.day, parsed_date.year
79
+ rescue
80
+ month, day, year = date_array
81
+ end
82
+ elsif date_array.length == 2
83
+ month, year = date_array
58
84
  end
59
- elsif date_array.length == 2
60
- month, year = date_array
61
- end
62
- elsif input.length == 8 && is_numeric?(input) # 20080225
63
- year, month, day = input.slice(0..3), input.slice(4..5), input.slice(6..7)
64
- elsif input.match('-') # 1985-09-25 or 02-25-2008
65
- date_array = input.split('-')
66
- year = date_array.select{ |part| part.length == 4 }.first
67
- unless year.nil? || date_array.length != 3
68
- if date_array.first == year
69
- month = date_array.last
70
- day = date_array[1]
71
- else
72
- month = date_array.first
73
- day = date_array[1]
85
+ elsif input.length == 8 && is_numeric?(input) # 20080225
86
+ year, month, day = input.slice(0..3), input.slice(4..5), input.slice(6..7)
87
+ elsif input.match('-') # 1985-09-25 or 02-25-2008
88
+ date_array = input.split('-')
89
+ year = date_array.select{ |part| part.length == 4 }.first
90
+ unless year.nil? || date_array.length != 3
91
+ if date_array.first == year
92
+ month = date_array.last
93
+ day = date_array[1]
94
+ else
95
+ month = date_array.first
96
+ day = date_array[1]
97
+ end
98
+ month, day = [day, month] if month.to_i > 12 && month.to_i > day.to_i
74
99
  end
75
- month, day = [day, month] if month.to_i > 12 && month.to_i > day.to_i
100
+ elsif input.length == 4 # 2004
101
+ year = input.to_s
102
+ elsif input.length == 2 # 85
103
+ year = (input.to_i > Date.today.year.to_s.slice(2..4).to_i) ? "19#{input}" : "20#{input}"
104
+ elsif input.match(/\w/) # Jun 23, 2004
105
+ begin
106
+ d = Date.parse(input)
107
+ year, month, day = d.year.to_s, d.month.to_s, d.day.to_s
108
+ rescue; end
76
109
  end
77
- elsif input.length == 4 # 2004
78
- year = input.to_s if (input.slice(0..1) == '19' || input.slice(0..1) == '20')
79
- elsif input.length == 2 # 85
80
- year = (input.to_i > Date.today.year.to_s.slice(2..4).to_i) ? "19#{input}" : "20#{input}"
81
- elsif input.match(/\w/) # Jun 23, 2004
82
- begin
83
- d = Date.parse(input)
84
- year, month, day = d.year.to_s, d.month.to_s, d.day.to_s
85
- rescue; end
110
+ return [year, month, day]
86
111
  end
87
- return [year, month, day]
112
+
88
113
  end
89
-
90
- end
114
+
115
+ end
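The comments in `parse_with_poro_date` describe a fast path and a fallback. In the fast path, only three-part strings go to `Date.strptime`, and the separator chooses the format. Anything that fails falls back to the slower string heuristics. Below is a minimal standalone sketch of that flow, assuming the three-part check the comments describe. `parse_with_separator`, `parse_hyphenated`, and `parse_date` are hypothetical names, and only the hyphen branch of the fallback is shown:

```ruby
require 'date'

AMERICAN_DATE_FORMAT = '%m/%d/%Y'
EUROPEAN_DATE_FORMAT = '%Y-%m-%d'

# Fast path: only three-part strings are handed to Date.strptime, and the
# separator decides which format applies. Any parse failure returns nil.
def parse_with_separator(string)
  return nil if string.split(%r{/|-}).length < 3

  case string
  when /-/ then Date.strptime(string, EUROPEAN_DATE_FORMAT)
  when %r{/} then Date.strptime(string, AMERICAN_DATE_FORMAT)
  end
rescue ArgumentError # Date::Error subclasses ArgumentError on newer Rubies
  nil
end

# Slow path for hyphenated strings, mirroring the gem's heuristic: find the
# four-digit year, read the other parts as month/day, and swap them when the
# "month" is over 12 and larger than the "day".
def parse_hyphenated(string)
  parts = string.split('-')
  year = parts.find { |part| part.length == 4 }
  return nil if year.nil? || parts.length != 3

  month, day = parts.first == year ? [parts.last, parts[1]] : [parts.first, parts[1]]
  month, day = day, month if month.to_i > 12 && month.to_i > day.to_i
  { year: year, month: month, day: day }
end

# Try the fast path first, then fall back to the heuristic.
def parse_date(string)
  date = parse_with_separator(string)
  if date
    { year: date.year.to_s, month: format('%02d', date.month), day: format('%02d', date.day) }
  else
    parse_hyphenated(string)
  end
end
```

For example, `parse_date('1985-25-06')` fails the strict `%Y-%m-%d` parse because 25 is not a month. It is then handled by the heuristic, which returns month `"06"` and day `"25"`.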
@@ -86,22 +86,22 @@ class PseudoDate
86
86
  when 'exact'
87
87
  self.to_date < other.to_date
88
88
  when 'year_month'
89
- self.year == other.year ? (self.month < other.month) : (self.year < other.year)
89
+ self.year == other.year ? (self.month.to_i < other.month.to_i) : (self.year.to_i < other.year.to_i)
90
90
  when 'year'
91
- self.year < other.year
91
+ self.year.to_i < other.year.to_i
92
92
  when 'mixed'
93
93
  if self.precision == 'invalid'
94
94
  true
95
95
  elsif other.precision == 'invalid'
96
96
  false
97
- elsif self.year == other.year
98
- if self.month == other.month
97
+ elsif self.year.to_i == other.year.to_i
98
+ if self.month.to_i == other.month.to_i
99
99
  self.day.to_i < other.day.to_i
100
100
  else
101
- self.month < other.month
101
+ self.month.to_i < other.month.to_i
102
102
  end
103
103
  else
104
- self.year < other.year
104
+ self.year.to_i < other.year.to_i
105
105
  end
106
106
  else
107
107
  false
@@ -1,3 +1,3 @@
1
1
  class PseudoDate
2
- VERSION = "0.1.5"
2
+ VERSION = "0.3.0"
3
3
  end
data/pseudo_date.gemspec CHANGED
@@ -16,6 +16,7 @@ Gem::Specification.new do |gem|
16
16
  gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
17
17
  gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
18
18
  gem.require_paths = ["lib"]
19
-
19
+ gem.license = 'MIT'
20
+
20
21
  gem.add_development_dependency("rspec")
21
22
  end
data/spec/parser_spec.rb CHANGED
@@ -8,7 +8,7 @@ describe "PseudoDate Parsing" do
8
8
  @year = '1985'
9
9
  @string_date = 'Jun 25, 1985'
10
10
  end
11
-
11
+
12
12
  context "date hash" do
13
13
  it "should parse an exact hash" do
14
14
  pd = PseudoDate.new(:year => @year, :day => @day, :month => @month)
@@ -24,7 +24,7 @@ describe "PseudoDate Parsing" do
24
24
  pd.year.should == @year
25
25
  end
26
26
  end
27
-
27
+
28
28
  # 19850625
29
29
  context "yearmonthday" do
30
30
  it 'should be exact precision' do
@@ -38,7 +38,7 @@ describe "PseudoDate Parsing" do
38
38
  pd.year.should == @year
39
39
  end
40
40
  end
41
-
41
+
42
42
  # 1985-25-06
43
43
  context "year-day-month" do
44
44
  it 'should be exact precision' do
@@ -52,7 +52,7 @@ describe "PseudoDate Parsing" do
52
52
  pd.year.should == @year
53
53
  end
54
54
  end
55
-
55
+
56
56
  # 06-25-1985
57
57
  context "month-day-year" do
58
58
  it 'should be exact precision' do
@@ -66,7 +66,7 @@ describe "PseudoDate Parsing" do
66
66
  pd.year.should == @year
67
67
  end
68
68
  end
69
-
69
+
70
70
  # 25-06-1985
71
71
  context "day-month-year" do
72
72
  it 'should be exact precision' do
@@ -80,7 +80,7 @@ describe "PseudoDate Parsing" do
80
80
  pd.year.should == @year
81
81
  end
82
82
  end
83
-
83
+
84
84
  # 06/25/1985
85
85
  context "month/day/year" do
86
86
  it 'should be exact precision' do
@@ -94,7 +94,21 @@ describe "PseudoDate Parsing" do
94
94
  pd.year.should == @year
95
95
  end
96
96
  end
97
-
97
+
98
+ # 06/7/1985
99
+ context "month/single-digit day/year" do
100
+ it 'should be exact precision' do
101
+ PseudoDate.new("#{@month}/7/#{@year}").precision.should == 'exact'
102
+ end
103
+
104
+ it 'should match original input' do
105
+ pd = PseudoDate.new("#{@month}/7/#{@year}")
106
+ pd.day.should == "07"
107
+ pd.month.should == @month
108
+ pd.year.should == @year
109
+ end
110
+ end
111
+
98
112
  # 06/1985
99
113
  context "month/year" do
100
114
  it 'should be partial precision' do
@@ -107,7 +121,7 @@ describe "PseudoDate Parsing" do
107
121
  pd.year.should == @year
108
122
  end
109
123
  end
110
-
124
+
111
125
  # 85
112
126
  context "two digit year" do
113
127
  it 'should be year precision' do
@@ -120,16 +134,36 @@ describe "PseudoDate Parsing" do
120
134
  end
121
135
  end
122
136
 
123
- # 1985
124
137
  context "four digit year" do
138
+
125
139
  it 'should be year precision' do
126
140
  PseudoDate.new(@year).precision.should == 'year'
127
141
  end
128
-
129
- it 'should match original input' do
130
- pd = PseudoDate.new(@year)
131
- pd.year.should == @year
132
- end
142
+
143
+ # 1885
144
+ context "in the 19th century" do
145
+ it 'should match original input' do
146
+ pd = PseudoDate.new('1885')
147
+ pd.year.should == '1885'
148
+ end
149
+ end
150
+
151
+ # 1985
152
+ context "in the 20th century" do
153
+ it 'should match original input' do
154
+ pd = PseudoDate.new('1985')
155
+ pd.year.should == '1985'
156
+ end
157
+ end
158
+
159
+ # 2085
160
+ context "in the 21st century" do
161
+ it 'should match original input' do
162
+ pd = PseudoDate.new('2085')
163
+ pd.year.should == '2085'
164
+ end
165
+ end
166
+
133
167
  end
134
168
 
135
169
  # Jun 25, 1985
@@ -22,7 +22,7 @@ describe "PseudoDate" do
22
22
  it "should demonstrate later dates as greater than older dates" do
23
23
  old_date = PseudoDate.new(:year => @year, :month => @month)
24
24
  new_date = PseudoDate.new(:year => 1996, :month => @month)
25
- (old_date < new_date).should be_true
25
+ (old_date < new_date).should == true
26
26
  end
27
27
  it "should respond properly with the spaceship operator" do
28
28
  old_date = PseudoDate.new(:year => @year, :month => @month)
@@ -37,7 +37,7 @@ describe "PseudoDate" do
37
37
  it "should demonstrate later dates as greater than older dates" do
38
38
  old_date = PseudoDate.new(:year => @year, :month => @month, :day => @day)
39
39
  new_date = PseudoDate.new(:year => 1996, :month => @month, :day => @day)
40
- (old_date < new_date).should be_true
40
+ (old_date < new_date).should == true
41
41
  end
42
42
  it "should respond properly with the spaceship operator" do
43
43
  old_date = PseudoDate.new(:year => @year, :month => @month, :day => @day)
@@ -52,12 +52,12 @@ describe "PseudoDate" do
52
52
  it "should demonstrate later dates as greater than older dates" do
53
53
  old_date = PseudoDate.new(:year => @year, :month => @month)
54
54
  new_date = PseudoDate.new(:year => 1996, :month => @month, :day => @day)
55
- (old_date < new_date).should be_true
55
+ (old_date < new_date).should == true
56
56
  end
57
57
  it "should demonstrate invalid dates as less than complete dates" do
58
58
  complete = PseudoDate.new(:year => @year, :month => @month)
59
59
  invalid = PseudoDate.new("")
60
- (complete > invalid).should be_true
60
+ (complete > invalid).should == true
61
61
  end
62
62
  it "should respond properly with the spaceship operator" do
63
63
  old_date = PseudoDate.new(:year => @year, :month => @month)
data/spec/spec_helper.rb CHANGED
@@ -1,3 +1,9 @@
1
1
  require 'rubygems'
2
2
  require 'rspec'
3
3
  require File.dirname(__FILE__) + '/../lib/pseudo_date'
4
+
5
+ RSpec.configure do |config|
6
+ config.expect_with :rspec do |c|
7
+ c.syntax = :should
8
+ end
9
+ end
metadata CHANGED
@@ -1,30 +1,27 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: pseudo_date
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.5
5
- prerelease:
4
+ version: 0.3.0
6
5
  platform: ruby
7
6
  authors:
8
7
  - Patrick Tulskie
9
- autorequire:
8
+ autorequire:
10
9
  bindir: bin
11
10
  cert_chain: []
12
- date: 2013-05-01 00:00:00.000000000 Z
11
+ date: 2022-06-09 00:00:00.000000000 Z
13
12
  dependencies:
14
13
  - !ruby/object:Gem::Dependency
15
14
  name: rspec
16
15
  requirement: !ruby/object:Gem::Requirement
17
- none: false
18
16
  requirements:
19
- - - ! '>='
17
+ - - ">="
20
18
  - !ruby/object:Gem::Version
21
19
  version: '0'
22
20
  type: :development
23
21
  prerelease: false
24
22
  version_requirements: !ruby/object:Gem::Requirement
25
- none: false
26
23
  requirements:
27
- - - ! '>='
24
+ - - ">="
28
25
  - !ruby/object:Gem::Version
29
26
  version: '0'
30
27
  description: Date parser and container for partial or incomplete dates.
@@ -34,7 +31,7 @@ executables: []
34
31
  extensions: []
35
32
  extra_rdoc_files: []
36
33
  files:
37
- - .gitignore
34
+ - ".gitignore"
38
35
  - Gemfile
39
36
  - Gemfile.lock
40
37
  - Manifest
@@ -52,28 +49,27 @@ files:
52
49
  - spec/pseudo_date_spec.rb
53
50
  - spec/spec_helper.rb
54
51
  homepage: http://github.com/PatrickTulskie/pseudo_date
55
- licenses: []
56
- post_install_message:
52
+ licenses:
53
+ - MIT
54
+ metadata: {}
55
+ post_install_message:
57
56
  rdoc_options: []
58
57
  require_paths:
59
58
  - lib
60
59
  required_ruby_version: !ruby/object:Gem::Requirement
61
- none: false
62
60
  requirements:
63
- - - ! '>='
61
+ - - ">="
64
62
  - !ruby/object:Gem::Version
65
63
  version: '0'
66
64
  required_rubygems_version: !ruby/object:Gem::Requirement
67
- none: false
68
65
  requirements:
69
- - - ! '>='
66
+ - - ">="
70
67
  - !ruby/object:Gem::Version
71
68
  version: '0'
72
69
  requirements: []
73
- rubyforge_project:
74
- rubygems_version: 1.8.24
75
- signing_key:
76
- specification_version: 3
70
+ rubygems_version: 3.2.3
71
+ signing_key:
72
+ specification_version: 4
77
73
  summary: Date parser and container for partial or incomplete dates.
78
74
  test_files:
79
75
  - spec/parser_spec.rb