sps_bill 0.1.0 → 0.1.1

data/.travis.yml CHANGED
@@ -1,3 +1,8 @@
1
1
  # These are specific configuration settings required for travis-ci
2
2
  # see http://travis-ci.org/tardate/sps_bill_scanner
3
- rvm: 1.9.3
3
+ language: ruby
4
+ rvm:
5
+ - 1.9.2
6
+ - 1.9.3
7
+ - rbx-19mode
8
+ - jruby-19mode
data/CHANGELOG ADDED
@@ -0,0 +1,9 @@
1
+ Version 0.1.1 Release: 1st August 2012
2
+ ==================================================
3
+ * update to use pdf-reader-turtletext 0.2.2
4
+ * convert bill parsing to use the more idiomatic bounding_box
5
+ syntax available in pdf-reader-turtletext
6
+
7
+ Version 0.1.0 Release: 20th July 2012
8
+ ==================================================
9
+ * Initial packaging and release
data/Gemfile CHANGED
@@ -1,12 +1,11 @@
1
1
  source "http://rubygems.org"
2
2
 
3
- gem 'pdf-reader', '1.1.1'
3
+ gem 'pdf-reader-turtletext', '~> 0.2.2'
4
4
  gem 'getoptions', '~> 0.3'
5
5
 
6
6
  group :development do
7
7
  gem 'bundler', '~> 1.1.4'
8
8
  gem 'jeweler', '~> 1.6.4'
9
- gem 'rcov', '>= 0'
10
9
  end
11
10
 
12
11
  group :development, :test do
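
The Gemfile hunk above replaces the exact pin on pdf-reader with a pessimistic constraint (<tt>~> 0.2.2</tt>) on pdf-reader-turtletext. As a rough illustration of what that operator accepts, the sketch below asks RubyGems directly. The version numbers other than 0.2.2 are made up for the example and are not claimed to be real releases.

```ruby
require 'rubygems'

# '~> 0.2.2' allows patch-level updates only: >= 0.2.2 and < 0.3.0
requirement = Gem::Requirement.new('~> 0.2.2')

# 0.2.1, 0.2.9 and 0.3.0 are illustrative versions, not known releases
%w(0.2.1 0.2.2 0.2.9 0.3.0).each do |candidate|
  ok = requirement.satisfied_by?(Gem::Version.new(candidate))
  puts "#{candidate}: #{ok ? 'accepted' : 'rejected'}"
end
```

The Gemfile.lock hunk that follows shows the resolved version (0.2.2) and that it in turn depends on pdf-reader 1.1.1 exactly, so the PDF parsing library itself stays at the same version.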
data/Gemfile.lock CHANGED
@@ -3,7 +3,7 @@ GEM
3
3
  specs:
4
4
  Ascii85 (1.0.1)
5
5
  diff-lcs (1.1.3)
6
- ffi (1.0.11)
6
+ ffi (1.1.0)
7
7
  getoptions (0.3)
8
8
  git (1.2.5)
9
9
  guard (1.2.3)
@@ -15,7 +15,7 @@ GEM
15
15
  bundler (~> 1.0)
16
16
  git (>= 1.2.5)
17
17
  rake
18
- json (1.6.4)
18
+ json (1.7.3)
19
19
  listen (0.4.7)
20
20
  rb-fchange (~> 0.0.5)
21
21
  rb-fsevent (~> 0.9.1)
@@ -23,13 +23,14 @@ GEM
23
23
  pdf-reader (1.1.1)
24
24
  Ascii85 (~> 1.0.0)
25
25
  ruby-rc4
26
+ pdf-reader-turtletext (0.2.2)
27
+ pdf-reader (= 1.1.1)
26
28
  rake (0.9.2.2)
27
29
  rb-fchange (0.0.5)
28
30
  ffi
29
31
  rb-fsevent (0.9.1)
30
32
  rb-inotify (0.8.8)
31
33
  ffi (>= 0.5.0)
32
- rcov (0.9.11)
33
34
  rdoc (3.12)
34
35
  json (~> 1.4)
35
36
  rspec (2.8.0)
@@ -51,8 +52,7 @@ DEPENDENCIES
51
52
  getoptions (~> 0.3)
52
53
  guard-rspec
53
54
  jeweler (~> 1.6.4)
54
- pdf-reader (= 1.1.1)
55
+ pdf-reader-turtletext (~> 0.2.2)
55
56
  rake (~> 0.9.2.2)
56
- rcov
57
57
  rdoc (~> 3.11)
58
58
  rspec (~> 2.8.0)
data/README.rdoc CHANGED
@@ -1,12 +1,17 @@
1
1
  = SP Services Bill Scanner {<img src="https://secure.travis-ci.org/tardate/sps_bill_scanner.png" />}[http://travis-ci.org/tardate/sps_bill_scanner]
2
2
 
3
- Extracts bill details from SP Services PDF bills so that you can, um, do geeky data analysis n'stuff.
3
+ Extracts bill details from SP Services PDF bills so that you can, um, do geeky data analysis n'stuff,
4
+ and because I loathe data entry!
5
+ One day we'll have {smart meters}[http://en.wikipedia.org/wiki/Smart_meter]
6
+ and SP Services will let us download our raw meter data. But until then...
4
7
 
5
8
  If you are an SP Services subscriber, download your bills from https://services.spservices.sg
6
9
 
7
10
  If you are not an SP Services subscriber, this gem ain't going to be much use for you!
8
11
 
9
12
  Some example analysis using {R}[http://www.r-project.org/] is included in the <tt>scripts</tt> folder.
13
+ The inspiration for hacking away with R comes from reading Sau Sheong's new book
14
+ {Exploring Everyday Things with Ruby and R}[http://www.bookjetty.com/books/1449315151/exploring-data-learning-everyday]. Check it out!
10
15
 
11
16
  == Requirements and Known Limitations
12
17
 
@@ -53,11 +58,14 @@ Here's the basic outline:
53
58
  - copy this to <tt>spec/fixtures/personal_pdf_samples/expectations.yml</tt>
54
59
  - enter in the details that describe each bill you have added
55
60
  - now when you run <tt>rake</tt> it will also verify the data extracted from your
56
- bills using expectations.yml
61
+ bills using <tt>expectations.yml</tt>
57
62
 
58
63
  Feel free to get in touch or discuss in the github issues area if you are trying to help but run
59
64
  into problems with this!
60
65
 
66
+ If you are more interested in the data analytics, I'm keen to add more interesting R scripts to the collection.
67
+ Your contributions are most welcome.
68
+
61
69
  == Installation
62
70
 
63
71
  gem install sps_bill
@@ -152,14 +160,16 @@ in the <tt>scripts</tt> folder.
152
160
  === sample data and analysis
153
161
 
154
162
  [data/all_services.csv.sample] sample CSV data for a years worth of elec, gas, and water
155
- [data/all_services.sample.pdf] PDF analysis produced by this script for all_services.csv.sample
163
+ [data/all_services.sample.pdf] PDF analysis produced by <tt>full_analysis.R</tt> using
164
+ the <tt>all_services.csv.sample</tt> data set.
156
165
  [data/elec_and_water_only.csv.sample] sample CSV data for a years worth of elec and water
157
- [data/elec_and_water_only.sample.pdf] PDF analysis produced by this script for elec_and_water_only.csv.sample
166
+ [data/elec_and_water_only.sample.pdf] PDF analysis produced by <tt>full_analysis.R</tt> using
167
+ the <tt>elec_and_water_only.csv.sample</tt> data set.
158
168
 
159
169
  === example run
160
170
 
161
171
  ./scan_all_bills.sh ../path_to_my_bills/*.pdf > my_bill_data.csv
162
- ./full_analysis.R data_file.csv my_bill_data.csv
172
+ ./full_analysis.R my_bill_data.csv
163
173
 
164
174
  This will have produced an analysis of all your bills in <tt>full_analysis.pdf</tt>.
165
175
 
data/lib/sps_bill.rb CHANGED
@@ -1,8 +1,4 @@
1
- require 'pdf-reader'
2
- require 'pdf/object_hash'
3
- require 'pdf/positional_text_receiver'
4
- require 'pdf/textangle'
5
- require 'pdf/structured_reader'
1
+ require 'pdf-reader-turtletext'
6
2
 
7
3
  module SpsBill
8
4
  end
data/lib/sps_bill/bill.rb CHANGED
@@ -10,15 +10,18 @@ class SpsBill::Bill
10
10
 
11
11
  # accessors for the various bill components
12
12
  #
13
- # electricity_usage charges is an array of hashed values:
14
- # [{ kwh: float, rate: float, amount: float }]
15
- # gas_usage charges is an array of hashed values:
16
- # [{ kwh: float, rate: float, amount: float }]
17
- # water_usage charges is an array of hashed values:
18
- # [{ cubic_m: float, rate: float, amount: float }]
19
- #
13
+
20
14
  attr_reader :account_number,:total_amount,:invoice_date,:invoice_month
21
- attr_reader :electricity_usage,:gas_usage,:water_usage
15
+
16
+ # electricity_usage is an array of hashed values:
17
+ # [{ kwh: float, rate: float, amount: float }]
18
+ attr_reader :electricity_usage
19
+ # gas_usage is an array of hashed values:
20
+ # [{ kwh: float, rate: float, amount: float }]
21
+ attr_reader :gas_usage
22
+ # water_usage is an array of hashed values:
23
+ # [{ cubic_m: float, rate: float, amount: float }]
24
+ attr_reader :water_usage
22
25
 
23
26
  # +source+ is a file name or stream-like object
24
27
  def initialize(source)
@@ -28,7 +31,7 @@ class SpsBill::Bill
28
31
 
29
32
  # Returns the PDF reader instance
30
33
  def reader
31
- @reader ||= PDF::StructuredReader.new(source_file) if source_file
34
+ @reader ||= PDF::Reader::Turtletext.new(source_file) if source_file
32
35
  end
33
36
 
34
37
  # Return a pretty(-ish) text format of the core bill details
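
The <tt>reader</tt> method in the hunk above combines <tt>||=</tt> with a trailing <tt>if</tt>. A minimal sketch of how that line behaves, assuming nothing about the gem beyond what the diff shows: <tt>FakeReader</tt> and <tt>LazyBill</tt> are hypothetical stand-ins so the example runs without a PDF or pdf-reader-turtletext installed.

```ruby
# FakeReader is a hypothetical stand-in for PDF::Reader::Turtletext
class FakeReader
  attr_reader :source

  def initialize(source)
    @source = source
  end
end

# LazyBill is a hypothetical stand-in for SpsBill::Bill
class LazyBill
  attr_reader :source_file

  def initialize(source_file)
    @source_file = source_file
  end

  # Parsed as: (@reader ||= FakeReader.new(source_file)) if source_file
  # so the reader is built once on first use, and nil is returned
  # when there is no source file at all.
  def reader
    @reader ||= FakeReader.new(source_file) if source_file
  end
end

bill = LazyBill.new('may_2011.pdf') # file name is illustrative only
puts bill.reader.equal?(bill.reader) # same object on every call
puts LazyBill.new(nil).reader.inspect
```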
@@ -12,8 +12,8 @@ class SpsBill::BillCollection < Array
12
12
 
13
13
  # Returns an array of Bill objects for PDF files matching +path_spec+.
14
14
  # +path_spec+ may be either:
15
- # - an array of filenames e.g. ['data/file1.pdf','file2.pdf']
16
- # - or a single file or path spec e.g. './somepath/file1.pdf' or './somepath/*.pdf'
15
+ # - an array of filenames e.g. ['data/file1.pdf','file2.pdf']
16
+ # - or a single file or path spec e.g. './somepath/file1.pdf' or './somepath/*.pdf'
17
17
  def load(path_spec)
18
18
  path_spec = Dir[path_spec] unless path_spec.class <= Array
19
19
  path_spec.each_with_object(new) do |filename,memo|
@@ -23,30 +23,33 @@ class SpsBill::BillCollection < Array
23
23
 
24
24
  end
25
25
 
26
+ # Returns the suitable array of headers for +dataset_selector+
26
27
  def headers(dataset_selector)
27
28
  case dataset_selector
28
29
  when :total_amounts
29
- ['invoice_month','amount']
30
+ %w(invoice_month amount)
30
31
  when :electricity_usages
31
- ['invoice_month','kwh','rate','amount']
32
+ %w(invoice_month kwh rate amount)
32
33
  when :gas_usages
33
- ['invoice_month','kwh','rate','amount']
34
+ %w(invoice_month kwh rate amount)
34
35
  when :water_usages
35
- ['invoice_month','cubic_m','rate','amount']
36
+ %w(invoice_month cubic_m rate amount)
36
37
  when :all_data
37
- ['invoice_month','measure','kwh','cubic_m','rate','amount']
38
+ %w(invoice_month measure kwh cubic_m rate amount)
38
39
  end
39
40
  end
40
41
 
41
- # Returns a hash of all data by month
42
- # [[month,measure,kwh,cubic_m,rate,amount]]
42
+ # Returns an array of all data by month
43
+ # [[month,measure,kwh,cubic_m,rate,amount]]
43
44
  # measure: total_charges,electricity,gas,water
44
45
  def all_data
45
46
  total_amounts(:all) + electricity_usages(:all) + gas_usages(:all) + water_usages(:all)
46
47
  end
47
48
 
48
- # Returns a hash of total bill amounts by month
49
- # [[month,amount]]
49
+ # Returns an array of total bill amounts by month
50
+ # [[month,amount]]
51
+ # when +style+ is :solo, returns minimal array to describe this data set in isolation,
52
+ # else returns a normalised sparse array that is common to all data sets
50
53
  def total_amounts(style=:solo)
51
54
  each_with_object([]) do |bill,memo|
52
55
  if style==:solo
@@ -57,8 +60,10 @@ class SpsBill::BillCollection < Array
57
60
  end
58
61
  end
59
62
 
60
- # Returns a hash of electricity_usages by month
61
- # [[month,kwh,rate,amount]]
63
+ # Returns an array of electricity_usages by month
64
+ # [[month,kwh,rate,amount]]
65
+ # when +style+ is :solo, returns minimal array to describe this data set in isolation,
66
+ # else returns a normalised sparse array that is common to all data sets
62
67
  def electricity_usages(style=:solo)
63
68
  each_with_object([]) do |bill,memo|
64
69
  bill.electricity_usage.each do |usage|
@@ -71,8 +76,10 @@ class SpsBill::BillCollection < Array
71
76
  end
72
77
  end
73
78
 
74
- # Returns a hash of gas_usages by month
75
- # [[month,kwh,rate,amount]]
79
+ # Returns an array of gas_usages by month
80
+ # [[month,kwh,rate,amount]]
81
+ # when +style+ is :solo, returns minimal array to describe this data set in isolation,
82
+ # else returns a normalised sparse array that is common to all data sets
76
83
  def gas_usages(style=:solo)
77
84
  each_with_object([]) do |bill,memo|
78
85
  bill.gas_usage.each do |usage|
@@ -85,8 +92,10 @@ class SpsBill::BillCollection < Array
85
92
  end
86
93
  end
87
94
 
88
- # Returns a hash of water_usages by month
89
- # [[month,kwh,rate,amount]]
95
+ # Returns an array of water_usages by month
96
+ # [[month,cubic_m,rate,amount]]
97
+ # when +style+ is :solo, returns minimal array to describe this data set in isolation,
98
+ # else returns a normalised sparse array that is common to all data sets
90
99
  def water_usages(style=:solo)
91
100
  each_with_object([]) do |bill,memo|
92
101
  bill.water_usage.each do |usage|
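
The new comments describe two row styles: <tt>:solo</tt> returns the minimal columns for one data set, anything else returns a "normalised sparse array" sharing the <tt>:all_data</tt> headers. The method bodies are not part of this diff, so the following is only a guess at the shape of those rows, built from the header lists shown above. <tt>electricity_row</tt> is a hypothetical helper, and using <tt>nil</tt> for the unused column is an assumption.

```ruby
require 'date'

# Hypothetical helper, not part of sps_bill: builds one electricity row.
def electricity_row(month, usage, style = :solo)
  if style == :solo
    # headers: invoice_month kwh rate amount
    [month, usage[:kwh], usage[:rate], usage[:amount]]
  else
    # headers: invoice_month measure kwh cubic_m rate amount
    # cubic_m does not apply to electricity; nil is an assumed filler value
    [month, 'electricity', usage[:kwh], nil, usage[:rate], usage[:amount]]
  end
end

usage = { :kwh => 616.0, :rate => 0.2558, :amount => 157.57 }
month = Date.new(2011, 5, 1)
p electricity_row(month, usage)
p electricity_row(month, usage, :all)
```

Rows in the shared layout can be concatenated across measures, which is what <tt>all_data</tt> does with <tt>+</tt>.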
@@ -3,9 +3,12 @@ require 'date'
3
3
  # all the bill scanning and parsing intelligence
4
4
  module SpsBill::BillParser
5
5
 
6
- ELECTRICITY_SERVICE_HEAD = "Electricity Services"
7
- GAS_SERVICE_HEAD = "Gas Services by City Gas Pte Ltd"
8
- WATER_SERVICE_HEAD = "Water Services by Public Utilities Board"
6
+ ELECTRICITY_SERVICE_HEADER = /Electricity Services/i
7
+ ELECTRICITY_SERVICE_FOOTER = /Gas Services|Water Services/i
8
+ GAS_SERVICE_HEADER = /Gas Services/i
9
+ GAS_SERVICE_FOOTER = /Water Services/i
10
+ WATER_SERVICE_HEADER = /Water Services/i
11
+ WATER_SERVICE_FOOTER = /Waterborne Fee/i
9
12
 
10
13
  # Returns a collection of parser errors
11
14
  def errors
@@ -26,66 +29,109 @@ module SpsBill::BillParser
26
29
 
27
30
  # Command: extracts the account number
28
31
  def parse_account_number
29
- @account_number = reader.text_in_rect(383.0,999.0,785.0,790.0,1).flatten.join('')
32
+ region = reader.bounding_box do
33
+ exclusive!
34
+ below 'Dated'
35
+ above 'Type'
36
+ right_of 'Account No'
37
+ end
38
+ # text will be returned like this:
39
+ # [[":", "8123123123"]]
40
+ @account_number = region.text.flatten.last
30
41
  end
31
42
 
32
43
  # Command: extracts the total amount due for the current month
33
44
  def parse_total_amount
34
- @total_amount = if ref = reader.text_position(/^Total Current Charges due on/)
35
- total_parts = reader.text_in_rect(ref[:x] + 1,400.0,ref[:y] - 1,ref[:y] + 1,1)
36
- total_parts.flatten.first.to_f
45
+ region = reader.bounding_box do
46
+ inclusive!
47
+ below /^Total Current Charges due on/
48
+ above /^Total Current Charges due on/
49
+ right_of /^Total Current Charges due on/
50
+ left_of 400.0
37
51
  end
52
+ # text will be returned like this:
53
+ # [["Total Current Charges due on 14 Jun 2011 (Tue)", "251.44"]]
54
+ @total_amount = region.text.flatten.last.to_f
38
55
  end
39
56
 
40
57
  # Command: extracts the invoice date
41
58
  def parse_invoice_date
42
- @invoice_date = if ref = reader.text_position("Dated")
43
- date_parts = reader.text_in_rect(ref[:x] + 1,999.0,ref[:y] - 1,ref[:y] + 1,1)
44
- Date.parse(date_parts.first.join('-'))
59
+ region = reader.bounding_box do
60
+ inclusive!
61
+ below 'Dated'
62
+ above 'Dated'
63
+ right_of 'Dated'
45
64
  end
65
+ # text will be returned like this:
66
+ # [["Dated", "31", "May", "2011"]]
67
+ date_string = region.text.flatten.slice(1..3).join('-')
68
+ @invoice_date = Date.parse(date_string)
46
69
  end
47
70
 
48
71
  # Command: extracts the invoice month (as Date, set to 1st of the month)
49
72
  def parse_invoice_month
50
- @invoice_month = if ref = reader.text_position("Dated")
51
- date_parts = reader.text_in_rect(ref[:x] + 1,999.0,ref[:y] - 1,ref[:y] + 1,1)
52
- m_parts = reader.text_in_rect(ref[:x]-200,ref[:x]-1,ref[:y] - 1,ref[:y] + 1,1)
53
- Date.parse("#{date_parts.first.last}-#{m_parts.first.first}-01")
73
+ region = reader.bounding_box do
74
+ inclusive!
75
+ below 'Dated'
76
+ above 'Dated'
77
+ end
78
+ # text will be returned like this:
79
+ # [["May", "11", "Bill", "Dated", "31", "May", "2011"]]
80
+ date_array = ['01'] + region.text.flatten.slice(0..1)
81
+ if (yy = date_array[2]).length == 2
82
+ date_array[2] = "20#{yy}" # WARNING: converting 2-digit date. Assumed to be 21st C
54
83
  end
84
+ @invoice_month = Date.parse(date_array.join('-'))
55
85
  end
56
86
 
57
- # Command: extracts an array of electricity usage charges. Each charge is a Hash:
58
- # { kwh: float, rate: float, amount: float }
87
+ # Command: extracts an array of electricity usage charges. Each element is a Hash:
88
+ # { kwh: float, rate: float, amount: float }
59
89
  def parse_electricity_usage
60
- @electricity_usage = if upper_ref = reader.text_position(ELECTRICITY_SERVICE_HEAD)
61
- lower_ref = reader.text_position(GAS_SERVICE_HEAD)
62
- lower_ref ||= reader.text_position(WATER_SERVICE_HEAD)
63
- if lower_ref
64
- raw_data = reader.text_in_rect(240.0,450.0,lower_ref[:y]+1,upper_ref[:y],1)
65
- raw_data.map{|l| {:kwh => l[0].gsub(/kwh/i,'').to_f, :rate => l[1].to_f, :amount => l[2].to_f} }
66
- end
90
+ region = reader.bounding_box do
91
+ exclusive!
92
+ below ELECTRICITY_SERVICE_HEADER
93
+ above ELECTRICITY_SERVICE_FOOTER
94
+ right_of 240.0
95
+ left_of 450.0
96
+ end
97
+ # text will be returned like this:
98
+ # [["4 kWh", "0.2410", "0.97"], ["616 kWh", "0.2558", "157.57"]]
99
+ @electricity_usage = unless (raw_data = region.text).empty?
100
+ raw_data.map{|l| {:kwh => l[0].gsub(/kwh/i,'').to_f, :rate => l[1].to_f, :amount => l[2].to_f} }
67
101
  end
68
102
  end
69
103
 
70
- # Command: extracts an array of gas usage charges. Each charge is a Hash:
71
- # { kwh: float, rate: float, amount: float }
104
+ # Command: extracts an array of gas usage charges. Each element is a Hash:
105
+ # { kwh: float, rate: float, amount: float }
72
106
  def parse_gas_usage
73
- @gas_usage = if upper_ref = reader.text_position(GAS_SERVICE_HEAD)
74
- if lower_ref = reader.text_position(WATER_SERVICE_HEAD)
75
- raw_data = reader.text_in_rect(240.0,450.0,lower_ref[:y]+1,upper_ref[:y],1)
76
- raw_data.map{|l| {:kwh => l[0].gsub(/kwh/i,'').to_f, :rate => l[1].to_f, :amount => l[2].to_f} }
77
- end
107
+ region = reader.bounding_box do
108
+ exclusive!
109
+ below GAS_SERVICE_HEADER
110
+ above GAS_SERVICE_FOOTER
111
+ right_of 240.0
112
+ left_of 450.0
113
+ end
114
+ # text will be returned like this:
115
+ # [["4 kWh", "0.2410", "0.97"], ["616 kWh", "0.2558", "157.57"]]
116
+ @gas_usage = unless (raw_data = region.text).empty?
117
+ raw_data.map{|l| {:kwh => l[0].gsub(/kwh/i,'').to_f, :rate => l[1].to_f, :amount => l[2].to_f} }
78
118
  end
79
119
  end
80
120
 
81
- # Command: extracts an array of water usage charges. Each charge is a Hash:
82
- # { cubic_m: float, rate: float, amount: float }
121
+ # Command: extracts an array of water usage charges. Each element is a Hash:
122
+ # { cubic_m: float, rate: float, amount: float }
83
123
  def parse_water_usage
84
- @water_usage = if upper_ref = reader.text_position(WATER_SERVICE_HEAD)
85
- if lower_ref = reader.text_position("Waterborne Fee")
86
- raw_data = reader.text_in_rect(240.0,450.0,lower_ref[:y]+1,upper_ref[:y],1)
87
- raw_data.map{|l| {:cubic_m => l[0].gsub(/cu m/i,'').to_f, :rate => l[1].to_f, :amount => l[2].to_f} }
88
- end
124
+ region = reader.bounding_box do
125
+ exclusive!
126
+ below WATER_SERVICE_HEADER
127
+ above WATER_SERVICE_FOOTER
128
+ right_of 240.0
129
+ left_of 450.0
130
+ end
131
+ # text will be returned like this:
132
+ # [["36.1 Cu M", "1.1700", "42.24"], ["-3.0 Cu M", "1.4000", "-4.20"]]
133
+ @water_usage = unless (raw_data = region.text).empty?
134
+ raw_data.map{|l| {:cubic_m => l[0].gsub(/cu m/i,'').to_f, :rate => l[1].to_f, :amount => l[2].to_f} }
89
135
  end
90
136
  end
91
137
 
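
Once <tt>bounding_box</tt> has returned its text, the rest of each parser method in the hunk above is plain array handling. The sketch below replays two of those steps on the sample arrays quoted in the diff comments, so it needs no PDF. <tt>invoice_month_from</tt> and <tt>usage_rows_from</tt> are hypothetical helper names, not methods of the gem.

```ruby
require 'date'

# Hypothetical helper mirroring parse_invoice_month:
# [["May", "11", "Bill", "Dated", ...]] -> Date for the 1st of that month
def invoice_month_from(text)
  month, year = text.flatten.slice(0..1)
  year = "20#{year}" if year.length == 2 # assumes a 21st-century bill, as the diff does
  Date.parse(['01', month, year].join('-'))
end

# Hypothetical helper mirroring the parse_*_usage methods:
# [["36.1 Cu M", "1.1700", "42.24"], ...] -> array of hashes, or nil when empty
def usage_rows_from(text, unit_key, unit_pattern)
  return nil if text.empty?
  text.map do |row|
    { unit_key => row[0].gsub(unit_pattern, '').to_f,
      :rate    => row[1].to_f,
      :amount  => row[2].to_f }
  end
end

puts invoice_month_from([["May", "11", "Bill", "Dated", "31", "May", "2011"]])
p usage_rows_from([["36.1 Cu M", "1.1700", "42.24"], ["-3.0 Cu M", "1.4000", "-4.20"]],
                  :cubic_m, /cu m/i)
```

Note that an empty region yields <tt>nil</tt> rather than an empty array, which matches the <tt>unless ... empty?</tt> form used in the diff.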
@@ -12,13 +12,15 @@ SP Services Bill Scanner v#{SpsBill::Version::STRING}
12
12
  ===================================
13
13
 
14
14
  Usage:
15
- sps_bill [options]
15
+ sps_bill [options] file-spec
16
16
 
17
17
  Command Options
18
18
  -r | --raw raw data format (without headers)
19
19
  -c | --csv output in CSV format (default)
20
20
  -d= | --data=[charges,electricity,gas,water,all]
21
21
 
22
+ file-spec is a path to the PDF bill(s) to read.
23
+
22
24
  EOS
23
25
  end
24
26
 
@@ -29,7 +31,7 @@ Command Options
29
31
  end
30
32
 
31
33
  def run
32
- if options[:help]
34
+ if options[:help] or fileset.empty?
33
35
  self.class.usage
34
36
  return
35
37
  end
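
The usage text now names a <tt>file-spec</tt> argument, and <tt>BillCollection.load</tt> (earlier in this diff) accepts either an array of file names or a single path or glob. A small sketch of that normalisation step follows. <tt>expand_path_spec</tt> is a hypothetical name, and how the command line's <tt>fileset</tt> is actually built is not shown in the diff, so treat the link to the new <tt>fileset.empty?</tt> check as an assumption.

```ruby
# Hypothetical helper mirroring the first line of BillCollection.load:
# arrays pass through untouched, anything else is expanded as a glob.
def expand_path_spec(path_spec)
  path_spec.class <= Array ? path_spec : Dir[path_spec]
end

# Paths are illustrative only
p expand_path_spec(['data/file1.pdf', 'file2.pdf'])
p expand_path_spec('./somepath/*.pdf') # [] when nothing matches
```

A glob that matches nothing expands to an empty array, which is presumably the case the new <tt>fileset.empty?</tt> guard turns into a usage message.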