pdf-reader-turtletext 0.1.0 → 0.2.0

@@ -1,3 +1,11 @@
1
1
  # These are specific configuration settings required for travis-ci
2
2
  # see http://travis-ci.org/tardate/pdf-reader-turtletext
3
- rvm: 1.9.3
3
+ language: ruby
4
+ rvm:
5
+ - 1.8.7
6
+ - 1.9.2
7
+ - 1.9.3
8
+ - rbx-18mode
9
+ - rbx-19mode
10
+ - jruby-18mode
11
+ - jruby-19mode
@@ -0,0 +1,11 @@
1
+ Version 0.2.0 Release: n/a
2
+ ==================================================
3
+ * add bounding_box / textangle semantics
4
+ * improve documentation
5
+ * MRI 1.8.7, 1.9.2, 1.9.3, Rubinius (1.8 and 1.9 mode), JRuby (1.8 and 1.9 mode)
6
+
7
+ Version 0.1.0 Release: 22nd July 2012
8
+ ==================================================
9
+ * Initial packaging and release of core functionality directly extracted
10
+ from https://github.com/tardate/sps_bill_scanner/
11
+ * MRI 1.9 only
@@ -14,30 +14,121 @@ For an example of how this is works in practice, see the
14
14
 
15
15
  == Requirements and Known Limitations
16
16
 
17
- * currently only tested with Ruby 1.9
18
- * fixed dependency on PDF::Reader v 1.1.1
17
+ * Tested with MRI 1.8.7, 1.9.2, 1.9.3, Rubinius (1.8 and 1.9 mode), JRuby (1.8 and 1.9 mode)
18
+ * Has a fixed dependency on PDF::Reader v1.1.1
19
19
 
20
- == Installation
20
+ == The PDF::Reader::Turtletext Cookbook
21
21
 
22
- gem install pdf-reader-turtletext
22
+ === How do I install it for normal use?
23
23
 
24
- == Usage
24
+ It is distributed as a gem, so all normal gem installation procedures apply. To install the
25
+ gem directly from the command line:
25
26
 
26
- === PDF::Reader::Turtletext
27
+ $ gem install pdf-reader-turtletext
27
28
 
28
- Provides a range of methods to extract structured text from a PDF file,
29
- such as <tt>text_position</tt> and <tt>text_in_region</tt>.
29
+ If you are using bundler or Rails, add to your Gemfile:
30
30
 
31
- A typical usage:
31
+ gem 'pdf-reader-turtletext'
32
32
 
33
+ Then bundle install:
34
+
35
+ $ bundle
36
+
37
+ === How do I install it for gem development?
38
+
39
+ If you want to work on enhancements or fix bugs in PDF::Reader::Turtletext, fork and clone the GitHub repository. See the section below on 'Contributing to PDF::Reader::Turtletext'.
40
+
41
+ === How to instantiate Turtletext in code
42
+
43
+ All interaction is done using an instance of the PDF::Reader::Turtletext class. It is
44
+ initialised given a filename or IO-like object, and any required options.
45
+
46
+ Typical usage:
47
+
48
+ pdf_filename = '../some_path/some.pdf'
33
49
  reader = PDF::Reader::Turtletext.new(pdf_filename)
34
- page = 1
35
- heading_position = reader.text_position(/transaction table/i)
36
- next_section = reader.text_position(/transaction summary/i)
37
- transaction_rows = reader.text_in_region(
38
- heading_position[x], 900,
39
- heading_position[y] + 1,next_section[:y] -1
50
+ options = { :y_precision => 5 }
51
+ reader_with_options = PDF::Reader::Turtletext.new(pdf_filename,options)
52
+
53
+ === How to extract text within a region described in relation to other text
54
+
55
+ Problem: we don't know exactly where the required text will be on the page, and it is not encoded
56
+ within the PDF as a single object. But we do know that it will be relatively positioned (for example)
57
+ below a certain bit of text, to the left of another, and above some other text.
58
+
59
+ Solution: use the <tt>bounding_box</tt> method to describe the region and extract the matching text.
60
+
61
+ textangle = reader.bounding_box do
62
+ page 1
63
+ below /electricity/i
64
+ above 10
65
+ right_of 240.0
66
+ left_of "Total ($)"
67
+ end
68
+ textangle.text
69
+ => [['string','string'],['string']] # array of rows, each row is an array of text elements in the row
70
+
71
+ The range of methods that can be used within the <tt>bounding_box</tt> block are all optional, and include:
72
+ * <tt>page</tt> - specifies the PDF page from which to extract text (default is 1).
73
+ * <tt>below</tt> - a string, regex or number that describes the upper limit of the text box
74
+ (default is top border of the page).
75
+ * <tt>above</tt> - a string, regex or number that describes the lower limit of the text box
76
+ (default is bottom border of the page).
77
+ * <tt>left_of</tt> - a string, regex or number that describes the right limit of the text box
78
+ (default is right border of the page).
79
+ * <tt>right_of</tt> - a string, regex or number that describes the left limit of the text box
80
+ (default is left border of the page).
81
+
82
+ Note that <tt>left_of</tt> and <tt>right_of</tt> constraints do *not* need to be within the vertical
83
+ range of the box being described.
84
+ For example, you could use an element in the page header to describe the <tt>left_of</tt> limit
85
+ for a table at the bottom of the page, if it has the correct alignment needed to describe your text region.
86
+
87
+ Similarly, <tt>above</tt> and <tt>below</tt> constraints do *not* need to be within the horizontal
88
+ range of the box being described.
89
+
90
+ === Using a block parameter with the <tt>bounding_box</tt> method
91
+
92
+ An explicit block parameter may be used with the <tt>bounding_box</tt> method:
93
+
94
+ textangle = reader.bounding_box do |r|
95
+ r.below /electricity/i
96
+ r.left_of "Total ($)"
97
+ end
98
+ textangle.text
99
+ => [['string','string'],['string']] # array of rows, each row is an array of text elements in the row
100
+
101
+ === Extract text for a region with known positional co-ordinates
102
+
103
+ If you know (or can calculate) the x,y positions of the required text region, you can extract the region's
104
+ text using the <tt>text_in_region</tt> method.
105
+
106
+ text = reader.text_in_region(
107
+ 10, # minimum x (left-most) (inclusive)
108
+ 900, # maximum x (right-most) (inclusive)
109
+ 200, # minimum y (bottom-most) (inclusive)
110
+ 400, # maximum y (top-most) (inclusive)
111
+ 1 # page
40
112
  )
113
+ => [['string','string'],['string']] # array of rows, each row is an array of text elements in the row
114
+
115
+ Note that the x,y origin is at the bottom-left of the page.
116
+
117
+ === How to find the x,y co-ordinate of a specific text element
118
+
119
+ Problem: if you are doing low-level text extraction with <tt>text_in_region</tt> for example,
120
+ it is usually necessary to locate specific text to provide a positional reference.
121
+
122
+ Solution: use the <tt>text_position</tt> method to locate text by exact or partial match.
123
+ It returns a Hash of x/y co-ordinates that is the bottom-left corner of the text.
124
+
125
+ page = 1
126
+ text_by_exact_match = reader.text_position("Transaction Table", page)
127
+ => { :x => 10.0, :y => 600.0 }
128
+ text_by_regex_match = reader.text_position(/transaction summary/i, page)
129
+ => { :x => 10.0, :y => 300.0 }
130
+
131
+ Note: in the case of multiple matches, only the first match is returned.
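The two low-level calls described in the cookbook can be combined by hand. The following is a minimal sketch, not part of the gem: `text_between` is a hypothetical helper name, the `99999` right-hand limit mirrors the placeholder used in Textangle, and the early return assumes that `text_position` yields nil when nothing matches, which the README does not state.

```ruby
# Hypothetical helper (not part of pdf-reader-turtletext): returns the rows
# of text lying strictly between two marker strings or regexes on a page.
# +reader+ is anything that responds to text_position and text_in_region
# with the signatures shown in the README.
def text_between(reader, top_marker, bottom_marker, page = 1)
  top    = reader.text_position(top_marker, page)
  bottom = reader.text_position(bottom_marker, page)
  # Assumption: a failed lookup comes back as nil.
  return [] unless top && bottom

  reader.text_in_region(
    0,               # minimum x: left edge of the page
    99999,           # maximum x: arbitrary large value, as in Textangle
    bottom[:y] + 1,  # minimum y: just above the lower marker
    top[:y] - 1,     # maximum y: just below the upper marker
    page
  )
end

# Usage with the real gem would look something like:
#   reader = PDF::Reader::Turtletext.new('../some_path/some.pdf')
#   rows   = text_between(reader, /transaction table/i, /transaction summary/i)
```

Because the y origin is at the bottom of the page, the upper marker supplies the maximum y and the lower marker the minimum y.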
41
132
 
42
133
 
43
134
  == Contributing to PDF::Reader::Turtletext
@@ -16,6 +16,8 @@ class PDF::Reader::Turtletext
16
16
  attr_reader :options
17
17
 
18
18
  # +source+ is a file name or stream-like object
19
+ # Supported +options+ include:
20
+ # * :y_precision
19
21
  def initialize(source, options={})
20
22
  @options = options
21
23
  @reader = PDF::Reader.new(source)
@@ -31,7 +33,7 @@ class PDF::Reader::Turtletext
31
33
  end
32
34
 
33
35
  # Returns positional (with fuzzed y positioning) text content collection as an array:
34
- # { y_position: { x_position: content}}
36
+ # [ fuzzed_y_position, [[x_position,content]] ]
35
37
  def content(page=1)
36
38
  @content ||= []
37
39
  if @content[page]
@@ -41,18 +43,24 @@ class PDF::Reader::Turtletext
41
43
  end
42
44
  end
43
45
 
44
- # Returns a hash with fuzzed positioning:
45
- # { fuzzed_y_position: { x_position: content}}
46
+ # Returns an Array with fuzzed positioning, ordered by decreasing y position. Row content order by x position.
47
+ # [ fuzzed_y_position, [[x_position,content]] ]
46
48
  # Given +input+ as a hash:
47
49
  # { y_position: { x_position: content}}
48
50
  # Fuzz factors: +y_precision+
49
51
  def fuzzed_y(input)
50
- output = {}
51
- input.keys.sort.each do |precise_y|
52
- # matching_y = (precise_y / 5.0).truncate * 5.0
53
- matching_y = output.keys.select{|new_y| (new_y - precise_y).abs < y_precision }.first || precise_y
54
- output[matching_y] ||= {}
55
- output[matching_y].merge!(input[precise_y])
52
+ output = []
53
+ input.keys.sort.reverse.each do |precise_y|
54
+ matching_y = output.map(&:first).select{|new_y| (new_y - precise_y).abs < y_precision }.first || precise_y
55
+ y_index = output.index{|y| y.first == matching_y }
56
+ new_row_content = input[precise_y].to_a
57
+ if y_index
58
+ row_content = output[y_index].last
59
+ row_content += new_row_content
60
+ output[y_index] = [matching_y,row_content]
61
+ else
62
+ output << [matching_y,new_row_content]
63
+ end
56
64
  end
57
65
  output
58
66
  end
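The row-merging idea in the new `fuzzed_y` can be tried outside the gem. This is a standalone sketch under stated assumptions: `fuzz_rows` is a hypothetical name, and the default precision of 3 is taken from the `#y_precision` spec in this diff. It is a simplified restatement, not the gem's method.

```ruby
# Hypothetical standalone sketch of y-fuzzing (not the gem's own method).
# +input+ is a hash of { y_position => { x_position => content } }.
# Returns [[fuzzed_y, [[x, content], ...]], ...] ordered by decreasing y.
def fuzz_rows(input, y_precision = 3)
  output = []
  input.keys.sort.reverse.each do |precise_y|
    # Reuse an existing row whose y is within the precision window.
    row = output.find { |y, _| (y - precise_y).abs < y_precision }
    if row
      row[1].concat(input[precise_y].to_a)
    else
      output << [precise_y, input[precise_y].to_a]
    end
  end
  output
end

rows = fuzz_rows(
  10.0 => { 10.0 => "first" },
  8.0  => { 20.0 => "second", 30.0 => "third" }
)
# rows == [[10.0, [[10.0, "first"], [20.0, "second"], [30.0, "third"]]]]
```

Walking the keys from top to bottom means the highest y in a cluster becomes the row's key, which matches the `with_multiple_row_text` expectation in the updated spec.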
@@ -69,21 +77,24 @@ class PDF::Reader::Turtletext
69
77
  end
70
78
 
71
79
  # Returns an array of text elements found within the x,y limits,
80
+ # x ranges from +xmin+ (left of page) to +xmax+ (right of page)
81
+ # y ranges from +ymin+ (bottom of page) to +ymax+ (top of page)
72
82
  # Each line of text found is returned as an array element.
73
83
  # Each line of text is an array of the separate text elements found on that line.
74
84
  # [["first line first text", "first line last text"],["second line text"]]
75
85
  def text_in_region(xmin,xmax,ymin,ymax,page=1)
76
86
  text_map = content(page)
77
87
  box = []
78
- text_map.keys.sort.reverse.each do |y|
88
+
89
+ text_map.each do |y,text_row|
79
90
  if y >= ymin && y<= ymax
80
91
  row = []
81
- text_map[y].keys.sort.each do |x|
92
+ text_row.each do |x,element|
82
93
  if x >= xmin && x<= xmax
83
- row << text_map[y][x]
94
+ row << [x,element]
84
95
  end
85
96
  end
86
- box << row unless row.empty?
97
+ box << row.sort{|a,b| a.first <=> b.first }.map(&:last) unless row.empty?
87
98
  end
88
99
  end
89
100
  box
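The region filter now works on the array-of-rows structure rather than nested hashes. As a rough sketch of the same filtering, assuming content shaped like `[[y, [[x, text], ...]], ...]` (the name `rows_in_region` is hypothetical and this is not the gem's code):

```ruby
# Hypothetical sketch: select text elements inside an inclusive x,y window.
# +content+ is [[y, [[x, text], ...]], ...], already ordered top to bottom.
def rows_in_region(content, xmin, xmax, ymin, ymax)
  content.each_with_object([]) do |(y, text_row), box|
    next unless y >= ymin && y <= ymax

    row = text_row.select { |x, _| x >= xmin && x <= xmax }
    # Sort each surviving row left to right, then keep only the text.
    box << row.sort_by(&:first).map(&:last) unless row.empty?
  end
end

content = [
  [70.0, [[10.0, "first line ignored"]]],
  [30.0, [[20.0, "last part found"], [10.0, "first part found"]]],
  [10.0, [[10.0, "last line ignored"]]]
]
found = rows_in_region(content, 0, 100, 20, 50)
# found == [["first part found", "last part found"]]
```

Sorting inside the filter matters because merged rows are concatenated in the order they were encountered, so their x positions are not guaranteed to be ascending.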
@@ -94,7 +105,11 @@ class PDF::Reader::Turtletext
94
105
  # +text+ may be a string (exact match required) or a Regexp
95
106
  def text_position(text,page=1)
96
107
  item = if text.class <= Regexp
97
- content(page).map {|k,v| if x = v.reduce(nil){|memo,vv| memo = (vv[1] =~ text) ? vv[0] : memo } ; [k,x] ; end }
108
+ content(page).map do |k,v|
109
+ if x = v.reduce(nil){|memo,vv| memo = (vv[1] =~ text) ? vv[0] : memo }
110
+ [k,x]
111
+ end
112
+ end
98
113
  else
99
114
  content(page).map {|k,v| if x = v.rassoc(text) ; [k,x] ; end }
100
115
  end
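For readers following the lookup logic, here is a simplified sketch of finding a text element's position in the same array-of-rows structure. It is an assumption-laden illustration: `find_position` is a hypothetical name, and unlike the `reduce` above it stops at the first matching element in a row rather than keeping the last one.

```ruby
# Hypothetical sketch: locate text by exact string or Regexp match.
# +content+ is [[y, [[x, text], ...]], ...].
# Returns { :x => x, :y => y } for the first match, or nil if none.
def find_position(content, text)
  content.each do |y, row|
    row.each do |x, element|
      matched = text.is_a?(Regexp) ? element =~ text : element == text
      return { :x => x, :y => y } if matched
    end
  end
  nil
end

content = [
  [40.0, [[15.0, "bacon on kimchi noodles"], [25.0, "heaven"]]],
  [30.0, [[30.0, "turkey bacon"], [35.0, "fraud"]]]
]
by_regex = find_position(content, /Fraud/i)
by_exact = find_position(content, "heaven")
```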
@@ -104,17 +119,30 @@ class PDF::Reader::Turtletext
104
119
  end
105
120
  end
106
121
 
107
- # WIP - not using Textangle yet for text extraction.
108
- # Ideal usage is something like this:
122
+ # Returns a text region definition using a descriptive block.
123
+ #
124
+ # Usage:
125
+ #
126
+ # textangle = reader.bounding_box do
127
+ # page 1
128
+ # below /electricity/i
129
+ # above 10
130
+ # right_of 240.0
131
+ # left_of "Total ($)"
132
+ # end
133
+ # textangle.text
134
+ #
135
+ # Alternatively, an explicit block parameter may be used:
109
136
  #
110
- # textangle = reader.bounding_box do
111
- # page 1
112
- # below "Electricity Services"
113
- # above "Gas Services by City Gas Pte Ltd"
114
- # right_of 240.0
115
- # left_of "Total ($)"
116
- # end
117
- # textangle.text
137
+ # textangle = reader.bounding_box do |r|
138
+ # r.page 1
139
+ # r.below /electricity/i
140
+ # r.above 10
141
+ # r.right_of 240.0
142
+ # r.left_of "Total ($)"
143
+ # end
144
+ # textangle.text
145
+ # => [['string','string'],['string']] # array of rows, each row is an array of column text elements
118
146
  #
119
147
  def bounding_box(&block)
120
148
  PDF::Reader::Turtletext::Textangle.new(self,&block)
@@ -1,27 +1,103 @@
1
1
  # A DSL syntax for text extraction.
2
- # WIP - not using this yet
3
2
  #
4
- # textangle = PDF::Reader::Turtletext::Textangle.new(reader) do
5
- # page 1
6
- # below "Electricity Services"
7
- # above "Gas Services by City Gas Pte Ltd"
8
- # right_of 240.0
9
- # left_of "Total ($)"
3
+ # textangle = PDF::Reader::Turtletext::Textangle.new(reader) do |r|
4
+ # r.page = 1
5
+ # r.below = "Electricity Services"
6
+ # r.above = "Gas Services by City Gas Pte Ltd"
7
+ # r.right_of = 240.0
8
+ # r.left_of = "Total ($)"
10
9
  # end
11
10
  # textangle.text
12
11
  #
13
12
  class PDF::Reader::Turtletext::Textangle
14
13
  attr_reader :reader
15
- attr_writer :page,:above,:below,:left_of,:right_of
14
+ attr_accessor :page
15
+ attr_writer :above,:below,:left_of,:right_of
16
16
 
17
- # +structured_reader+ is a PDF::StructuredReader
18
- def initialize(structured_reader,&block)
19
- @reader = structured_reader
20
- instance_eval( &block ) if block
17
+ # +turtletext_reader+ is a PDF::Reader::Turtletext
18
+ def initialize(turtletext_reader,&block)
19
+ @reader = turtletext_reader
20
+ @page = 1
21
+ if block_given?
22
+ if block.arity == 1
23
+ yield self
24
+ else
25
+ instance_eval &block
26
+ end
27
+ end
21
28
  end
22
29
 
30
+ def above(*args)
31
+ if value = args.first
32
+ @above = value
33
+ end
34
+ @above
35
+ end
36
+
37
+ def below(*args)
38
+ if value = args.first
39
+ @below = value
40
+ end
41
+ @below
42
+ end
43
+
44
+ def left_of(*args)
45
+ if value = args.first
46
+ @left_of = value
47
+ end
48
+ @left_of
49
+ end
50
+
51
+ def right_of(*args)
52
+ if value = args.first
53
+ @right_of = value
54
+ end
55
+ @right_of
56
+ end
57
+
58
+ # Returns the text
23
59
  def text
24
- # TODO
60
+ return unless reader
61
+
62
+ xmin = if right_of
63
+ if [Fixnum,Float].include?(right_of.class)
64
+ right_of
65
+ else
66
+ reader.text_position(right_of,page)[:x] + 1
67
+ end
68
+ else
69
+ 0
70
+ end
71
+ xmax = if left_of
72
+ if [Fixnum,Float].include?(left_of.class)
73
+ left_of
74
+ else
75
+ reader.text_position(left_of,page)[:x] - 1
76
+ end
77
+ else
78
+ 99999 # TODO actual limit
79
+ end
80
+
81
+ ymin = if above
82
+ if [Fixnum,Float].include?(above.class)
83
+ above
84
+ else
85
+ reader.text_position(above,page)[:y] + 1
86
+ end
87
+ else
88
+ 0
89
+ end
90
+ ymax = if below
91
+ if [Fixnum,Float].include?(below.class)
92
+ below
93
+ else
94
+ reader.text_position(below,page)[:y] - 1
95
+ end
96
+ else
97
+ 99999 # TODO actual limit
98
+ end
99
+
100
+ reader.text_in_region(xmin,xmax,ymin,ymax,page)
25
101
  end
26
102
 
27
103
  end
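The constructor above supports two block styles by checking the block's arity. A minimal standalone sketch of that pattern follows; `RegionSketch` and its `limits` hash are hypothetical names for illustration, and the setters here accept any explicit argument, whereas the methods above ignore nil or false.

```ruby
# Hypothetical sketch of a DSL object that accepts either
#   RegionSketch.new { below "x" }          # evaluated in the object's context
#   RegionSketch.new { |r| r.below "x" }    # object passed as a block parameter
class RegionSketch
  attr_reader :limits

  def initialize(&block)
    @limits = {}
    return unless block

    if block.arity == 1
      yield self            # explicit block parameter: keep the caller's self
    else
      instance_eval(&block) # bare DSL style: self becomes this object
    end
  end

  # Each limit method acts as a setter when given an argument
  # and as a getter when called without one.
  [:above, :below, :left_of, :right_of].each do |name|
    define_method(name) do |*args|
      @limits[name] = args.first unless args.empty?
      @limits[name]
    end
  end
end

bare = RegionSketch.new do
  below(/electricity/i)
  right_of 240.0
end

with_param = RegionSketch.new do |r|
  r.left_of "Total ($)"
end
```

The block-parameter form is useful when the block needs to call methods or read instance variables from the surrounding object, since `instance_eval` replaces `self` inside the block.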
@@ -3,7 +3,7 @@ module PDF
3
3
  class Turtletext
4
4
  class Version
5
5
  MAJOR = 0
6
- MINOR = 1
6
+ MINOR = 2
7
7
  PATCH = 0
8
8
 
9
9
  STRING = [MAJOR, MINOR, PATCH].compact.join('.')
@@ -5,11 +5,11 @@
5
5
 
6
6
  Gem::Specification.new do |s|
7
7
  s.name = "pdf-reader-turtletext"
8
- s.version = "0.1.0"
8
+ s.version = "0.2.0"
9
9
 
10
10
  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
11
11
  s.authors = ["Paul Gallagher"]
12
- s.date = "2012-07-22"
12
+ s.date = "2012-07-31"
13
13
  s.description = "a library that can read structured and positional text from PDFs. Ideal for asembling structured data from invoices and the like."
14
14
  s.email = "gallagher.paul@gmail.com"
15
15
  s.extra_rdoc_files = [
@@ -20,6 +20,7 @@ Gem::Specification.new do |s|
20
20
  ".rspec",
21
21
  ".rvmrc",
22
22
  ".travis.yml",
23
+ "CHANGELOG",
23
24
  "Gemfile",
24
25
  "Gemfile.lock",
25
26
  "Guardfile",
@@ -34,8 +35,10 @@ Gem::Specification.new do |s|
34
35
  "lib/pdf/reader/turtletext/version.rb",
35
36
  "pdf-reader-turtletext.gemspec",
36
37
  "spec/fixtures/pdf_samples/.gitkeep",
38
+ "spec/fixtures/pdf_samples/expectations.yml",
37
39
  "spec/fixtures/pdf_samples/hello_world.pdf",
38
40
  "spec/fixtures/pdf_samples/junk_prefix.pdf",
41
+ "spec/fixtures/pdf_samples/simple_table_text.pdf",
39
42
  "spec/integration/pdf_samples_spec.rb",
40
43
  "spec/spec_helper.rb",
41
44
  "spec/support/pdf_samples_helper.rb",
@@ -0,0 +1,95 @@
1
+ # this file defines the test expectations for PDF samples in spec/fixtures/pdf_samples.
2
+ #
3
+ # This is a YAML-format file, so beware that indentation is significant
4
+ ---
5
+ hello_world.pdf:
6
+ :test_above:
7
+ :above: 100
8
+ :expected_text:
9
+ -
10
+ - "Hello World"
11
+ :test_below:
12
+ :below: 900
13
+ :expected_text:
14
+ -
15
+ - "Hello World"
16
+ :test_below_na:
17
+ :below: 10
18
+ :expected_text: []
19
+ simple_table_text.pdf:
20
+ :test_above:
21
+ :above: Table Header
22
+ :expected_text:
23
+ -
24
+ - "Simple Table Text"
25
+ :test_below:
26
+ :below: row 2
27
+ :expected_text:
28
+ -
29
+ - "Table Footer"
30
+ :test_right_of:
31
+ :right_of: row 1
32
+ :expected_text:
33
+ -
34
+ - "val 1"
35
+ - "val 2"
36
+ - "val 3"
37
+ -
38
+ - "val 1"
39
+ - "val 2"
40
+ - "val 3"
41
+ :test_left_of:
42
+ :left_of: val 1
43
+ :expected_text:
44
+ -
45
+ - "Simple Table Text"
46
+ -
47
+ - "Table Header"
48
+ -
49
+ - "row 1"
50
+ -
51
+ - "row 2"
52
+ -
53
+ - "Table Footer"
54
+ :test_above_and_below:
55
+ :below: Table Header
56
+ :above: Table Footer
57
+ :expected_text:
58
+ -
59
+ - "row 1"
60
+ - "val 1"
61
+ - "val 2"
62
+ - "val 3"
63
+ -
64
+ - "row 2"
65
+ - "val 1"
66
+ - "val 2"
67
+ - "val 3"
68
+ :test_above_and_below_and_left_of:
69
+ :below: Table Header
70
+ :above: Table Footer
71
+ :left_of: val 2
72
+ :expected_text:
73
+ -
74
+ - "row 1"
75
+ - "val 1"
76
+ -
77
+ - "row 2"
78
+ - "val 1"
79
+ :test_above_and_below_and_left_of_and_right_of:
80
+ :below: Table Header
81
+ :above: Table Footer
82
+ :left_of: val 2
83
+ :right_of: row 1
84
+ :expected_text:
85
+ -
86
+ - "val 1"
87
+ -
88
+ - "val 1"
89
+
90
+
91
+
92
+
93
+
94
+
95
+
@@ -0,0 +1,139 @@
1
+ %PDF-1.3
2
+ %����
3
+ 1 0 obj
4
+ << /Creator <feff0050007200610077006e>
5
+ /Producer <feff0050007200610077006e>
6
+ >>
7
+ endobj
8
+ 2 0 obj
9
+ << /Type /Catalog
10
+ /Pages 3 0 R
11
+ >>
12
+ endobj
13
+ 3 0 obj
14
+ << /Type /Pages
15
+ /Count 1
16
+ /Kids [5 0 R]
17
+ >>
18
+ endobj
19
+ 4 0 obj
20
+ << /Length 795
21
+ >>
22
+ stream
23
+ q
24
+
25
+ BT
26
+ 36 747.384 Td
27
+ /F1.0 12 Tf
28
+ [<53696d706c652054> 120 <6162> 20 <6c652054> 120 <65> 30 <7874>] TJ
29
+ ET
30
+
31
+
32
+ BT
33
+ 46 327.384 Td
34
+ /F1.0 12 Tf
35
+ [<54> 120 <6162> 20 <6c6520486561646572>] TJ
36
+ ET
37
+
38
+
39
+ BT
40
+ 46 277.384 Td
41
+ /F1.0 12 Tf
42
+ [<726f> 15 <772031>] TJ
43
+ ET
44
+
45
+
46
+ BT
47
+ 136 277.384 Td
48
+ /F1.0 12 Tf
49
+ [<76> 25 <616c2031>] TJ
50
+ ET
51
+
52
+
53
+ BT
54
+ 186 277.384 Td
55
+ /F1.0 12 Tf
56
+ [<76> 25 <616c2032>] TJ
57
+ ET
58
+
59
+
60
+ BT
61
+ 236 277.384 Td
62
+ /F1.0 12 Tf
63
+ [<76> 25 <616c2033>] TJ
64
+ ET
65
+
66
+
67
+ BT
68
+ 46 227.38400000000001 Td
69
+ /F1.0 12 Tf
70
+ [<726f> 15 <772032>] TJ
71
+ ET
72
+
73
+
74
+ BT
75
+ 136 227.38400000000001 Td
76
+ /F1.0 12 Tf
77
+ [<76> 25 <616c2031>] TJ
78
+ ET
79
+
80
+
81
+ BT
82
+ 186 227.38400000000001 Td
83
+ /F1.0 12 Tf
84
+ [<76> 25 <616c2032>] TJ
85
+ ET
86
+
87
+
88
+ BT
89
+ 236 227.38400000000001 Td
90
+ /F1.0 12 Tf
91
+ [<76> 25 <616c2033>] TJ
92
+ ET
93
+
94
+
95
+ BT
96
+ 46 177.38400000000001 Td
97
+ /F1.0 12 Tf
98
+ [<54> 120 <6162> 20 <6c652046> 30 <6f6f746572>] TJ
99
+ ET
100
+
101
+ Q
102
+
103
+ endstream
104
+ endobj
105
+ 5 0 obj
106
+ << /Type /Page
107
+ /Parent 3 0 R
108
+ /MediaBox [0 0 612.0 792.0]
109
+ /Contents 4 0 R
110
+ /Resources << /ProcSet [/PDF /Text /ImageB /ImageC /ImageI]
111
+ /Font << /F1.0 6 0 R
112
+ >>
113
+ >>
114
+ >>
115
+ endobj
116
+ 6 0 obj
117
+ << /Type /Font
118
+ /Subtype /Type1
119
+ /BaseFont /Helvetica
120
+ /Encoding /WinAnsiEncoding
121
+ >>
122
+ endobj
123
+ xref
124
+ 0 7
125
+ 0000000000 65535 f
126
+ 0000000015 00000 n
127
+ 0000000109 00000 n
128
+ 0000000158 00000 n
129
+ 0000000215 00000 n
130
+ 0000001061 00000 n
131
+ 0000001239 00000 n
132
+ trailer
133
+ << /Size 7
134
+ /Root 2 0 R
135
+ /Info 1 0 R
136
+ >>
137
+ startxref
138
+ 1336
139
+ %%EOF
@@ -3,5 +3,33 @@ include PdfSamplesHelper
3
3
 
4
4
  describe "PDF Samples" do
5
5
 
6
+ # This will scan all *.pdf files in spec/fixtures/personal_pdf_samples
7
+ # and do basic verification of the file structure without any effort from you.
8
+ pdf_sample_expectations.each do |sample_name,test_specifications|
9
+ describe "sample" do
10
+ let(:options) { test_specifications[:options] || {} }
11
+ let(:sample_file) { pdf_sample(sample_name) }
12
+ let(:turtletext_reader) { PDF::Reader::Turtletext.new(sample_file,options) }
13
+
14
+ (test_specifications||{}).each do |test_name,expectations|
15
+ context test_name do
16
+ let(:bounding_box) {
17
+ turtletext_reader.bounding_box do
18
+ above expectations[:above]
19
+ below expectations[:below]
20
+ left_of expectations[:left_of]
21
+ right_of expectations[:right_of]
22
+ end
23
+ }
24
+ # it {
25
+ # puts "bounding_box"
26
+ # puts bounding_box.inspect
27
+ # }
28
+ subject { bounding_box.text }
29
+ it { should eql(expectations[:expected_text])}
30
+ end
31
+ end
32
+ end
33
+ end
6
34
 
7
35
  end
@@ -31,6 +31,7 @@ module PdfSamplesHelper
31
31
  require 'prawn'
32
32
  puts "Making PDF samples for tests.."
33
33
  make_sample_hello_world
34
+ make_sample_simple_table_text
34
35
  end
35
36
 
36
37
  def make_sample_hello_world
@@ -40,4 +41,26 @@ module PdfSamplesHelper
40
41
  end
41
42
  puts "Created: #{filename}"
42
43
  end
44
+
45
+ def make_sample_simple_table_text
46
+ filename = pdf_sample('simple_table_text.pdf')
47
+ Prawn::Document.generate filename do
48
+ text "Simple Table Text"
49
+ text_box "Table Header", :at => [10, 300], :width => 200
50
+
51
+ text_box "row 1", :at => [10, 250], :width => 90
52
+ text_box "val 1", :at => [100, 250], :width => 50
53
+ text_box "val 2", :at => [150, 250], :width => 50
54
+ text_box "val 3", :at => [200, 250], :width => 50
55
+
56
+ text_box "row 2", :at => [10, 200], :width => 90
57
+ text_box "val 1", :at => [100, 200], :width => 50
58
+ text_box "val 2", :at => [150, 200], :width => 50
59
+ text_box "val 3", :at => [200, 200], :width => 50
60
+
61
+ text_box "Table Footer", :at => [10, 150], :width => 200
62
+ end
63
+ puts "Created: #{filename}"
64
+ end
65
+
43
66
  end
@@ -3,4 +3,197 @@ require 'spec_helper'
3
3
  describe PDF::Reader::Turtletext::Textangle do
4
4
  let(:resource_class) { PDF::Reader::Turtletext::Textangle }
5
5
 
6
+ let(:source) { nil } # we're just going to mock the PDF source here
7
+ let(:options) { {} }
8
+ let(:turtletext_reader) { PDF::Reader::Turtletext.new(source,options) }
9
+
10
+
11
+ describe "#reader" do
12
+ let(:textangle) { resource_class.new(turtletext_reader) }
13
+ subject { textangle.reader }
14
+ it { should be_a(PDF::Reader::Turtletext) }
15
+ end
16
+
17
+ describe "#text" do
18
+ let(:page) { 1 }
19
+ before do
20
+ turtletext_reader.stub(:load_content).and_return(given_page_content)
21
+ end
22
+ let(:given_page_content) { {
23
+ 70.0=>{10.0=>"crunchy bacon"},
24
+ 40.0=>{15.0=>"bacon on kimchi noodles", 25.0=>"heaven"},
25
+ 30.0=>{30.0=>"turkey bacon", 35.0=>"fraud"},
26
+ 10.0=>{40.0=>"smoked and streaky for me"}
27
+ } }
28
+
29
+ context "with block param" do
30
+ [:above,:below,:left_of,:right_of].each do |positional_method|
31
+ context "with #{positional_method}" do
32
+ let(:term) { "canary" }
33
+
34
+ it "should work with block param" do
35
+ textangle = resource_class.new(turtletext_reader) do |r|
36
+ r.send("#{positional_method}=",term)
37
+ end
38
+ textangle.send(positional_method).should eql(term)
39
+ end
40
+
41
+ end
42
+ end
43
+ end
44
+
45
+ context "without block param" do
46
+ it "#above should work" do
47
+ textangle = resource_class.new(turtletext_reader) do
48
+ above "canary"
49
+ end
50
+ textangle.above.should eql("canary")
51
+ end
52
+ it "#below should work" do
53
+ textangle = resource_class.new(turtletext_reader) do
54
+ below "canary"
55
+ end
56
+ textangle.below.should eql("canary")
57
+ end
58
+ it "#left_of should work" do
59
+ textangle = resource_class.new(turtletext_reader) do
60
+ left_of "canary"
61
+ end
62
+ textangle.left_of.should eql("canary")
63
+ end
64
+ it "#right_of should work" do
65
+ textangle = resource_class.new(turtletext_reader) do
66
+ right_of "canary"
67
+ end
68
+ textangle.right_of.should eql("canary")
69
+ end
70
+ end
71
+
72
+ context "when only below specified" do
73
+ context "as a string" do
74
+ let(:textangle) { resource_class.new(turtletext_reader) do |r|
75
+ r.below = "fraud"
76
+ end }
77
+ let(:expected) { [["smoked and streaky for me"]]}
78
+ subject { textangle.text }
79
+ it { should eql(expected) }
80
+ end
81
+ context "as a regex" do
82
+ let(:textangle) { resource_class.new(turtletext_reader) do |r|
83
+ r.below = /Fraud/i
84
+ end }
85
+ let(:expected) { [["smoked and streaky for me"]]}
86
+ subject { textangle.text }
87
+ it { should eql(expected) }
88
+ end
89
+ context "as a number" do
90
+ let(:textangle) { resource_class.new(turtletext_reader) do |r|
91
+ r.below = 20
92
+ end }
93
+ let(:expected) { [["smoked and streaky for me"]]}
94
+ subject { textangle.text }
95
+ it { should eql(expected) }
96
+ end
97
+ end
98
+
99
+ context "when only above specified" do
100
+ context "as a string" do
101
+ let(:textangle) { resource_class.new(turtletext_reader) do |r|
102
+ r.above = "heaven"
103
+ end }
104
+ let(:expected) { [["crunchy bacon"]]}
105
+ subject { textangle.text }
106
+ it { should eql(expected) }
107
+ end
108
+ context "as a regex" do
109
+ let(:textangle) { resource_class.new(turtletext_reader) do |r|
110
+ r.above = /heaVen/i
111
+ end }
112
+ let(:expected) { [["crunchy bacon"]]}
113
+ subject { textangle.text }
114
+ it { should eql(expected) }
115
+ end
116
+ context "as a number" do
117
+ let(:textangle) { resource_class.new(turtletext_reader) do |r|
118
+ r.above = 41
119
+ end }
120
+ let(:expected) { [["crunchy bacon"]]}
121
+ subject { textangle.text }
122
+ it { should eql(expected) }
123
+ end
124
+ end
125
+
126
+ context "when only left_of specified" do
127
+ context "as a string" do
128
+ let(:textangle) { resource_class.new(turtletext_reader) do |r|
129
+ r.left_of = "turkey bacon"
130
+ end }
131
+ let(:expected) { [
132
+ ["crunchy bacon"],
133
+ ["bacon on kimchi noodles", "heaven"]
134
+ ] }
135
+ subject { textangle.text }
136
+ it { should eql(expected) }
137
+ end
138
+ context "as a regex" do
139
+ let(:textangle) { resource_class.new(turtletext_reader) do |r|
140
+ r.left_of = /turKey/i
141
+ end }
142
+ let(:expected) { [
143
+ ["crunchy bacon"],
144
+ ["bacon on kimchi noodles", "heaven"]
145
+ ] }
146
+ subject { textangle.text }
147
+ it { should eql(expected) }
148
+ end
149
+ context "as a number" do
150
+ let(:textangle) { resource_class.new(turtletext_reader) do |r|
151
+ r.left_of = 29
152
+ end }
153
+ let(:expected) { [
154
+ ["crunchy bacon"],
155
+ ["bacon on kimchi noodles", "heaven"]
156
+ ] }
157
+ subject { textangle.text }
158
+ it { should eql(expected) }
159
+ end
160
+ end
161
+
162
+ context "when only right_of specified" do
163
+ context "as a string" do
164
+ let(:textangle) { resource_class.new(turtletext_reader) do |r|
165
+ r.right_of = "heaven"
166
+ end }
167
+ let(:expected) { [
168
+ ["turkey bacon","fraud"],
169
+ ["smoked and streaky for me"]
170
+ ] }
171
+ subject { textangle.text }
172
+ it { should eql(expected) }
173
+ end
174
+ context "as a regex" do
175
+ let(:textangle) { resource_class.new(turtletext_reader) do |r|
176
+ r.right_of = /Heaven/i
177
+ end }
178
+ let(:expected) { [
179
+ ["turkey bacon","fraud"],
180
+ ["smoked and streaky for me"]
181
+ ] }
182
+ subject { textangle.text }
183
+ it { should eql(expected) }
184
+ end
185
+ context "as a number" do
186
+ let(:textangle) { resource_class.new(turtletext_reader) do |r|
187
+ r.right_of = 26
188
+ end }
189
+ let(:expected) { [
190
+ ["turkey bacon","fraud"],
191
+ ["smoked and streaky for me"]
192
+ ] }
193
+ subject { textangle.text }
194
+ it { should eql(expected) }
195
+ end
196
+ end
197
+
198
+ end
6
199
  end
@@ -4,16 +4,16 @@ describe PDF::Reader::Turtletext do
4
4
  let(:resource_class) { PDF::Reader::Turtletext }
5
5
 
6
6
  let(:source) { nil } # we're just going to mock the PDF source here
7
- let(:structured_reader) { resource_class.new(source,options) }
7
+ let(:turtletext_reader) { resource_class.new(source,options) }
8
8
  let(:options) { {} }
9
9
 
10
10
  describe "#reader" do
11
- subject { structured_reader.reader}
11
+ subject { turtletext_reader.reader}
12
12
  it { should be_a(PDF::Reader) }
13
13
  end
14
14
 
15
15
  describe "#y_precision" do
16
- subject { structured_reader.y_precision}
16
+ subject { turtletext_reader.y_precision}
17
17
  context "default" do
18
18
  it { should eql(3) }
19
19
  end
@@ -27,35 +27,40 @@ describe PDF::Reader::Turtletext do
27
27
  context "with mocked source content" do
28
28
  let(:page) { 1 }
29
29
  before do
30
- structured_reader.should_receive(:load_content).with(page).and_return(given_page_content)
30
+ turtletext_reader.should_receive(:load_content).with(page).and_return(given_page_content)
31
31
  end
32
32
 
33
33
  {
34
34
  :with_simple_text => {
35
35
  :source_page_content => {10.0=>{10.0=>"a first bit of text"}},
36
36
  :expected_precise_content => {10.0=>{10.0=>"a first bit of text"}},
37
- :expected_fuzzed_content => {10.0=>{10.0=>"a first bit of text"}}
37
+ :expected_fuzzed_content => [[10.0,[[10.0,"a first bit of text"]]]]
38
38
  },
39
39
  :with_widely_separated_text => {
40
- :source_page_content => {10.0=>{10.0=>"a first bit of text"},20.0=>{20.0=>"a second bit of text"}},
41
- :expected_precise_content => {10.0=>{10.0=>"a first bit of text"},20.0=>{20.0=>"a second bit of text"}},
42
- :expected_fuzzed_content => {10.0=>{10.0=>"a first bit of text"},20.0=>{20.0=>"a second bit of text"}}
43
- },
44
- :with_unsorted_y_text => {
45
40
  :source_page_content => {20.0=>{10.0=>"a first bit of text"},10.0=>{20.0=>"a second bit of text"}},
46
41
  :expected_precise_content => {20.0=>{10.0=>"a first bit of text"},10.0=>{20.0=>"a second bit of text"}},
47
- :expected_fuzzed_content => {10.0=>{20.0=>"a second bit of text"},20.0=>{10.0=>"a first bit of text"}}
+ :expected_fuzzed_content => [[20.0, [[10.0, "a first bit of text"]]], [10.0, [[20.0, "a second bit of text"]]]]
+ },
+ :with_unsorted_y_text => {
+ :source_page_content => {10.0=>{10.0=>"a first bit of text"},20.0=>{20.0=>"a second bit of text"}},
+ :expected_precise_content => {10.0=>{10.0=>"a first bit of text"},20.0=>{20.0=>"a second bit of text"}},
+ :expected_fuzzed_content => [[20.0, [[20.0, "a second bit of text"]]], [10.0, [[10.0, "a first bit of text"]]]]
  },
  :with_fuzzed_y_text => {
- :source_page_content => {10.0=>{10.0=>"a first bit of text"},12.0=>{12.0=>"a second bit of text"}},
- :expected_precise_content => {10.0=>{10.0=>"a first bit of text"},12.0=>{12.0=>"a second bit of text"}},
- :expected_fuzzed_content => {10.0=>{10.0=>"a first bit of text",12.0=>"a second bit of text"}}
+ :source_page_content => {20.0=>{10.0=>"a first bit of text"},18.0=>{12.0=>"a second bit of text"}},
+ :expected_precise_content => {20.0=>{10.0=>"a first bit of text"},18.0=>{12.0=>"a second bit of text"}},
+ :expected_fuzzed_content => [[20.0, [[10.0, "a first bit of text"], [12.0, "a second bit of text"]]]]
  },
  :with_widely_separated_fuzzed_y_text => {
  :y_precision => 25,
- :source_page_content => {10.0=>{10.0=>"a first bit of text"},20.0=>{20.0=>"a second bit of text"}},
- :expected_precise_content => {10.0=>{10.0=>"a first bit of text"},20.0=>{20.0=>"a second bit of text"}},
- :expected_fuzzed_content => {10.0=>{10.0=>"a first bit of text",20.0=>"a second bit of text"}}
+ :source_page_content => {20.0=>{10.0=>"a first bit of text"},10.0=>{20.0=>"a second bit of text"}},
+ :expected_precise_content => {20.0=>{10.0=>"a first bit of text"},10.0=>{20.0=>"a second bit of text"}},
+ :expected_fuzzed_content => [[20.0, [[10.0, "a first bit of text"], [20.0, "a second bit of text"]]]]
+ },
+ :with_multiple_row_text => {
+ :source_page_content => {10.0=>{10.0=>"first"},8.0=>{20.0=>"second",30.0=>"third"}},
+ :expected_precise_content => {10.0=>{10.0=>"first"},8.0=>{20.0=>"second",30.0=>"third"}},
+ :expected_fuzzed_content => [[10.0, [[10.0, "first"], [20.0, "second"], [30.0, "third"]]]]
  }
  }.each do |test_name,test_expectations|
  context test_name do
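
The new fuzzed expectations in this hunk imply two things: rows are returned as an array ordered from the highest y value down, and a row whose y lies within `y_precision` of an earlier row is merged into that row, with cells ordered by x. The following is a minimal sketch of that grouping, not the gem's own code. The helper name `fuzz_rows` is hypothetical, and the default precision of 3 is an assumption (the diff only shows that a gap of 2 merges and a gap of 10 does not).

```ruby
# Hypothetical helper illustrating the grouping implied by the spec data.
# page_content: { y => { x => text } }
# Returns: [[y, [[x, text], ...]], ...] ordered by descending y.
# The default y_precision of 3 is an assumption, not taken from the diff.
def fuzz_rows(page_content, y_precision = 3)
  rows = []
  # Walk the precise rows from the top of the page (largest y) downwards.
  page_content.keys.sort.reverse.each do |y|
    # Look for an already collected row that is close enough in y.
    row = rows.find { |row_y, _cells| (row_y - y).abs < y_precision }
    if row
      row[1].concat(page_content[y].to_a)   # merge into the nearby row
    else
      rows << [y, page_content[y].to_a]     # start a new row
    end
  end
  # Order the cells of each row from left to right.
  rows.map { |y, cells| [y, cells.sort_by { |x, _text| x }] }
end

fuzz_rows({ 10.0 => { 10.0 => "first" }, 8.0 => { 20.0 => "second", 30.0 => "third" } })
# => [[10.0, [[10.0, "first"], [20.0, "second"], [30.0, "third"]]]]
```

An array of pairs, rather than a hash, keeps the row order explicit, which matters on Ruby 1.8 where hashes do not preserve insertion order.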
@@ -69,12 +74,12 @@ describe PDF::Reader::Turtletext do
  }
 
  describe "#content" do
- subject { structured_reader.content(page) }
+ subject { turtletext_reader.content(page) }
  it { should eql(test_expectations[:expected_fuzzed_content]) }
  end
 
  describe "#precise_content" do
- subject { structured_reader.precise_content(page) }
+ subject { turtletext_reader.precise_content(page) }
  it { should eql(test_expectations[:expected_precise_content]) }
  end
 
@@ -90,24 +95,24 @@ describe PDF::Reader::Turtletext do
  },
  :with_single_line_text => {
  :source_page_content => {
- 10.0=>{10.0=>"first line ignored"},
+ 70.0=>{10.0=>"first line ignored"},
  30.0=>{10.0=>"first part found", 20.0=>"last part found"},
- 70.0=>{10.0=>"last line ignored"}
+ 10.0=>{10.0=>"last line ignored"}
  },
  :xmin => 0, :xmax => 100, :ymin => 20, :ymax => 50,
  :expected_text => [["first part found", "last part found"]]
  },
  :with_multi_line_text => {
  :source_page_content => {
- 10.0=>{10.0=>"first line ignored"},
- 30.0=>{10.0=>"first line first part found", 20.0=>"first line last part found"},
- 40.0=>{10.0=>"last line first part found", 20.0=>"last line last part found"},
- 70.0=>{10.0=>"last line ignored"}
+ 70.0=>{10.0=>"first line ignored"},
+ 40.0=>{10.0=>"first line first part found", 20.0=>"first line last part found"},
+ 30.0=>{10.0=>"last line first part found", 20.0=>"last line last part found"},
+ 10.0=>{10.0=>"last line ignored"}
  },
  :xmin => 0, :xmax => 100, :ymin => 20, :ymax => 50,
  :expected_text => [
- ["last line first part found", "last line last part found"],
- ["first line first part found", "first line last part found"]
+ ["first line first part found", "first line last part found"],
+ ["last line first part found", "last line last part found"]
  ]
  }
  }.each do |test_name,test_expectations|
@@ -118,7 +123,7 @@ describe PDF::Reader::Turtletext do
  let(:ymin) { test_expectations[:ymin] }
  let(:ymax) { test_expectations[:ymax] }
  let(:expected_text) { test_expectations[:expected_text] }
- subject { structured_reader.text_in_region(xmin,xmax,ymin,ymax,page) }
+ subject { turtletext_reader.text_in_region(xmin,xmax,ymin,ymax,page) }
  it { should eql(expected_text) }
  end
  end
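
The `#text_in_region` expectations above now list the "first line" row (y = 40) before the "last line" row (y = 30), which is consistent with rows being scanned from the top of the page down. The sketch below shows one way such a region filter could work on the row array format. The helper name `region_text` is hypothetical, and treating the bounds as exclusive and dropping empty rows are both assumptions; the spec data never places text exactly on a boundary.

```ruby
# Hypothetical helper: select text inside a rectangular region.
# rows: [[y, [[x, text], ...]], ...], already ordered from top to bottom.
# Assumption: region bounds are exclusive.
def region_text(rows, xmin, xmax, ymin, ymax)
  # Keep only the rows that fall inside the vertical band.
  in_band = rows.select { |y, _cells| y > ymin && y < ymax }
  # Within each row, keep only the text inside the horizontal band.
  texts = in_band.map do |_y, cells|
    cells.select { |x, _text| x > xmin && x < xmax }.map { |_x, text| text }
  end
  # Assumption: rows with nothing inside the region are dropped.
  texts.reject { |row| row.empty? }
end

rows = [
  [70.0, [[10.0, "first line ignored"]]],
  [30.0, [[10.0, "first part found"], [20.0, "last part found"]]],
  [10.0, [[10.0, "last line ignored"]]]
]
region_text(rows, 0, 100, 20, 50)
# => [["first part found", "last part found"]]
```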
@@ -126,21 +131,21 @@ describe PDF::Reader::Turtletext do
 
  describe "#text_position" do
  let(:given_page_content) { {
- 10.0=>{10.0=>"crunchy bacon"},
- 30.0=>{15.0=>"bacon on kimchi noodles", 25.0=>"heaven"},
- 40.0=>{30.0=>"turkey bacon", 35.0=>"fraud"},
- 70.0=>{40.0=>"smoked and streaky da bomb"}
+ 70.0=>{10.0=>"crunchy bacon"},
+ 40.0=>{15.0=>"bacon on kimchi noodles", 25.0=>"heaven"},
+ 30.0=>{30.0=>"turkey bacon", 35.0=>"fraud"},
+ 10.0=>{40.0=>"smoked and streaky da bomb"}
  } }
  {
- :with_simple_match => { :match_term => 'turkey bacon', :expected_position => {:x=>30.0, :y=>40.0} },
- :with_match_along_line => { :match_term => 'heaven', :expected_position => {:x=>25.0, :y=>30.0} },
- :with_regex_match => { :match_term => /kimchi/, :expected_position => {:x=>15.0, :y=>30.0} },
- :with_regex_multi_matches_first => { :match_term => /turkey|crunchy/, :expected_position => {:x=>10.0, :y=>10.0} }
+ :with_simple_match => { :match_term => 'turkey bacon', :expected_position => {:x=>30.0, :y=>30.0} },
+ :with_match_along_line => { :match_term => 'heaven', :expected_position => {:x=>25.0, :y=>40.0} },
+ :with_regex_match => { :match_term => /kimchi/, :expected_position => {:x=>15.0, :y=>40.0} },
+ :with_regex_multi_matches_first => { :match_term => /turkey|crunchy/, :expected_position => {:x=>10.0, :y=>70.0} }
  }.each do |test_name,test_expectations|
  context test_name do
  let(:match_term) { test_expectations[:match_term] }
  let(:expected_position) { test_expectations[:expected_position] }
- subject { turtletext_reader_placeholder }
  it { should eql(expected_position) }
  end
  end
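
In the `#text_position` data above, `/turkey|crunchy/` is expected to resolve to the "crunchy bacon" cell at y = 70 rather than "turkey bacon" at y = 30, so the first match appears to be the one nearest the top of the page. A minimal sketch of such a lookup follows. The helper name `find_text_position` is hypothetical, and matching a String term as a literal substring is an assumption; the diff does not show how the gem compares strings.

```ruby
# Hypothetical helper: locate the first cell whose text matches a term.
# rows: [[y, [[x, text], ...]], ...], already ordered from top to bottom.
# Assumption: a String term is matched as a literal substring.
def find_text_position(rows, term)
  pattern = term.is_a?(Regexp) ? term : Regexp.new(Regexp.escape(term))
  rows.each do |y, cells|
    cells.each do |x, text|
      # Return on the first hit, scanning top-down and then left to right.
      return { :x => x, :y => y } if text =~ pattern
    end
  end
  nil # no cell matched
end

rows = [
  [70.0, [[10.0, "crunchy bacon"]]],
  [40.0, [[15.0, "bacon on kimchi noodles"], [25.0, "heaven"]]],
  [30.0, [[30.0, "turkey bacon"], [35.0, "fraud"]]],
  [10.0, [[40.0, "smoked and streaky da bomb"]]]
]
find_text_position(rows, /turkey|crunchy/)
# => {:x=>10.0, :y=>70.0}
```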
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: pdf-reader-turtletext
  version: !ruby/object:Gem::Version
- version: 0.1.0
+ version: 0.2.0
  prerelease:
  platform: ruby
  authors:
@@ -9,11 +9,11 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2012-07-22 00:00:00.000000000 Z
+ date: 2012-07-31 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: pdf-reader
- requirement: &70193556628420 !ruby/object:Gem::Requirement
+ requirement: &70218189955060 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - =
@@ -21,10 +21,10 @@ dependencies:
  version: 1.1.1
  type: :runtime
  prerelease: false
- version_requirements: *70193556628420
+ version_requirements: *70218189955060
  - !ruby/object:Gem::Dependency
  name: bundler
- requirement: &70193556627700 !ruby/object:Gem::Requirement
+ requirement: &70218189954360 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -32,10 +32,10 @@ dependencies:
  version: 1.1.4
  type: :development
  prerelease: false
- version_requirements: *70193556627700
+ version_requirements: *70218189954360
  - !ruby/object:Gem::Dependency
  name: jeweler
- requirement: &70193556626800 !ruby/object:Gem::Requirement
+ requirement: &70218189953580 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -43,10 +43,10 @@ dependencies:
  version: 1.6.4
  type: :development
  prerelease: false
- version_requirements: *70193556626800
+ version_requirements: *70218189953580
  - !ruby/object:Gem::Dependency
  name: rake
- requirement: &70193556626300 !ruby/object:Gem::Requirement
+ requirement: &70218189953020 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -54,10 +54,10 @@ dependencies:
  version: 0.9.2.2
  type: :development
  prerelease: false
- version_requirements: *70193556626300
+ version_requirements: *70218189953020
  - !ruby/object:Gem::Dependency
  name: rspec
- requirement: &70193556625680 !ruby/object:Gem::Requirement
+ requirement: &70218189952200 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -65,10 +65,10 @@ dependencies:
  version: 2.8.0
  type: :development
  prerelease: false
- version_requirements: *70193556625680
+ version_requirements: *70218189952200
  - !ruby/object:Gem::Dependency
  name: rdoc
- requirement: &70193556624820 !ruby/object:Gem::Requirement
+ requirement: &70218189951400 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -76,10 +76,10 @@ dependencies:
  version: '3.11'
  type: :development
  prerelease: false
- version_requirements: *70193556624820
+ version_requirements: *70218189951400
  - !ruby/object:Gem::Dependency
  name: prawn
- requirement: &70193556623960 !ruby/object:Gem::Requirement
+ requirement: &70218189950700 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -87,10 +87,10 @@ dependencies:
  version: 0.12.0
  type: :development
  prerelease: false
- version_requirements: *70193556623960
+ version_requirements: *70218189950700
  - !ruby/object:Gem::Dependency
  name: guard-rspec
- requirement: &70193556623440 !ruby/object:Gem::Requirement
+ requirement: &70218189950100 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -98,7 +98,7 @@ dependencies:
  version: 1.2.0
  type: :development
  prerelease: false
- version_requirements: *70193556623440
+ version_requirements: *70218189950100
  description: a library that can read structured and positional text from PDFs. Ideal
  for asembling structured data from invoices and the like.
  email: gallagher.paul@gmail.com
@@ -111,6 +111,7 @@ files:
  - .rspec
  - .rvmrc
  - .travis.yml
+ - CHANGELOG
  - Gemfile
  - Gemfile.lock
  - Guardfile
@@ -125,8 +126,10 @@ files:
  - lib/pdf/reader/turtletext/version.rb
  - pdf-reader-turtletext.gemspec
  - spec/fixtures/pdf_samples/.gitkeep
+ - spec/fixtures/pdf_samples/expectations.yml
  - spec/fixtures/pdf_samples/hello_world.pdf
  - spec/fixtures/pdf_samples/junk_prefix.pdf
+ - spec/fixtures/pdf_samples/simple_table_text.pdf
  - spec/integration/pdf_samples_spec.rb
  - spec/spec_helper.rb
  - spec/support/pdf_samples_helper.rb