pdf-reader-turtletext 0.1.0 → 0.2.0

@@ -1,3 +1,11 @@
1
1
  # These are specific configuration settings required for travis-ci
2
2
  # see http://travis-ci.org/tardate/pdf-reader-turtletext
3
- rvm: 1.9.3
3
+ language: ruby
4
+ rvm:
5
+ - 1.8.7
6
+ - 1.9.2
7
+ - 1.9.3
8
+ - rbx-18mode
9
+ - rbx-19mode
10
+ - jruby-18mode
11
+ - jruby-19mode
@@ -0,0 +1,11 @@
1
+ Version 0.2.0 Release: n/a
2
+ ==================================================
3
+ * add bounding_box / textangle semantics
4
+ * improve documentation
5
+ * MRI 1.8.7, 1.9.2, 1.9.3, Rubinius (1.8 and 1.9 mode), JRuby (1.8 and 1.9 mode)
6
+
7
+ Version 0.1.0 Release: 22nd July 2012
8
+ ==================================================
9
+ * Initial packaging and release of core functionality directly extracted
10
+ from https://github.com/tardate/sps_bill_scanner/
11
+ * MRI 1.9 only
@@ -14,30 +14,121 @@ For an example of how this is works in practice, see the
14
14
 
15
15
  == Requirements and Known Limitations
16
16
 
17
- * currently only tested with Ruby 1.9
18
- * fixed dependency on PDF::Reader v 1.1.1
17
+ * Tested with MRI 1.8.7, 1.9.2, 1.9.3, Rubinius (1.8 and 1.9 mode), JRuby (1.8 and 1.9 mode)
18
+ * Has a fixed dependency on PDF::Reader v1.1.1
19
19
 
20
- == Installation
20
+ == The PDF::Reader::Turtletext Cookbook
21
21
 
22
- gem install pdf-reader-turtletext
22
+ === How do I install it for normal use?
23
23
 
24
- == Usage
24
+ It is distributed as a gem, so all normal gem installation procedures apply. To install the
25
+ gem directly from the command line:
25
26
 
26
- === PDF::Reader::Turtletext
27
+ $ gem install pdf-reader-turtletext
27
28
 
28
- Provides a range of methods to extract structured text from a PDF file,
29
- such as <tt>text_position</tt> and <tt>text_in_region</tt>.
29
+ If you are using bundler or Rails, add to your Gemfile:
30
30
 
31
- A typical usage:
31
+ gem 'pdf-reader-turtletext'
32
32
 
33
+ Then bundle install:
34
+
35
+ $ bundle
36
+
37
+ === How do I install it for gem development?
38
+
39
+ If you want to work on enhancements or fix bugs in PDF::Reader::Turtletext, fork and clone the GitHub repository. See the section below on 'Contributing to PDF::Reader::Turtletext'.
40
+
41
+ === How to instantiate Turtletext in code
42
+
43
+ All interaction is done using an instance of the PDF::Reader::Turtletext class. It is
44
+ initialised given a filename or IO-like object, and any required options.
45
+
46
+ Typical usage:
47
+
48
+ pdf_filename = '../some_path/some.pdf'
33
49
  reader = PDF::Reader::Turtletext.new(pdf_filename)
34
- page = 1
35
- heading_position = reader.text_position(/transaction table/i)
36
- next_section = reader.text_position(/transaction summary/i)
37
- transaction_rows = reader.text_in_region(
38
- heading_position[x], 900,
39
- heading_position[y] + 1,next_section[:y] -1
50
+ options = { :y_precision => 5 }
51
+ reader_with_options = PDF::Reader::Turtletext.new(pdf_filename,options)
52
+
53
+ === How to extract text within a region described in relation to other text
54
+
55
+ Problem: we don't know exactly where the required text will be on the page, and it is not encoded
56
+ within the PDF as a single object. But we do know that it will be relatively positioned (for example)
57
+ below a certain bit of text, to the left of another, and above some other text.
58
+
59
+ Solution: use the <tt>bounding_box</tt> method to describe the region and extract the matching text.
60
+
61
+ textangle = reader.bounding_box do
62
+ page 1
63
+ below /electricity/i
64
+ above 10
65
+ right_of 240.0
66
+ left_of "Total ($)"
67
+ end
68
+ textangle.text
69
+ => [['string','string'],['string']] # array of rows, each row is an array of text elements in the row
70
+
71
+ The range of methods that can be used within the <tt>bounding_box</tt> block are all optional, and include:
72
+ * <tt>page</tt> - specifies the PDF page from which to extract text (default is 1).
73
+ * <tt>below</tt> - a string, regex or number that describes the upper limit of the text box
74
+ (default is top border of the page).
75
+ * <tt>above</tt> - a string, regex or number that describes the lower limit of the text box
76
+ (default is bottom border of the page).
77
+ * <tt>left_of</tt> - a string, regex or number that describes the right limit of the text box
78
+ (default is right border of the page).
79
+ * <tt>right_of</tt> - a string, regex or number that describes the left limit of the text box
80
+ (default is left border of the page).
81
+
82
+ Note that <tt>left_of</tt> and <tt>right_of</tt> constraints do *not* need to be within the vertical
83
+ range of the box being described.
84
+ For example, you could use an element in the page header to describe the <tt>left_of</tt> limit
85
+ for a table at the bottom of the page, if it has the correct alignment needed to describe your text region.
86
+
87
+ Similarly, <tt>above</tt> and <tt>below</tt> constraints do *not* need to be within the horizontal
88
+ range of the box being described.
89
+
90
+ === Using a block parameter with the <tt>bounding_box</tt> method
91
+
92
+ An explicit block parameter may be used with the <tt>bounding_box</tt> method:
93
+
94
+ textangle = reader.bounding_box do |r|
95
+ r.below /electricity/i
96
+ r.left_of "Total ($)"
97
+ end
98
+ textangle.text
99
+ => [['string','string'],['string']] # array of rows, each row is an array of text elements in the row
100
+
101
+ === Extract text for a region with known positional co-ordinates
102
+
103
+ If you know (or can calculate) the x,y positions of the required text region, you can extract the region's
104
+ text using the <tt>text_in_region</tt> method.
105
+
106
+ text = reader.text_in_region(
107
+ 10, # minimum x (left-most) (inclusive)
108
+ 900, # maximum x (right-most) (inclusive)
109
+ 200, # minimum y (bottom-most) (inclusive)
110
+ 400, # maximum y (top-most) (inclusive)
111
+ 1 # page
40
112
  )
113
+ => [['string','string'],['string']] # array of rows, each row is an array of text elements in the row
114
+
115
+ Note that the x,y origin is at the bottom-left of the page.
116
+
117
+ === How to find the x,y co-ordinate of a specific text element
118
+
119
+ Problem: if you are doing low-level text extraction with <tt>text_in_region</tt> for example,
120
+ it is usually necessary to locate specific text to provide a positional reference.
121
+
122
+ Solution: use the <tt>text_position</tt> method to locate text by exact or partial match.
123
+ It returns a Hash of x/y co-ordinates that is the bottom-left corner of the text.
124
+
125
+ page = 1
126
+ text_by_exact_match = reader.text_position("Transaction Table", page)
127
+ => { :x => 10.0, :y => 600.0 }
128
+ text_by_regex_match = reader.text_position(/transaction summary/i, page)
129
+ => { :x => 10.0, :y => 300.0 }
130
+
131
+ Note: in the case of multiple matches, only the first match is returned.
41
132
 
42
133
 
43
134
  == Contributing to PDF::Reader::Turtletext
@@ -16,6 +16,8 @@ class PDF::Reader::Turtletext
16
16
  attr_reader :options
17
17
 
18
18
  # +source+ is a file name or stream-like object
19
+ # Supported +options+ include:
20
+ # * :y_precision
19
21
  def initialize(source, options={})
20
22
  @options = options
21
23
  @reader = PDF::Reader.new(source)
@@ -31,7 +33,7 @@ class PDF::Reader::Turtletext
31
33
  end
32
34
 
33
35
  # Returns positional (with fuzzed y positioning) text content collection as a hash:
34
- # { y_position: { x_position: content}}
36
+ # [ fuzzed_y_position, [[x_position,content]] ]
35
37
  def content(page=1)
36
38
  @content ||= []
37
39
  if @content[page]
@@ -41,18 +43,24 @@ class PDF::Reader::Turtletext
41
43
  end
42
44
  end
43
45
 
44
- # Returns a hash with fuzzed positioning:
45
- # { fuzzed_y_position: { x_position: content}}
46
+ # Returns an Array with fuzzed positioning, ordered by decreasing y position. Row content order by x position.
47
+ # [ fuzzed_y_position, [[x_position,content]] ]
46
48
  # Given +input+ as a hash:
47
49
  # { y_position: { x_position: content}}
48
50
  # Fuzz factors: +y_precision+
49
51
  def fuzzed_y(input)
50
- output = {}
51
- input.keys.sort.each do |precise_y|
52
- # matching_y = (precise_y / 5.0).truncate * 5.0
53
- matching_y = output.keys.select{|new_y| (new_y - precise_y).abs < y_precision }.first || precise_y
54
- output[matching_y] ||= {}
55
- output[matching_y].merge!(input[precise_y])
52
+ output = []
53
+ input.keys.sort.reverse.each do |precise_y|
54
+ matching_y = output.map(&:first).select{|new_y| (new_y - precise_y).abs < y_precision }.first || precise_y
55
+ y_index = output.index{|y| y.first == matching_y }
56
+ new_row_content = input[precise_y].to_a
57
+ if y_index
58
+ row_content = output[y_index].last
59
+ row_content += new_row_content
60
+ output[y_index] = [matching_y,row_content]
61
+ else
62
+ output << [matching_y,new_row_content]
63
+ end
56
64
  end
57
65
  output
58
66
  end
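The row-grouping change in this hunk can be tried in isolation. The following is a minimal sketch, not the gem's method: `fuzzed_rows` is a hypothetical standalone name, and the default tolerance of 3 is taken from the `#y_precision` default asserted in the specs. It walks the y positions from top to bottom and folds any row whose y lies within the tolerance of an already emitted row into that row.

```ruby
# Standalone sketch of the 0.2.0 grouping logic (hypothetical helper, not the gem API).
# +input+ is a hash of { y_position => { x_position => content } }.
# Returns [[fuzzed_y, [[x, content], ...]], ...] ordered by decreasing y.
def fuzzed_rows(input, y_precision = 3)
  output = []
  input.keys.sort.reverse.each do |precise_y|
    # Reuse an existing row whose y is within the tolerance, otherwise start a new row.
    row = output.find { |y, _| (y - precise_y).abs < y_precision }
    if row
      row[1].concat(input[precise_y].to_a)
    else
      output << [precise_y, input[precise_y].to_a]
    end
  end
  output
end

rows = fuzzed_rows({ 10.0 => { 10.0 => "first" }, 8.0 => { 20.0 => "second", 30.0 => "third" } })
# rows == [[10.0, [[10.0, "first"], [20.0, "second"], [30.0, "third"]]]]
```

The input mirrors the `:with_multiple_row_text` case in the spec changes further down, so the result can be compared against that expectation.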
@@ -69,21 +77,24 @@ class PDF::Reader::Turtletext
69
77
  end
70
78
 
71
79
  # Returns an array of text elements found within the x,y limits,
80
+ # x ranges from +xmin+ (left of page) to +xmax+ (right of page)
81
+ # y ranges from +ymin+ (bottom of page) to +ymax+ (top of page)
72
82
  # Each line of text found is returned as an array element.
73
83
  # Each line of text is an array of the separate text elements found on that line.
74
84
  # [["first line first text", "first line last text"],["second line text"]]
75
85
  def text_in_region(xmin,xmax,ymin,ymax,page=1)
76
86
  text_map = content(page)
77
87
  box = []
78
- text_map.keys.sort.reverse.each do |y|
88
+
89
+ text_map.each do |y,text_row|
79
90
  if y >= ymin && y<= ymax
80
91
  row = []
81
- text_map[y].keys.sort.each do |x|
92
+ text_row.each do |x,element|
82
93
  if x >= xmin && x<= xmax
83
- row << text_map[y][x]
94
+ row << [x,element]
84
95
  end
85
96
  end
86
- box << row unless row.empty?
97
+ box << row.sort{|a,b| a.first <=> b.first }.map(&:last) unless row.empty?
87
98
  end
88
99
  end
89
100
  box
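To see how the region filter behaves on the new array-based content format, here is a small sketch under stated assumptions: `region_text` is a hypothetical free function that takes the fuzzed rows directly instead of calling `content(page)`, and it uses `sort_by` where the hunk uses `sort` with a comparison block. The bounds are inclusive on both axes, as in the hunk.

```ruby
# Sketch of the region filter over [[y, [[x, content], ...]], ...] rows
# (hypothetical helper; the gem's method reads the rows from content(page)).
def region_text(rows, xmin, xmax, ymin, ymax)
  rows.each_with_object([]) do |(y, text_row), box|
    next unless y >= ymin && y <= ymax
    # Keep only the elements inside the horizontal range.
    row = text_row.select { |x, _| x >= xmin && x <= xmax }
    # Order the surviving elements left to right, then drop the x positions.
    box << row.sort_by(&:first).map(&:last) unless row.empty?
  end
end

rows = [
  [70.0, [[10.0, "first line ignored"]]],
  [30.0, [[20.0, "last part found"], [10.0, "first part found"]]],
  [10.0, [[10.0, "last line ignored"]]]
]
found = region_text(rows, 0, 100, 20, 50)
# found == [["first part found", "last part found"]]
```

Because the rows already arrive ordered by decreasing y, only the x ordering within a row needs to be restored here.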
@@ -94,7 +105,11 @@ class PDF::Reader::Turtletext
94
105
  # +text+ may be a string (exact match required) or a Regexp
95
106
  def text_position(text,page=1)
96
107
  item = if text.class <= Regexp
97
- content(page).map {|k,v| if x = v.reduce(nil){|memo,vv| memo = (vv[1] =~ text) ? vv[0] : memo } ; [k,x] ; end }
108
+ content(page).map do |k,v|
109
+ if x = v.reduce(nil){|memo,vv| memo = (vv[1] =~ text) ? vv[0] : memo }
110
+ [k,x]
111
+ end
112
+ end
98
113
  else
99
114
  content(page).map {|k,v| if x = v.rassoc(text) ; [k,x] ; end }
100
115
  end
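The lookup can be illustrated without a PDF. This is a simplified sketch, not the gem's implementation: `find_position` is a hypothetical name, it scans the fuzzed rows directly, and it returns the first element that matches in scan order, whereas the `reduce` in the hunk keeps the last matching x within a row. The return shape follows the README description of `text_position` (a Hash with `:x` and `:y`).

```ruby
# Simplified position lookup over [[y, [[x, content], ...]], ...] rows
# (hypothetical helper). +text+ may be a String (exact match) or a Regexp.
def find_position(rows, text)
  rows.each do |y, text_row|
    text_row.each do |x, content|
      matched = text.is_a?(Regexp) ? content =~ text : content == text
      return { :x => x, :y => y } if matched
    end
  end
  nil # nothing matched on this page
end

rows = [
  [70.0, [[10.0, "crunchy bacon"]]],
  [40.0, [[15.0, "bacon on kimchi noodles"], [25.0, "heaven"]]],
  [30.0, [[30.0, "turkey bacon"], [35.0, "fraud"]]]
]
exact = find_position(rows, "heaven")   # { :x => 25.0, :y => 40.0 }
fuzzy = find_position(rows, /FRAUD/i)   # { :x => 35.0, :y => 30.0 }
```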
@@ -104,17 +119,30 @@ class PDF::Reader::Turtletext
104
119
  end
105
120
  end
106
121
 
107
- # WIP - not using Textangle yet for text extraction.
108
- # Ideal usage is something like this:
122
+ # Returns a text region definition using a descriptive block.
123
+ #
124
+ # Usage:
125
+ #
126
+ # textangle = reader.bounding_box do
127
+ # page 1
128
+ # below /electricity/i
129
+ # above 10
130
+ # right_of 240.0
131
+ # left_of "Total ($)"
132
+ # end
133
+ # textangle.text
134
+ #
135
+ # Alternatively, an explicit block parameter may be used:
109
136
  #
110
- # textangle = reader.bounding_box do
111
- # page 1
112
- # below "Electricity Services"
113
- # above "Gas Services by City Gas Pte Ltd"
114
- # right_of 240.0
115
- # left_of "Total ($)"
116
- # end
117
- # textangle.text
137
+ # textangle = reader.bounding_box do |r|
138
+ # r.page 1
139
+ # r.below /electricity/i
140
+ # r.above 10
141
+ # r.right_of 240.0
142
+ # r.left_of "Total ($)"
143
+ # end
144
+ # textangle.text
145
+ # => [['string','string'],['string']] # array of rows, each row is an array of column text elements
118
146
  #
119
147
  def bounding_box(&block)
120
148
  PDF::Reader::Turtletext::Textangle.new(self,&block)
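Both block styles documented above rest on one dispatch: a block that declares a parameter receives the object, while a bare block is evaluated in the object's context. A minimal sketch of that pattern follows; `RegionSketch` is a hypothetical stand-in, reduced to two limits, and its setters use `unless args.empty?` rather than the truthiness check in the gem.

```ruby
# Hypothetical stand-in showing only the block handling.
class RegionSketch
  attr_writer :above, :below

  def initialize(&block)
    return unless block
    if block.arity == 1
      block.call(self)        # explicit style: do |r| r.below = ... end
    else
      instance_eval(&block)   # bare DSL style: do below ... end
    end
  end

  # The getter doubles as a DSL setter when called with an argument.
  def above(*args)
    @above = args.first unless args.empty?
    @above
  end

  def below(*args)
    @below = args.first unless args.empty?
    @below
  end
end

bare = RegionSketch.new { below(/electricity/i); above 10 }
explicit = RegionSketch.new { |r| r.below = "Total ($)" }
```

The trade-off is the usual one for `instance_eval` DSLs: the bare style reads well but changes `self` inside the block, so local helper methods of the caller are not reachable there; the explicit parameter style avoids that.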
@@ -1,27 +1,103 @@
1
1
  # A DSL syntax for text extraction.
2
- # WIP - not using this yet
3
2
  #
4
- # textangle = PDF::Reader::Turtletext::Textangle.new(reader) do
5
- # page 1
6
- # below "Electricity Services"
7
- # above "Gas Services by City Gas Pte Ltd"
8
- # right_of 240.0
9
- # left_of "Total ($)"
3
+ # textangle = PDF::Reader::Turtletext::Textangle.new(reader) do |r|
4
+ # r.page = 1
5
+ # r.below = "Electricity Services"
6
+ # r.above = "Gas Services by City Gas Pte Ltd"
7
+ # r.right_of = 240.0
8
+ # r.left_of = "Total ($)"
10
9
  # end
11
10
  # textangle.text
12
11
  #
13
12
  class PDF::Reader::Turtletext::Textangle
14
13
  attr_reader :reader
15
- attr_writer :page,:above,:below,:left_of,:right_of
14
+ attr_accessor :page
15
+ attr_writer :above,:below,:left_of,:right_of
16
16
 
17
- # +structured_reader+ is a PDF::StructuredReader
18
- def initialize(structured_reader,&block)
19
- @reader = structured_reader
20
- instance_eval( &block ) if block
17
+ # +turtletext_reader+ is a PDF::Reader::Turtletext
18
+ def initialize(turtletext_reader,&block)
19
+ @reader = turtletext_reader
20
+ @page = 1
21
+ if block_given?
22
+ if block.arity == 1
23
+ yield self
24
+ else
25
+ instance_eval &block
26
+ end
27
+ end
21
28
  end
22
29
 
30
+ def above(*args)
31
+ if value = args.first
32
+ @above = value
33
+ end
34
+ @above
35
+ end
36
+
37
+ def below(*args)
38
+ if value = args.first
39
+ @below = value
40
+ end
41
+ @below
42
+ end
43
+
44
+ def left_of(*args)
45
+ if value = args.first
46
+ @left_of = value
47
+ end
48
+ @left_of
49
+ end
50
+
51
+ def right_of(*args)
52
+ if value = args.first
53
+ @right_of = value
54
+ end
55
+ @right_of
56
+ end
57
+
58
+ # Returns the text
23
59
  def text
24
- # TODO
60
+ return unless reader
61
+
62
+ xmin = if right_of
63
+ if [Fixnum,Float].include?(right_of.class)
64
+ right_of
65
+ else
66
+ reader.text_position(right_of,page)[:x] + 1
67
+ end
68
+ else
69
+ 0
70
+ end
71
+ xmax = if left_of
72
+ if [Fixnum,Float].include?(left_of.class)
73
+ left_of
74
+ else
75
+ reader.text_position(left_of,page)[:x] - 1
76
+ end
77
+ else
78
+ 99999 # TODO actual limit
79
+ end
80
+
81
+ ymin = if above
82
+ if [Fixnum,Float].include?(above.class)
83
+ above
84
+ else
85
+ reader.text_position(above,page)[:y] + 1
86
+ end
87
+ else
88
+ 0
89
+ end
90
+ ymax = if below
91
+ if [Fixnum,Float].include?(below.class)
92
+ below
93
+ else
94
+ reader.text_position(below,page)[:y] - 1
95
+ end
96
+ else
97
+ 99999 # TODO actual limit
98
+ end
99
+
100
+ reader.text_in_region(xmin,xmax,ymin,ymax,page)
25
101
  end
26
102
 
27
103
  end
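The four branches in `text` follow one rule: a numeric limit is used as is, a String or Regexp limit is looked up and nudged by one unit so the anchor text itself is excluded, and a missing limit falls back to a default. A hedged sketch of that rule is below. `resolve_limit` and the `positions` lookup table are hypothetical (the table stands in for `reader.text_position`), and the sketch tests against `Numeric` instead of `[Fixnum,Float]`, since `Fixnum` no longer exists on Ruby 3.2 and later.

```ruby
# Hypothetical helper condensing the four limit branches into one rule.
# +positions+ maps anchor text to { :x => ..., :y => ... } and stands in
# for a call to reader.text_position(limit, page).
def resolve_limit(limit, axis, nudge, default, positions)
  return default if limit.nil?
  return limit if limit.is_a?(Numeric)
  positions.fetch(limit)[axis] + nudge
end

positions = { "Total ($)" => { :x => 300.0, :y => 50.0 } }

xmin = resolve_limit(240.0, :x, 1, 0, positions)            # numeric: used directly
xmax = resolve_limit("Total ($)", :x, -1, 99999, positions) # text: looked up, then nudged
ymin = resolve_limit(nil, :y, 1, 0, positions)              # missing: default
```

Note that `fetch` raises `KeyError` when the anchor text is absent; the sketch surfaces that case explicitly rather than guessing how a failed lookup should be handled.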
@@ -3,7 +3,7 @@ module PDF
3
3
  class Turtletext
4
4
  class Version
5
5
  MAJOR = 0
6
- MINOR = 1
6
+ MINOR = 2
7
7
  PATCH = 0
8
8
 
9
9
  STRING = [MAJOR, MINOR, PATCH].compact.join('.')
@@ -5,11 +5,11 @@
5
5
 
6
6
  Gem::Specification.new do |s|
7
7
  s.name = "pdf-reader-turtletext"
8
- s.version = "0.1.0"
8
+ s.version = "0.2.0"
9
9
 
10
10
  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
11
11
  s.authors = ["Paul Gallagher"]
12
- s.date = "2012-07-22"
12
+ s.date = "2012-07-31"
13
13
  s.description = "a library that can read structured and positional text from PDFs. Ideal for assembling structured data from invoices and the like."
14
14
  s.email = "gallagher.paul@gmail.com"
15
15
  s.extra_rdoc_files = [
@@ -20,6 +20,7 @@ Gem::Specification.new do |s|
20
20
  ".rspec",
21
21
  ".rvmrc",
22
22
  ".travis.yml",
23
+ "CHANGELOG",
23
24
  "Gemfile",
24
25
  "Gemfile.lock",
25
26
  "Guardfile",
@@ -34,8 +35,10 @@ Gem::Specification.new do |s|
34
35
  "lib/pdf/reader/turtletext/version.rb",
35
36
  "pdf-reader-turtletext.gemspec",
36
37
  "spec/fixtures/pdf_samples/.gitkeep",
38
+ "spec/fixtures/pdf_samples/expectations.yml",
37
39
  "spec/fixtures/pdf_samples/hello_world.pdf",
38
40
  "spec/fixtures/pdf_samples/junk_prefix.pdf",
41
+ "spec/fixtures/pdf_samples/simple_table_text.pdf",
39
42
  "spec/integration/pdf_samples_spec.rb",
40
43
  "spec/spec_helper.rb",
41
44
  "spec/support/pdf_samples_helper.rb",
@@ -0,0 +1,95 @@
1
+ # this file defines the test expectations for PDF samples in spec/fixtures/pdf_samples.
2
+ #
3
+ # This is a YAML-format file, so beware that indentation is significant
4
+ ---
5
+ hello_world.pdf:
6
+ :test_above:
7
+ :above: 100
8
+ :expected_text:
9
+ -
10
+ - "Hello World"
11
+ :test_below:
12
+ :below: 900
13
+ :expected_text:
14
+ -
15
+ - "Hello World"
16
+ :test_below_na:
17
+ :below: 10
18
+ :expected_text: []
19
+ simple_table_text.pdf:
20
+ :test_above:
21
+ :above: Table Header
22
+ :expected_text:
23
+ -
24
+ - "Simple Table Text"
25
+ :test_below:
26
+ :below: row 2
27
+ :expected_text:
28
+ -
29
+ - "Table Footer"
30
+ :test_right_of:
31
+ :right_of: row 1
32
+ :expected_text:
33
+ -
34
+ - "val 1"
35
+ - "val 2"
36
+ - "val 3"
37
+ -
38
+ - "val 1"
39
+ - "val 2"
40
+ - "val 3"
41
+ :test_left_of:
42
+ :left_of: val 1
43
+ :expected_text:
44
+ -
45
+ - "Simple Table Text"
46
+ -
47
+ - "Table Header"
48
+ -
49
+ - "row 1"
50
+ -
51
+ - "row 2"
52
+ -
53
+ - "Table Footer"
54
+ :test_above_and_below:
55
+ :below: Table Header
56
+ :above: Table Footer
57
+ :expected_text:
58
+ -
59
+ - "row 1"
60
+ - "val 1"
61
+ - "val 2"
62
+ - "val 3"
63
+ -
64
+ - "row 2"
65
+ - "val 1"
66
+ - "val 2"
67
+ - "val 3"
68
+ :test_above_and_below_and_left_of:
69
+ :below: Table Header
70
+ :above: Table Footer
71
+ :left_of: val 2
72
+ :expected_text:
73
+ -
74
+ - "row 1"
75
+ - "val 1"
76
+ -
77
+ - "row 2"
78
+ - "val 1"
79
+ :test_above_and_below_and_left_of_and_right_of:
80
+ :below: Table Header
81
+ :above: Table Footer
82
+ :left_of: val 2
83
+ :right_of: row 1
84
+ :expected_text:
85
+ -
86
+ - "val 1"
87
+ -
88
+ - "val 1"
89
+
90
+
91
+
92
+
93
+
94
+
95
+
@@ -0,0 +1,139 @@
1
+ %PDF-1.3
2
+ %����
3
+ 1 0 obj
4
+ << /Creator <feff0050007200610077006e>
5
+ /Producer <feff0050007200610077006e>
6
+ >>
7
+ endobj
8
+ 2 0 obj
9
+ << /Type /Catalog
10
+ /Pages 3 0 R
11
+ >>
12
+ endobj
13
+ 3 0 obj
14
+ << /Type /Pages
15
+ /Count 1
16
+ /Kids [5 0 R]
17
+ >>
18
+ endobj
19
+ 4 0 obj
20
+ << /Length 795
21
+ >>
22
+ stream
23
+ q
24
+
25
+ BT
26
+ 36 747.384 Td
27
+ /F1.0 12 Tf
28
+ [<53696d706c652054> 120 <6162> 20 <6c652054> 120 <65> 30 <7874>] TJ
29
+ ET
30
+
31
+
32
+ BT
33
+ 46 327.384 Td
34
+ /F1.0 12 Tf
35
+ [<54> 120 <6162> 20 <6c6520486561646572>] TJ
36
+ ET
37
+
38
+
39
+ BT
40
+ 46 277.384 Td
41
+ /F1.0 12 Tf
42
+ [<726f> 15 <772031>] TJ
43
+ ET
44
+
45
+
46
+ BT
47
+ 136 277.384 Td
48
+ /F1.0 12 Tf
49
+ [<76> 25 <616c2031>] TJ
50
+ ET
51
+
52
+
53
+ BT
54
+ 186 277.384 Td
55
+ /F1.0 12 Tf
56
+ [<76> 25 <616c2032>] TJ
57
+ ET
58
+
59
+
60
+ BT
61
+ 236 277.384 Td
62
+ /F1.0 12 Tf
63
+ [<76> 25 <616c2033>] TJ
64
+ ET
65
+
66
+
67
+ BT
68
+ 46 227.38400000000001 Td
69
+ /F1.0 12 Tf
70
+ [<726f> 15 <772032>] TJ
71
+ ET
72
+
73
+
74
+ BT
75
+ 136 227.38400000000001 Td
76
+ /F1.0 12 Tf
77
+ [<76> 25 <616c2031>] TJ
78
+ ET
79
+
80
+
81
+ BT
82
+ 186 227.38400000000001 Td
83
+ /F1.0 12 Tf
84
+ [<76> 25 <616c2032>] TJ
85
+ ET
86
+
87
+
88
+ BT
89
+ 236 227.38400000000001 Td
90
+ /F1.0 12 Tf
91
+ [<76> 25 <616c2033>] TJ
92
+ ET
93
+
94
+
95
+ BT
96
+ 46 177.38400000000001 Td
97
+ /F1.0 12 Tf
98
+ [<54> 120 <6162> 20 <6c652046> 30 <6f6f746572>] TJ
99
+ ET
100
+
101
+ Q
102
+
103
+ endstream
104
+ endobj
105
+ 5 0 obj
106
+ << /Type /Page
107
+ /Parent 3 0 R
108
+ /MediaBox [0 0 612.0 792.0]
109
+ /Contents 4 0 R
110
+ /Resources << /ProcSet [/PDF /Text /ImageB /ImageC /ImageI]
111
+ /Font << /F1.0 6 0 R
112
+ >>
113
+ >>
114
+ >>
115
+ endobj
116
+ 6 0 obj
117
+ << /Type /Font
118
+ /Subtype /Type1
119
+ /BaseFont /Helvetica
120
+ /Encoding /WinAnsiEncoding
121
+ >>
122
+ endobj
123
+ xref
124
+ 0 7
125
+ 0000000000 65535 f
126
+ 0000000015 00000 n
127
+ 0000000109 00000 n
128
+ 0000000158 00000 n
129
+ 0000000215 00000 n
130
+ 0000001061 00000 n
131
+ 0000001239 00000 n
132
+ trailer
133
+ << /Size 7
134
+ /Root 2 0 R
135
+ /Info 1 0 R
136
+ >>
137
+ startxref
138
+ 1336
139
+ %%EOF
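The text in this fixture is stored as hex strings inside `TJ` arrays, with kerning adjustments (the bare numbers) between the fragments. As a rough illustration of why the library has to reassemble text from fragments, the sketch below decodes one such array by hand. `decode_tj` is a hypothetical helper, it ignores the kerning numbers, and it assumes single-byte characters in the ASCII range, which holds for this fixture's Helvetica/WinAnsiEncoding text but not for PDFs in general.

```ruby
# Hypothetical helper: joins the hex string fragments of a TJ array,
# ignoring the kerning numbers between them.
def decode_tj(array_source)
  array_source.scan(/<([0-9a-fA-F]+)>/).flatten.map { |hex| [hex].pack("H*") }.join
end

title = decode_tj("[<53696d706c652054> 120 <6162> 20 <6c652054> 120 <65> 30 <7874>] TJ")
# title == "Simple Table Text"
```

The preceding `Td` operator (`36 747.384 Td` for this string) supplies the x,y position that ends up as the keys of the positional content map.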
@@ -3,5 +3,33 @@ include PdfSamplesHelper
3
3
 
4
4
  describe "PDF Samples" do
5
5
 
6
+ # This will scan all *.pdf files in spec/fixtures/personal_pdf_samples
7
+ # and do basic verification of the file structure without any effort from you.
8
+ pdf_sample_expectations.each do |sample_name,test_specifications|
9
+ describe "sample" do
10
+ let(:options) { test_specifications[:options] || {} }
11
+ let(:sample_file) { pdf_sample(sample_name) }
12
+ let(:turtletext_reader) { PDF::Reader::Turtletext.new(sample_file,options) }
13
+
14
+ (test_specifications||{}).each do |test_name,expectations|
15
+ context test_name do
16
+ let(:bounding_box) {
17
+ turtletext_reader.bounding_box do
18
+ above expectations[:above]
19
+ below expectations[:below]
20
+ left_of expectations[:left_of]
21
+ right_of expectations[:right_of]
22
+ end
23
+ }
24
+ # it {
25
+ # puts "bounding_box"
26
+ # puts bounding_box.inspect
27
+ # }
28
+ subject { bounding_box.text }
29
+ it { should eql(expectations[:expected_text])}
30
+ end
31
+ end
32
+ end
33
+ end
6
34
 
7
35
  end
@@ -31,6 +31,7 @@ module PdfSamplesHelper
31
31
  require 'prawn'
32
32
  puts "Making PDF samples for tests.."
33
33
  make_sample_hello_world
34
+ make_sample_simple_table_text
34
35
  end
35
36
 
36
37
  def make_sample_hello_world
@@ -40,4 +41,26 @@ module PdfSamplesHelper
40
41
  end
41
42
  puts "Created: #{filename}"
42
43
  end
44
+
45
+ def make_sample_simple_table_text
46
+ filename = pdf_sample('simple_table_text.pdf')
47
+ Prawn::Document.generate filename do
48
+ text "Simple Table Text"
49
+ text_box "Table Header", :at => [10, 300], :width => 200
50
+
51
+ text_box "row 1", :at => [10, 250], :width => 90
52
+ text_box "val 1", :at => [100, 250], :width => 50
53
+ text_box "val 2", :at => [150, 250], :width => 50
54
+ text_box "val 3", :at => [200, 250], :width => 50
55
+
56
+ text_box "row 2", :at => [10, 200], :width => 90
57
+ text_box "val 1", :at => [100, 200], :width => 50
58
+ text_box "val 2", :at => [150, 200], :width => 50
59
+ text_box "val 3", :at => [200, 200], :width => 50
60
+
61
+ text_box "Table Footer", :at => [10, 150], :width => 200
62
+ end
63
+ puts "Created: #{filename}"
64
+ end
65
+
43
66
  end
@@ -3,4 +3,197 @@ require 'spec_helper'
3
3
  describe PDF::Reader::Turtletext::Textangle do
4
4
  let(:resource_class) { PDF::Reader::Turtletext::Textangle }
5
5
 
6
+ let(:source) { nil } # we're just going to mock the PDF source here
7
+ let(:options) { {} }
8
+ let(:turtletext_reader) { PDF::Reader::Turtletext.new(source,options) }
9
+
10
+
11
+ describe "#reader" do
12
+ let(:textangle) { resource_class.new(turtletext_reader) }
13
+ subject { textangle.reader }
14
+ it { should be_a(PDF::Reader::Turtletext) }
15
+ end
16
+
17
+ describe "#text" do
18
+ let(:page) { 1 }
19
+ before do
20
+ turtletext_reader.stub(:load_content).and_return(given_page_content)
21
+ end
22
+ let(:given_page_content) { {
23
+ 70.0=>{10.0=>"crunchy bacon"},
24
+ 40.0=>{15.0=>"bacon on kimchi noodles", 25.0=>"heaven"},
25
+ 30.0=>{30.0=>"turkey bacon", 35.0=>"fraud"},
26
+ 10.0=>{40.0=>"smoked and streaky for me"}
27
+ } }
28
+
29
+ context "with block param" do
30
+ [:above,:below,:left_of,:right_of].each do |positional_method|
31
+ context "with #{positional_method}" do
32
+ let(:term) { "canary" }
33
+
34
+ it "should work with block param" do
35
+ textangle = resource_class.new(turtletext_reader) do |r|
36
+ r.send("#{positional_method}=",term)
37
+ end
38
+ textangle.send(positional_method).should eql(term)
39
+ end
40
+
41
+ end
42
+ end
43
+ end
44
+
45
+ context "without block param" do
46
+ it "#above should work" do
47
+ textangle = resource_class.new(turtletext_reader) do
48
+ above "canary"
49
+ end
50
+ textangle.above.should eql("canary")
51
+ end
52
+ it "#below should work" do
53
+ textangle = resource_class.new(turtletext_reader) do
54
+ below "canary"
55
+ end
56
+ textangle.below.should eql("canary")
57
+ end
58
+ it "#left_of should work" do
59
+ textangle = resource_class.new(turtletext_reader) do
60
+ left_of "canary"
61
+ end
62
+ textangle.left_of.should eql("canary")
63
+ end
64
+ it "#right_of should work" do
65
+ textangle = resource_class.new(turtletext_reader) do
66
+ right_of "canary"
67
+ end
68
+ textangle.right_of.should eql("canary")
69
+ end
70
+ end
71
+
72
+ context "when only below specified" do
73
+ context "as a string" do
74
+ let(:textangle) { resource_class.new(turtletext_reader) do |r|
75
+ r.below = "fraud"
76
+ end }
77
+ let(:expected) { [["smoked and streaky for me"]]}
78
+ subject { textangle.text }
79
+ it { should eql(expected) }
80
+ end
81
+ context "as a regex" do
82
+ let(:textangle) { resource_class.new(turtletext_reader) do |r|
83
+ r.below = /Fraud/i
84
+ end }
85
+ let(:expected) { [["smoked and streaky for me"]]}
86
+ subject { textangle.text }
87
+ it { should eql(expected) }
88
+ end
89
+ context "as a number" do
90
+ let(:textangle) { resource_class.new(turtletext_reader) do |r|
91
+ r.below = 20
92
+ end }
93
+ let(:expected) { [["smoked and streaky for me"]]}
94
+ subject { textangle.text }
95
+ it { should eql(expected) }
96
+ end
97
+ end
98
+
99
+ context "when only above specified" do
100
+ context "as a string" do
101
+ let(:textangle) { resource_class.new(turtletext_reader) do |r|
102
+ r.above = "heaven"
103
+ end }
104
+ let(:expected) { [["crunchy bacon"]]}
105
+ subject { textangle.text }
106
+ it { should eql(expected) }
107
+ end
108
+ context "as a regex" do
109
+ let(:textangle) { resource_class.new(turtletext_reader) do |r|
110
+ r.above = /heaVen/i
111
+ end }
112
+ let(:expected) { [["crunchy bacon"]]}
113
+ subject { textangle.text }
114
+ it { should eql(expected) }
115
+ end
116
+ context "as a number" do
117
+ let(:textangle) { resource_class.new(turtletext_reader) do |r|
118
+ r.above = 41
119
+ end }
120
+ let(:expected) { [["crunchy bacon"]]}
121
+ subject { textangle.text }
122
+ it { should eql(expected) }
123
+ end
124
+ end
125
+
126
+ context "when only left_of specified" do
127
+ context "as a string" do
128
+ let(:textangle) { resource_class.new(turtletext_reader) do |r|
129
+ r.left_of = "turkey bacon"
130
+ end }
131
+ let(:expected) { [
132
+ ["crunchy bacon"],
133
+ ["bacon on kimchi noodles", "heaven"]
134
+ ] }
135
+ subject { textangle.text }
136
+ it { should eql(expected) }
137
+ end
138
+ context "as a regex" do
139
+ let(:textangle) { resource_class.new(turtletext_reader) do |r|
140
+ r.left_of = /turKey/i
141
+ end }
142
+ let(:expected) { [
143
+ ["crunchy bacon"],
144
+ ["bacon on kimchi noodles", "heaven"]
145
+ ] }
146
+ subject { textangle.text }
147
+ it { should eql(expected) }
148
+ end
149
+ context "as a number" do
150
+ let(:textangle) { resource_class.new(turtletext_reader) do |r|
151
+ r.left_of = 29
152
+ end }
153
+ let(:expected) { [
154
+ ["crunchy bacon"],
155
+ ["bacon on kimchi noodles", "heaven"]
156
+ ] }
157
+ subject { textangle.text }
158
+ it { should eql(expected) }
159
+ end
160
+ end
161
+
162
+ context "when only right_of specified" do
163
+ context "as a string" do
164
+ let(:textangle) { resource_class.new(turtletext_reader) do |r|
165
+ r.right_of = "heaven"
166
+ end }
167
+ let(:expected) { [
168
+ ["turkey bacon","fraud"],
169
+ ["smoked and streaky for me"]
170
+ ] }
171
+ subject { textangle.text }
172
+ it { should eql(expected) }
173
+ end
174
+ context "as a regex" do
175
+ let(:textangle) { resource_class.new(turtletext_reader) do |r|
176
+ r.right_of = /Heaven/i
177
+ end }
178
+ let(:expected) { [
179
+ ["turkey bacon","fraud"],
180
+ ["smoked and streaky for me"]
181
+ ] }
182
+ subject { textangle.text }
183
+ it { should eql(expected) }
184
+ end
185
+ context "as a number" do
186
+ let(:textangle) { resource_class.new(turtletext_reader) do |r|
187
+ r.right_of = 26
188
+ end }
189
+ let(:expected) { [
190
+ ["turkey bacon","fraud"],
191
+ ["smoked and streaky for me"]
192
+ ] }
193
+ subject { textangle.text }
194
+ it { should eql(expected) }
195
+ end
196
+ end
197
+
198
+ end
6
199
  end
@@ -4,16 +4,16 @@ describe PDF::Reader::Turtletext do
4
4
  let(:resource_class) { PDF::Reader::Turtletext }
5
5
 
6
6
  let(:source) { nil } # we're just going to mock the PDF source here
7
- let(:structured_reader) { resource_class.new(source,options) }
7
+ let(:turtletext_reader) { resource_class.new(source,options) }
8
8
  let(:options) { {} }
9
9
 
10
10
  describe "#reader" do
11
- subject { structured_reader.reader}
11
+ subject { turtletext_reader.reader}
12
12
  it { should be_a(PDF::Reader) }
13
13
  end
14
14
 
15
15
  describe "#y_precision" do
16
- subject { structured_reader.y_precision}
16
+ subject { turtletext_reader.y_precision}
17
17
  context "default" do
18
18
  it { should eql(3) }
19
19
  end
@@ -27,35 +27,40 @@ describe PDF::Reader::Turtletext do
27
27
  context "with mocked source content" do
28
28
  let(:page) { 1 }
29
29
  before do
30
- structured_reader.should_receive(:load_content).with(page).and_return(given_page_content)
30
+ turtletext_reader.should_receive(:load_content).with(page).and_return(given_page_content)
31
31
  end
32
32
 
33
33
  {
34
34
  :with_simple_text => {
35
35
  :source_page_content => {10.0=>{10.0=>"a first bit of text"}},
36
36
  :expected_precise_content => {10.0=>{10.0=>"a first bit of text"}},
37
- :expected_fuzzed_content => {10.0=>{10.0=>"a first bit of text"}}
37
+ :expected_fuzzed_content => [[10.0,[[10.0,"a first bit of text"]]]]
38
38
  },
39
39
  :with_widely_separated_text => {
40
- :source_page_content => {10.0=>{10.0=>"a first bit of text"},20.0=>{20.0=>"a second bit of text"}},
41
- :expected_precise_content => {10.0=>{10.0=>"a first bit of text"},20.0=>{20.0=>"a second bit of text"}},
42
- :expected_fuzzed_content => {10.0=>{10.0=>"a first bit of text"},20.0=>{20.0=>"a second bit of text"}}
43
- },
44
- :with_unsorted_y_text => {
45
40
  :source_page_content => {20.0=>{10.0=>"a first bit of text"},10.0=>{20.0=>"a second bit of text"}},
  :expected_precise_content => {20.0=>{10.0=>"a first bit of text"},10.0=>{20.0=>"a second bit of text"}},
- :expected_fuzzed_content => {10.0=>{20.0=>"a second bit of text"},20.0=>{10.0=>"a first bit of text"}}
+ :expected_fuzzed_content => [[20.0, [[10.0, "a first bit of text"]]], [10.0, [[20.0, "a second bit of text"]]]]
+ },
+ :with_unsorted_y_text => {
+ :source_page_content => {10.0=>{10.0=>"a first bit of text"},20.0=>{20.0=>"a second bit of text"}},
+ :expected_precise_content => {10.0=>{10.0=>"a first bit of text"},20.0=>{20.0=>"a second bit of text"}},
+ :expected_fuzzed_content => [[20.0, [[20.0, "a second bit of text"]]], [10.0, [[10.0, "a first bit of text"]]]]
  },
  :with_fuzzed_y_text => {
- :source_page_content => {10.0=>{10.0=>"a first bit of text"},12.0=>{12.0=>"a second bit of text"}},
- :expected_precise_content => {10.0=>{10.0=>"a first bit of text"},12.0=>{12.0=>"a second bit of text"}},
- :expected_fuzzed_content => {10.0=>{10.0=>"a first bit of text",12.0=>"a second bit of text"}}
+ :source_page_content => {20.0=>{10.0=>"a first bit of text"},18.0=>{12.0=>"a second bit of text"}},
+ :expected_precise_content => {20.0=>{10.0=>"a first bit of text"},18.0=>{12.0=>"a second bit of text"}},
+ :expected_fuzzed_content => [[20.0, [[10.0, "a first bit of text"], [12.0, "a second bit of text"]]]]
  },
  :with_widely_separated_fuzzed_y_text => {
  :y_precision => 25,
- :source_page_content => {10.0=>{10.0=>"a first bit of text"},20.0=>{20.0=>"a second bit of text"}},
- :expected_precise_content => {10.0=>{10.0=>"a first bit of text"},20.0=>{20.0=>"a second bit of text"}},
- :expected_fuzzed_content => {10.0=>{10.0=>"a first bit of text",20.0=>"a second bit of text"}}
+ :source_page_content => {20.0=>{10.0=>"a first bit of text"},10.0=>{20.0=>"a second bit of text"}},
+ :expected_precise_content => {20.0=>{10.0=>"a first bit of text"},10.0=>{20.0=>"a second bit of text"}},
+ :expected_fuzzed_content => [[20.0, [[10.0, "a first bit of text"], [20.0, "a second bit of text"]]]]
+ },
+ :with_multiple_row_text => {
+ :source_page_content => {10.0=>{10.0=>"first"},8.0=>{20.0=>"second",30.0=>"third"}},
+ :expected_precise_content => {10.0=>{10.0=>"first"},8.0=>{20.0=>"second",30.0=>"third"}},
+ :expected_fuzzed_content => [[10.0, [[10.0, "first"], [20.0, "second"], [30.0, "third"]]]]
  }
  }.each do |test_name,test_expectations|
  context test_name do
@@ -69,12 +74,12 @@ describe PDF::Reader::Turtletext do
  }
 
  describe "#content" do
- subject { structured_reader.content(page) }
+ subject { turtletext_reader.content(page) }
  it { should eql(test_expectations[:expected_fuzzed_content]) }
  end
 
  describe "#precise_content" do
- subject { structured_reader.precise_content(page) }
+ subject { turtletext_reader.precise_content(page) }
  it { should eql(test_expectations[:expected_precise_content]) }
  end
 
@@ -90,24 +95,24 @@ describe PDF::Reader::Turtletext do
  },
  :with_single_line_text => {
  :source_page_content => {
- 10.0=>{10.0=>"first line ignored"},
+ 70.0=>{10.0=>"first line ignored"},
  30.0=>{10.0=>"first part found", 20.0=>"last part found"},
- 70.0=>{10.0=>"last line ignored"}
+ 10.0=>{10.0=>"last line ignored"}
  },
  :xmin => 0, :xmax => 100, :ymin => 20, :ymax => 50,
  :expected_text => [["first part found", "last part found"]]
  },
  :with_multi_line_text => {
  :source_page_content => {
- 10.0=>{10.0=>"first line ignored"},
- 30.0=>{10.0=>"first line first part found", 20.0=>"first line last part found"},
- 40.0=>{10.0=>"last line first part found", 20.0=>"last line last part found"},
- 70.0=>{10.0=>"last line ignored"}
+ 70.0=>{10.0=>"first line ignored"},
+ 40.0=>{10.0=>"first line first part found", 20.0=>"first line last part found"},
+ 30.0=>{10.0=>"last line first part found", 20.0=>"last line last part found"},
+ 10.0=>{10.0=>"last line ignored"}
  },
  :xmin => 0, :xmax => 100, :ymin => 20, :ymax => 50,
  :expected_text => [
- ["last line first part found", "last line last part found"],
- ["first line first part found", "first line last part found"]
+ ["first line first part found", "first line last part found"],
+ ["last line first part found", "last line last part found"]
  ]
  }
  }.each do |test_name,test_expectations|
@@ -118,7 +123,7 @@ describe PDF::Reader::Turtletext do
  let(:ymin) { test_expectations[:ymin] }
  let(:ymax) { test_expectations[:ymax] }
  let(:expected_text) { test_expectations[:expected_text] }
- subject { structured_reader.text_in_region(xmin,xmax,ymin,ymax,page) }
+ subject { turtletext_reader.text_in_region(xmin,xmax,ymin,ymax,page) }
  it { should eql(expected_text) }
  end
  end
@@ -126,21 +131,21 @@ describe PDF::Reader::Turtletext do
 
  describe "#text_position" do
  let(:given_page_content) { {
- 10.0=>{10.0=>"crunchy bacon"},
- 30.0=>{15.0=>"bacon on kimchi noodles", 25.0=>"heaven"},
- 40.0=>{30.0=>"turkey bacon", 35.0=>"fraud"},
- 70.0=>{40.0=>"smoked and streaky da bomb"}
+ 70.0=>{10.0=>"crunchy bacon"},
+ 40.0=>{15.0=>"bacon on kimchi noodles", 25.0=>"heaven"},
+ 30.0=>{30.0=>"turkey bacon", 35.0=>"fraud"},
+ 10.0=>{40.0=>"smoked and streaky da bomb"}
  } }
  {
- :with_simple_match => { :match_term => 'turkey bacon', :expected_position => {:x=>30.0, :y=>40.0} },
- :with_match_along_line => { :match_term => 'heaven', :expected_position => {:x=>25.0, :y=>30.0} },
- :with_regex_match => { :match_term => /kimchi/, :expected_position => {:x=>15.0, :y=>30.0} },
- :with_regex_multi_matches_first => { :match_term => /turkey|crunchy/, :expected_position => {:x=>10.0, :y=>10.0} }
+ :with_simple_match => { :match_term => 'turkey bacon', :expected_position => {:x=>30.0, :y=>30.0} },
+ :with_match_along_line => { :match_term => 'heaven', :expected_position => {:x=>25.0, :y=>40.0} },
+ :with_regex_match => { :match_term => /kimchi/, :expected_position => {:x=>15.0, :y=>40.0} },
+ :with_regex_multi_matches_first => { :match_term => /turkey|crunchy/, :expected_position => {:x=>10.0, :y=>70.0} }
  }.each do |test_name,test_expectations|
  context test_name do
  let(:match_term) { test_expectations[:match_term] }
  let(:expected_position) { test_expectations[:expected_position] }
- subject { structured_reader.text_position(match_term,page) }
+ subject { turtletext_reader.text_position(match_term,page) }
  it { should eql(expected_position) }
  end
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: pdf-reader-turtletext
  version: !ruby/object:Gem::Version
- version: 0.1.0
+ version: 0.2.0
  prerelease:
  platform: ruby
  authors:
@@ -9,11 +9,11 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2012-07-22 00:00:00.000000000 Z
+ date: 2012-07-31 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: pdf-reader
- requirement: &70193556628420 !ruby/object:Gem::Requirement
+ requirement: &70218189955060 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - =
@@ -21,10 +21,10 @@ dependencies:
  version: 1.1.1
  type: :runtime
  prerelease: false
- version_requirements: *70193556628420
+ version_requirements: *70218189955060
  - !ruby/object:Gem::Dependency
  name: bundler
- requirement: &70193556627700 !ruby/object:Gem::Requirement
+ requirement: &70218189954360 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -32,10 +32,10 @@ dependencies:
  version: 1.1.4
  type: :development
  prerelease: false
- version_requirements: *70193556627700
+ version_requirements: *70218189954360
  - !ruby/object:Gem::Dependency
  name: jeweler
- requirement: &70193556626800 !ruby/object:Gem::Requirement
+ requirement: &70218189953580 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -43,10 +43,10 @@ dependencies:
  version: 1.6.4
  type: :development
  prerelease: false
- version_requirements: *70193556626800
+ version_requirements: *70218189953580
  - !ruby/object:Gem::Dependency
  name: rake
- requirement: &70193556626300 !ruby/object:Gem::Requirement
+ requirement: &70218189953020 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -54,10 +54,10 @@ dependencies:
  version: 0.9.2.2
  type: :development
  prerelease: false
- version_requirements: *70193556626300
+ version_requirements: *70218189953020
  - !ruby/object:Gem::Dependency
  name: rspec
- requirement: &70193556625680 !ruby/object:Gem::Requirement
+ requirement: &70218189952200 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -65,10 +65,10 @@ dependencies:
  version: 2.8.0
  type: :development
  prerelease: false
- version_requirements: *70193556625680
+ version_requirements: *70218189952200
  - !ruby/object:Gem::Dependency
  name: rdoc
- requirement: &70193556624820 !ruby/object:Gem::Requirement
+ requirement: &70218189951400 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -76,10 +76,10 @@ dependencies:
  version: '3.11'
  type: :development
  prerelease: false
- version_requirements: *70193556624820
+ version_requirements: *70218189951400
  - !ruby/object:Gem::Dependency
  name: prawn
- requirement: &70193556623960 !ruby/object:Gem::Requirement
+ requirement: &70218189950700 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -87,10 +87,10 @@ dependencies:
  version: 0.12.0
  type: :development
  prerelease: false
- version_requirements: *70193556623960
+ version_requirements: *70218189950700
  - !ruby/object:Gem::Dependency
  name: guard-rspec
- requirement: &70193556623440 !ruby/object:Gem::Requirement
+ requirement: &70218189950100 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -98,7 +98,7 @@ dependencies:
  version: 1.2.0
  type: :development
  prerelease: false
- version_requirements: *70193556623440
+ version_requirements: *70218189950100
  description: a library that can read structured and positional text from PDFs. Ideal
  for asembling structured data from invoices and the like.
  email: gallagher.paul@gmail.com
@@ -111,6 +111,7 @@ files:
  - .rspec
  - .rvmrc
  - .travis.yml
+ - CHANGELOG
  - Gemfile
  - Gemfile.lock
  - Guardfile
@@ -125,8 +126,10 @@ files:
  - lib/pdf/reader/turtletext/version.rb
  - pdf-reader-turtletext.gemspec
  - spec/fixtures/pdf_samples/.gitkeep
+ - spec/fixtures/pdf_samples/expectations.yml
  - spec/fixtures/pdf_samples/hello_world.pdf
  - spec/fixtures/pdf_samples/junk_prefix.pdf
+ - spec/fixtures/pdf_samples/simple_table_text.pdf
  - spec/integration/pdf_samples_spec.rb
  - spec/spec_helper.rb
  - spec/support/pdf_samples_helper.rb