format_parser_pdf 0.1.0

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: dc2beb605a5a7311a69fbac08d5c23d829d222e98ac0b28eef049e8900eb817d
4
+ data.tar.gz: 406cf82d28db153d083e8cfaf8fc8c4eb46a9ea18f7fb893842239096815bd0b
5
+ SHA512:
6
+ metadata.gz: 4febc76c910e9c35bcffc68d782e1bf8bc2b8204b8b3b5eedaaace3a64e2b4ea0deb8ff4711363b14f477685b6bafa344191425ca254f9e9aac56d2893ee85d7
7
+ data.tar.gz: 05cce2295783953dbddd3fd34fbb10527433d5a12a7022baf516e18e526a48d5deeffe24444e17c4b2080b08f6fee866992ba869a29fbad25f6299c9bb1a5cc8
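The checksums above are the SHA256 and SHA512 hex digests of the two archives packed inside the `.gem` file (`metadata.gz` and `data.tar.gz`). A minimal sketch of checking one of them with Ruby's standard `digest` library, assuming the archive has already been extracted to a local path. The `verify_checksum` helper is hypothetical and not part of the gem or of RubyGems:

```ruby
require 'digest/sha2'
require 'tempfile'

# Hypothetical helper (not part of the gem or RubyGems): compares a file's
# hex digest with an expected value, as listed in checksums.yaml.
def verify_checksum(path, expected_hex, algorithm: :sha256)
  digest_class =
    case algorithm
    when :sha256 then Digest::SHA256
    when :sha512 then Digest::SHA512
    else raise ArgumentError, "unsupported algorithm: #{algorithm}"
    end
  digest_class.file(path).hexdigest == expected_hex.downcase
end

# Usage: a throwaway file stands in for an extracted data.tar.gz.
Tempfile.create('data.tar.gz') do |f|
  f.binmode
  f.write('abc')
  f.flush
  sha256_of_abc = 'ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad'
  puts verify_checksum(f.path, sha256_of_abc) # prints true
end
```

Comparing against `expected_hex.downcase` keeps the check tolerant of upper-case digests copied from other tools.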
@@ -0,0 +1,12 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /_yardoc/
4
+ /coverage/
5
+ /doc/
6
+ /pkg/
7
+ /spec/reports/
8
+ /tmp/
9
+ Gemfile.lock
10
+
11
+ # rspec failure tracking
12
+ .rspec_status
data/.rspec ADDED
@@ -0,0 +1,3 @@
1
+ --format documentation
2
+ --color
3
+ --require spec_helper
@@ -0,0 +1,2 @@
1
+ inherit_gem:
2
+ wetransfer_style: ruby/default.yml
@@ -0,0 +1,11 @@
1
+ sudo: false
2
+ language: ruby
3
+ cache: bundler
4
+ rvm:
5
+ - 2.2.0
6
+ - 2.3.0
7
+ - 2.4.2
8
+ - 2.5.0
9
+ - jruby-9.0
10
+ before_install: gem install bundler -v 1.16.2
11
+
@@ -0,0 +1,74 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ In the interest of fostering an open and welcoming environment, we as
6
+ contributors and maintainers pledge to making participation in our project and
7
+ our community a harassment-free experience for everyone, regardless of age, body
8
+ size, disability, ethnicity, gender identity and expression, level of experience,
9
+ nationality, personal appearance, race, religion, or sexual identity and
10
+ orientation.
11
+
12
+ ## Our Standards
13
+
14
+ Examples of behavior that contributes to creating a positive environment
15
+ include:
16
+
17
+ * Using welcoming and inclusive language
18
+ * Being respectful of differing viewpoints and experiences
19
+ * Gracefully accepting constructive criticism
20
+ * Focusing on what is best for the community
21
+ * Showing empathy towards other community members
22
+
23
+ Examples of unacceptable behavior by participants include:
24
+
25
+ * The use of sexualized language or imagery and unwelcome sexual attention or
26
+ advances
27
+ * Trolling, insulting/derogatory comments, and personal or political attacks
28
+ * Public or private harassment
29
+ * Publishing others' private information, such as a physical or electronic
30
+ address, without explicit permission
31
+ * Other conduct which could reasonably be considered inappropriate in a
32
+ professional setting
33
+
34
+ ## Our Responsibilities
35
+
36
+ Project maintainers are responsible for clarifying the standards of acceptable
37
+ behavior and are expected to take appropriate and fair corrective action in
38
+ response to any instances of unacceptable behavior.
39
+
40
+ Project maintainers have the right and responsibility to remove, edit, or
41
+ reject comments, commits, code, wiki edits, issues, and other contributions
42
+ that are not aligned to this Code of Conduct, or to ban temporarily or
43
+ permanently any contributor for other behaviors that they deem inappropriate,
44
+ threatening, offensive, or harmful.
45
+
46
+ ## Scope
47
+
48
+ This Code of Conduct applies both within project spaces and in public spaces
49
+ when an individual is representing the project or its community. Examples of
50
+ representing a project or community include using an official project e-mail
51
+ address, posting via an official social media account, or acting as an appointed
52
+ representative at an online or offline event. Representation of a project may be
53
+ further defined and clarified by project maintainers.
54
+
55
+ ## Enforcement
56
+
57
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
58
+ reported by contacting the project team at gerard@wetransfer.com. All
59
+ complaints will be reviewed and investigated and will result in a response that
60
+ is deemed necessary and appropriate to the circumstances. The project team is
61
+ obligated to maintain confidentiality with regard to the reporter of an incident.
62
+ Further details of specific enforcement policies may be posted separately.
63
+
64
+ Project maintainers who do not follow or enforce the Code of Conduct in good
65
+ faith may face temporary or permanent repercussions as determined by other
66
+ members of the project's leadership.
67
+
68
+ ## Attribution
69
+
70
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71
+ available at [http://contributor-covenant.org/version/1/4][version]
72
+
73
+ [homepage]: http://contributor-covenant.org
74
+ [version]: http://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,3 @@
1
+ source "https://rubygems.org"
2
+
3
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2018 WeTransfer
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2018 grdw
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
@@ -0,0 +1,34 @@
1
+ # Format parser PDF
2
+
3
+ [![Build Status](https://travis-ci.com/WeTransfer/format_parser_pdf.svg?token=9FxKF7M1tP265s2qQFFJ&branch=master)](https://travis-ci.com/WeTransfer/format_parser_pdf)
4
+
5
+ An extension gem for [format_parser](https://github.com/WeTransfer/format_parser) that uses [pdf-reader](https://github.com/yob/pdf-reader) to detect page count in PDF files.
6
+
7
+ ## Installation
8
+
9
+ In your `Gemfile`, add `format_parser_pdf` alongside the `format_parser` gem:
10
+
11
+ ```ruby
12
+ source "https://rubygems.org"
13
+
14
+ # ...
15
+ gem 'format_parser'
16
+ gem 'format_parser_pdf'
17
+ ```
18
+
19
+ If you use `Bundler.setup` and require gems manually, you need to explicitly `require 'format_parser_pdf'` so that the built-in
20
+ PDF parser in the `format_parser` gem is replaced with the extended one. With `Bundler.require` this happens automatically.
21
+
22
+ ## Usage
23
+
24
+ Use the standard `FormatParser.parse(io)` calls and related methods anywhere in your code; PDFs will be handled by the extended parser.
25
+
26
+ ## Development
27
+
28
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
29
+
30
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
31
+
32
+ ## Contributing
33
+
34
+ Bug reports and pull requests are welcome on GitHub at https://github.com/WeTransfer/format_parser_pdf.
@@ -0,0 +1,8 @@
1
+ require "bundler/gem_tasks"
2
+ require "rspec/core/rake_task"
3
+ require 'rubocop/rake_task'
4
+
5
+ RuboCop::RakeTask.new(:rubocop)
6
+ RSpec::Core::RakeTask.new(:spec)
7
+
8
+ task default: [:spec, :rubocop]
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "format_parser_pdf"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start(__FILE__)
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
@@ -0,0 +1,32 @@
1
+
2
+ lib = File.expand_path("../lib", __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require "format_parser_pdf/version"
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "format_parser_pdf"
8
+ spec.version = FormatParserPDF::VERSION
9
+ spec.authors = ["grdw"]
10
+ spec.email = ["gerard@wetransfer.com"]
11
+
12
+ spec.summary = %q{An adapter for format_parser to parse PDF files using pdf-reader.}
13
+ spec.description = %q{An adapter for format_parser to parse PDF files using pdf-reader. Replaces the standard PDF parser module.}
14
+ spec.homepage = "https://github.com/WeTransfer/format_parser_pdf"
15
+ spec.license = "MIT"
16
+
17
+ # Specify which files should be added to the gem when it is released.
18
+ # `git ls-files -z` lists the files that have been added to git; those are the files packaged into the gem.
19
+ spec.files = Dir.chdir(File.expand_path('..', __FILE__)) do
20
+ `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
21
+ end
22
+ spec.bindir = "exe"
23
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
24
+ spec.require_paths = ["lib"]
25
+
26
+ spec.add_dependency 'pdf-reader', '~> 2.1'
27
+ spec.add_dependency 'format_parser', '~> 0', '< 1.0'
28
+ spec.add_development_dependency "bundler", "~> 1.16"
29
+ spec.add_development_dependency "rake", "~> 10.0"
30
+ spec.add_development_dependency "rspec", "~> 3.0"
31
+ spec.add_development_dependency "wetransfer_style", "0.6.0"
32
+ end
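The `spec.files` expression above shells out to `git ls-files -z`, splits the NUL-separated output and drops anything under `test/`, `spec/` or `features/`. The filtering step can be checked on its own; this sketch replaces the real command with a hardcoded, hypothetical file listing so it runs outside a git checkout:

```ruby
# Hypothetical stand-in for the output of `git ls-files -z`:
# file names separated by NUL bytes ("\x0").
LS_FILES_OUTPUT = [
  'README.md',
  'lib/format_parser_pdf.rb',
  'spec/format_parser_pdf_spec.rb',
  'test/helper.rb',
  'features/parse.feature',
  'format_parser_pdf.gemspec'
].join("\x0")

# Same split/reject logic as in the gemspec: keep everything except files
# whose path starts with test/, spec/ or features/.
def packaged_files(ls_files_output)
  ls_files_output.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
end

p packaged_files(LS_FILES_OUTPUT)
# prints ["README.md", "lib/format_parser_pdf.rb", "format_parser_pdf.gemspec"]
```

This matches the `files:` list in the gem metadata, which contains no `spec/` entries.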
@@ -0,0 +1,43 @@
1
+ require 'pdf-reader'
2
+ require 'format_parser'
3
+ require 'format_parser_pdf/io_extension'
4
+ require 'format_parser_pdf/version'
5
+
6
+ module FormatParserPDF
7
+ class Parser
8
+ include FormatParser::IOUtils
9
+
10
+ PDF_MARKER = /%PDF-1\.[0-9]/
11
+
12
+ def call(io)
13
+ io = FormatParser::IOConstraint.new(io)
14
+
15
+ return unless safe_read(io, 9) =~ PDF_MARKER
16
+ parse_using_pdf_reader(io)
17
+ end
18
+
19
+ def parse_using_pdf_reader(io)
20
+ pdf_reader = PDF::Reader.new(IOExtension.new(io))
21
+ page_count = pdf_reader.page_count
22
+ # We might want to recover more useful items (such as page dimensions -
23
+ # media box, trim box, bleed box, art box and a few dozens other boxes)
24
+ # later on. At the moment FormatParser::Document does not have `intrinsics`, which
25
+ # is an omission we need to sort out in format_parser itself.
26
+ # intrinsics = {
27
+ # pdf_version: pdf_reader.pdf_version,
28
+ # info: pdf_reader.info,
29
+ # metadata: pdf_reader.metadata,
30
+ # }
31
+ FormatParser::Document.new(format: :pdf, page_count: page_count) # , intrinsics: intrinsics)
32
+ rescue PDF::Reader::MalformedPDFError
33
+ nil
34
+ end
35
+ end
36
+
37
+ # FormatParser includes a builtin PDF parser that only checks for the PDF magic comment
38
+ # at the start of the file. That one needs to be disabled first...
39
+ FormatParser.deregister_parser(FormatParser::PDFParser)
40
+
41
+ # ...and replaced with ours
42
+ FormatParser.register_parser Parser, natures: :document, formats: :pdf
43
+ end
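`Parser#call` only hands the IO to pdf-reader when the first 9 bytes match `PDF_MARKER`. That gate can be exercised without pdf-reader or format_parser. This sketch uses a plain `StringIO` and `IO#read` in place of `FormatParser::IOConstraint` and `safe_read`, so its handling of short input (treated simply as "not a PDF") is an assumption; `pdf_header?` is a hypothetical helper:

```ruby
require 'stringio'

# Equivalent to FormatParserPDF::Parser::PDF_MARKER.
PDF_MARKER = /%PDF-1\.[0-9]/

# Hypothetical stand-in for the check in Parser#call: read the first 9 bytes
# and test them against the marker. Uses plain IO#read, so a short or empty
# input simply counts as "not a PDF".
def pdf_header?(io)
  head = io.read(9)
  !head.nil? && PDF_MARKER.match?(head)
end

puts pdf_header?(StringIO.new("%PDF-1.7\n1 0 obj\n")) # prints true
puts pdf_header?(StringIO.new('GIF89a...'))           # prints false
puts pdf_header?(StringIO.new("%PDF-2.0\n"))          # prints false
```

As written, the pattern only accepts `%PDF-1.x` headers, so a file that declares `%PDF-2.0` never reaches pdf-reader.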
@@ -0,0 +1,43 @@
1
+ module FormatParserPDF
2
+ class Parser
3
+ class IOExtension < FormatParser::IOConstraint
4
+ def initialize(io)
5
+ @io = io
6
+ end
7
+
8
+ def seek(n, seek_mode = IO::SEEK_SET)
9
+ absolute_offset = case seek_mode
10
+ when IO::SEEK_SET
11
+ n
12
+ when IO::SEEK_CUR
13
+ @io.pos + n
14
+ when IO::SEEK_END
15
+ @io.size + n
16
+ else
17
+ raise Errno::EINVAL
18
+ end
19
+
20
+ if absolute_offset < 0
21
+ # Raise a special exception that FormatParser ignores - it will stop the parser and skip to the next one
22
+ msg = "Can only seek to positive absolute offsets (requested seek to #{absolute_offset})"
23
+ raise FormatParser::IOUtils::InvalidRead, msg
24
+ end
25
+ @io.seek(absolute_offset)
26
+ end
27
+
28
+ def rewind
29
+ @io.seek(0)
30
+ end
31
+
32
+ def readchar
33
+ @io.read(1)
34
+ end
35
+
36
+ def getbyte
37
+ if (byte = @io.read(1))
38
+ byte.unpack('C').first
39
+ end
40
+ end
41
+ end
42
+ end
43
+ end
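`IOExtension` adds the `IO` methods pdf-reader expects on top of `FormatParser::IOConstraint`. Its `seek` turns `(n, whence)` into an absolute offset, presumably because pdf-reader looks for the cross-reference data near the end of the file and therefore needs `IO::SEEK_END`. The arithmetic and the `getbyte` conversion can be sketched as pure functions; `absolute_offset` and `getbyte_from` are hypothetical helpers, and `ArgumentError` stands in for `FormatParser::IOUtils::InvalidRead`:

```ruby
require 'stringio'

# Mirrors the offset arithmetic in IOExtension#seek: translate (n, whence)
# into an absolute position, given the current position and total size.
def absolute_offset(n, whence, pos:, size:)
  offset =
    case whence
    when IO::SEEK_SET then n
    when IO::SEEK_CUR then pos + n
    when IO::SEEK_END then size + n
    else raise Errno::EINVAL
    end
  # The gem raises FormatParser::IOUtils::InvalidRead here instead.
  raise ArgumentError, "negative absolute offset #{offset}" if offset < 0
  offset
end

# Mirrors IOExtension#getbyte: read one byte and return its integer value,
# or nil at end of input.
def getbyte_from(io)
  byte = io.read(1)
  byte.unpack('C').first if byte
end

puts absolute_offset(10, IO::SEEK_SET, pos: 0, size: 100) # prints 10
puts absolute_offset(5, IO::SEEK_CUR, pos: 20, size: 100) # prints 25
puts absolute_offset(-6, IO::SEEK_END, pos: 0, size: 100) # prints 94
puts getbyte_from(StringIO.new('%PDF'))                   # prints 37
p getbyte_from(StringIO.new(''))                          # prints nil
```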
@@ -0,0 +1,3 @@
1
+ module FormatParserPDF
2
+ VERSION = "0.1.0"
3
+ end
metadata ADDED
@@ -0,0 +1,151 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: format_parser_pdf
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - grdw
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2018-07-16 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: pdf-reader
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '2.1'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '2.1'
27
+ - !ruby/object:Gem::Dependency
28
+ name: format_parser
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '0'
34
+ - - "<"
35
+ - !ruby/object:Gem::Version
36
+ version: '1.0'
37
+ type: :runtime
38
+ prerelease: false
39
+ version_requirements: !ruby/object:Gem::Requirement
40
+ requirements:
41
+ - - "~>"
42
+ - !ruby/object:Gem::Version
43
+ version: '0'
44
+ - - "<"
45
+ - !ruby/object:Gem::Version
46
+ version: '1.0'
47
+ - !ruby/object:Gem::Dependency
48
+ name: bundler
49
+ requirement: !ruby/object:Gem::Requirement
50
+ requirements:
51
+ - - "~>"
52
+ - !ruby/object:Gem::Version
53
+ version: '1.16'
54
+ type: :development
55
+ prerelease: false
56
+ version_requirements: !ruby/object:Gem::Requirement
57
+ requirements:
58
+ - - "~>"
59
+ - !ruby/object:Gem::Version
60
+ version: '1.16'
61
+ - !ruby/object:Gem::Dependency
62
+ name: rake
63
+ requirement: !ruby/object:Gem::Requirement
64
+ requirements:
65
+ - - "~>"
66
+ - !ruby/object:Gem::Version
67
+ version: '10.0'
68
+ type: :development
69
+ prerelease: false
70
+ version_requirements: !ruby/object:Gem::Requirement
71
+ requirements:
72
+ - - "~>"
73
+ - !ruby/object:Gem::Version
74
+ version: '10.0'
75
+ - !ruby/object:Gem::Dependency
76
+ name: rspec
77
+ requirement: !ruby/object:Gem::Requirement
78
+ requirements:
79
+ - - "~>"
80
+ - !ruby/object:Gem::Version
81
+ version: '3.0'
82
+ type: :development
83
+ prerelease: false
84
+ version_requirements: !ruby/object:Gem::Requirement
85
+ requirements:
86
+ - - "~>"
87
+ - !ruby/object:Gem::Version
88
+ version: '3.0'
89
+ - !ruby/object:Gem::Dependency
90
+ name: wetransfer_style
91
+ requirement: !ruby/object:Gem::Requirement
92
+ requirements:
93
+ - - '='
94
+ - !ruby/object:Gem::Version
95
+ version: 0.6.0
96
+ type: :development
97
+ prerelease: false
98
+ version_requirements: !ruby/object:Gem::Requirement
99
+ requirements:
100
+ - - '='
101
+ - !ruby/object:Gem::Version
102
+ version: 0.6.0
103
+ description: An adapter for format_parser to parse PDF files using pdf-reader. Replaces
104
+ the standard PDF parser module.
105
+ email:
106
+ - gerard@wetransfer.com
107
+ executables: []
108
+ extensions: []
109
+ extra_rdoc_files: []
110
+ files:
111
+ - ".gitignore"
112
+ - ".rspec"
113
+ - ".rubocop.yml"
114
+ - ".travis.yml"
115
+ - CODE_OF_CONDUCT.md
116
+ - Gemfile
117
+ - LICENSE
118
+ - LICENSE.txt
119
+ - README.md
120
+ - Rakefile
121
+ - bin/console
122
+ - bin/setup
123
+ - format_parser_pdf.gemspec
124
+ - lib/format_parser_pdf.rb
125
+ - lib/format_parser_pdf/io_extension.rb
126
+ - lib/format_parser_pdf/version.rb
127
+ homepage: https://github.com/WeTransfer/format_parser_pdf
128
+ licenses:
129
+ - MIT
130
+ metadata: {}
131
+ post_install_message:
132
+ rdoc_options: []
133
+ require_paths:
134
+ - lib
135
+ required_ruby_version: !ruby/object:Gem::Requirement
136
+ requirements:
137
+ - - ">="
138
+ - !ruby/object:Gem::Version
139
+ version: '0'
140
+ required_rubygems_version: !ruby/object:Gem::Requirement
141
+ requirements:
142
+ - - ">="
143
+ - !ruby/object:Gem::Version
144
+ version: '0'
145
+ requirements: []
146
+ rubyforge_project:
147
+ rubygems_version: 2.7.3
148
+ signing_key:
149
+ specification_version: 4
150
+ summary: An adapter for format_parser to parse PDF files using pdf-reader.
151
+ test_files: []
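The metadata serializes each dependency constraint as operator/version pairs (`"~>"` with `'2.1'` for pdf-reader; `"~>"` with `'0'` plus `"<"` with `'1.0'` for format_parser). A short sketch using RubyGems' own `Gem::Requirement` shows which versions those constraints admit. Note that `~> 0` on its own already means `>= 0` and `< 1`, so the extra `< 1.0` does not narrow the range further. `allowed?` is a hypothetical helper:

```ruby
require 'rubygems'

# Requirements copied from the gemspec / metadata above.
FORMAT_PARSER_REQ = Gem::Requirement.new('~> 0', '< 1.0')
PDF_READER_REQ = Gem::Requirement.new('~> 2.1')

# Hypothetical helper: does the requirement accept this version string?
def allowed?(requirement, version)
  requirement.satisfied_by?(Gem::Version.new(version))
end

puts allowed?(FORMAT_PARSER_REQ, '0.13.0') # prints true
puts allowed?(FORMAT_PARSER_REQ, '1.0.0')  # prints false
puts allowed?(PDF_READER_REQ, '2.1.0')     # prints true
puts allowed?(PDF_READER_REQ, '3.0.0')     # prints false
```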