pdf_scanner 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: dc0823c094b7fde9fecb5f4c9611d44dc6b5f02f3448a7d0946d82941bf8b293
4
+ data.tar.gz: e9939027ff27ae2e72b8ac99e4f5d967120144ec6c31f6056f5c5ce9838682ff
5
+ SHA512:
6
+ metadata.gz: 4d1d787b01125f6fd8821905f174b0d6a517cf485347f08276ec5e7edf49dc8acb5c6cab984710ca0278d5e752c895f159352feb6502958bfcad452142c0feb2
7
+ data.tar.gz: 58a07aaa802f72018c64ec0565d5a0f5164a767ff727a8ec2f362522f1a165331f332eda025a26d4085f1e12fa18c9b0f3889353f39cb7b7684b4c98dab4c22a
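The SHA256 entries above can be checked against a downloaded archive. A minimal sketch, assuming the file is available locally; `sha256_matches?` is a hypothetical helper name, and the demo hashes a throwaway file rather than the real `data.tar.gz`:

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: true when the SHA-256 hex digest of the file at `path`
# equals `expected_hex` (compared case-insensitively).
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end

# Demonstration on a temporary file; substitute the path to data.tar.gz and
# the digest listed in checksums.yaml to verify the real archive.
Tempfile.create("sample") do |f|
  f.write("hello")
  f.flush
  expected = Digest::SHA256.hexdigest("hello")
  puts sha256_matches?(f.path, expected)
end
```

The same pattern works for the SHA512 entries by swapping in `Digest::SHA512`.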
data/.gitignore ADDED
@@ -0,0 +1,8 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /_yardoc/
4
+ /coverage/
5
+ /doc/
6
+ /pkg/
7
+ /spec/reports/
8
+ /tmp/
data/.rubocop.yml ADDED
@@ -0,0 +1,13 @@
1
+ AllCops:
2
+ TargetRubyVersion: 2.4
3
+
4
+ Style/StringLiterals:
5
+ Enabled: true
6
+ EnforcedStyle: double_quotes
7
+
8
+ Style/StringLiteralsInInterpolation:
9
+ Enabled: true
10
+ EnforcedStyle: double_quotes
11
+
12
+ Layout/LineLength:
13
+ Max: 120
data/CHANGELOG.md ADDED
@@ -0,0 +1,5 @@
1
+ ## [Unreleased]
2
+
3
+ ## [0.1.0] - 2023-02-17
4
+
5
+ - Initial release
data/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,84 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.
6
+
7
+ We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.
8
+
9
+ ## Our Standards
10
+
11
+ Examples of behavior that contributes to a positive environment for our community include:
12
+
13
+ * Demonstrating empathy and kindness toward other people
14
+ * Being respectful of differing opinions, viewpoints, and experiences
15
+ * Giving and gracefully accepting constructive feedback
16
+ * Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience
17
+ * Focusing on what is best not just for us as individuals, but for the overall community
18
+
19
+ Examples of unacceptable behavior include:
20
+
21
+ * The use of sexualized language or imagery, and sexual attention or
22
+ advances of any kind
23
+ * Trolling, insulting or derogatory comments, and personal or political attacks
24
+ * Public or private harassment
25
+ * Publishing others' private information, such as a physical or email
26
+ address, without their explicit permission
27
+ * Other conduct which could reasonably be considered inappropriate in a
28
+ professional setting
29
+
30
+ ## Enforcement Responsibilities
31
+
32
+ Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.
33
+
34
+ Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate.
35
+
36
+ ## Scope
37
+
38
+ This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.
39
+
40
+ ## Enforcement
41
+
42
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at shekhar@cardup.co. All complaints will be reviewed and investigated promptly and fairly.
43
+
44
+ All community leaders are obligated to respect the privacy and security of the reporter of any incident.
45
+
46
+ ## Enforcement Guidelines
47
+
48
+ Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct:
49
+
50
+ ### 1. Correction
51
+
52
+ **Community Impact**: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community.
53
+
54
+ **Consequence**: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested.
55
+
56
+ ### 2. Warning
57
+
58
+ **Community Impact**: A violation through a single incident or series of actions.
59
+
60
+ **Consequence**: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban.
61
+
62
+ ### 3. Temporary Ban
63
+
64
+ **Community Impact**: A serious violation of community standards, including sustained inappropriate behavior.
65
+
66
+ **Consequence**: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban.
67
+
68
+ ### 4. Permanent Ban
69
+
70
+ **Community Impact**: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.
71
+
72
+ **Consequence**: A permanent ban from any sort of public interaction within the community.
73
+
74
+ ## Attribution
75
+
76
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 2.0,
77
+ available at https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
78
+
79
+ Community Impact Guidelines were inspired by [Mozilla's code of conduct enforcement ladder](https://github.com/mozilla/diversity).
80
+
81
+ [homepage]: https://www.contributor-covenant.org
82
+
83
+ For answers to common questions about this code of conduct, see the FAQ at
84
+ https://www.contributor-covenant.org/faq. Translations are available at https://www.contributor-covenant.org/translations.
data/Gemfile ADDED
@@ -0,0 +1,10 @@
1
+ # frozen_string_literal: true
2
+
3
+ source "https://rubygems.org"
4
+
5
+ # Specify your gem's dependencies in pdf_scanner.gemspec
6
+ gemspec
7
+
8
+ gem "rake", "~> 13.0"
9
+
10
+ gem "rubocop", "~> 1.7"
data/Gemfile.lock ADDED
@@ -0,0 +1,46 @@
1
+ PATH
2
+ remote: .
3
+ specs:
4
+ pdf_scanner (0.1.0)
5
+ origami (~> 2.1.0)
6
+
7
+ GEM
8
+ remote: https://rubygems.org/
9
+ specs:
10
+ ast (2.4.2)
11
+ colorize (0.8.1)
12
+ json (2.6.3)
13
+ origami (2.1.0)
14
+ colorize (~> 0.7)
15
+ parallel (1.22.1)
16
+ parser (3.2.1.0)
17
+ ast (~> 2.4.1)
18
+ rainbow (3.1.1)
19
+ rake (13.0.6)
20
+ regexp_parser (2.7.0)
21
+ rexml (3.2.5)
22
+ rubocop (1.45.1)
23
+ json (~> 2.3)
24
+ parallel (~> 1.10)
25
+ parser (>= 3.2.0.0)
26
+ rainbow (>= 2.2.2, < 4.0)
27
+ regexp_parser (>= 1.8, < 3.0)
28
+ rexml (>= 3.2.5, < 4.0)
29
+ rubocop-ast (>= 1.24.1, < 2.0)
30
+ ruby-progressbar (~> 1.7)
31
+ unicode-display_width (>= 2.4.0, < 3.0)
32
+ rubocop-ast (1.26.0)
33
+ parser (>= 3.2.1.0)
34
+ ruby-progressbar (1.11.0)
35
+ unicode-display_width (2.4.2)
36
+
37
+ PLATFORMS
38
+ x86_64-linux
39
+
40
+ DEPENDENCIES
41
+ pdf_scanner!
42
+ rake (~> 13.0)
43
+ rubocop (~> 1.7)
44
+
45
+ BUNDLED WITH
46
+ 2.2.14
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2023 Shekhar Patil
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,43 @@
1
+ # PdfScanner
2
+
3
+ PdfScanner checks a PDF file against a configurable security policy. It parses the document with the `origami` gem and records every feature (JavaScript, attachments, actions, annotations, stream filters) that the selected policy disallows.
4
+
5
+ Four policies ship in `lib/pdf_scanner/config/pdfcop.conf.yml`: `none`, `standard` (the default), `strong`, and `paranoid`.
6
+
7
+ ## Installation
8
+
9
+ Add this line to your application's Gemfile:
10
+
11
+ ```ruby
12
+ gem 'pdf_scanner'
13
+ ```
14
+
15
+ And then execute:
16
+
17
+ $ bundle install
18
+
19
+ Or install it yourself as:
20
+
21
+ $ gem install pdf_scanner
22
+
23
+ ## Usage
24
+
25
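+ Build a `PdfScanner::Scanner` with `target_file:` (plus the optional `policy:`, `config_file:`, `dir:`, and `passwd:` keys) and call `scan`, which normally returns a hash with `:rejected_policies` and `:analysis_failure` arrays.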
26
+
27
+ ## Development
28
+
29
+ After checking out the repo, run `bin/setup` to install dependencies. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
30
+
31
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
32
+
33
+ ## Contributing
34
+
35
+ Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/pdf_scanner. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [code of conduct](https://github.com/[USERNAME]/pdf_scanner/blob/master/CODE_OF_CONDUCT.md).
36
+
37
+ ## License
38
+
39
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
40
+
41
+ ## Code of Conduct
42
+
43
+ Everyone interacting in the PdfScanner project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/[USERNAME]/pdf_scanner/blob/master/CODE_OF_CONDUCT.md).
data/Rakefile ADDED
@@ -0,0 +1,8 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "bundler/gem_tasks"
4
+ require "rubocop/rake_task"
5
+
6
+ RuboCop::RakeTask.new
7
+
8
+ task default: :rubocop
data/bin/console ADDED
@@ -0,0 +1,15 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ require "bundler/setup"
5
+ require "pdf_scanner"
6
+
7
+ # You can add fixtures and/or initialization code here to make experimenting
8
+ # with your gem easier. You can also use a different console, if you like.
9
+
10
+ # (If you use this, don't forget to add pry to your Gemfile!)
11
+ # require "pry"
12
+ # Pry.start
13
+
14
+ require "irb"
15
+ IRB.start(__FILE__)
data/bin/setup ADDED
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
data/lib/pdf_scanner/config/pdfcop.conf.yml ADDED
@@ -0,0 +1,236 @@
1
+ ---
2
+ POLICY_NONE:
3
+
4
+ #
5
+ # General features.
6
+ #
7
+ allowParserErrors: true
8
+ allowAttachments: true
9
+ allowEncryption: true
10
+ allowFormCalc: true
11
+ allowJSAtOpening: true
12
+ allowJS: true
13
+ allowAcroForms: true
14
+ allowXFAForms: true
15
+
16
+ #
17
+ # Page annotations.
18
+ #
19
+ allowAnnotations: true
20
+ allow3DAnnotation: true
21
+ allowFileAttachmentAnnotation: true
22
+ allowMovieAnnotation: true
23
+ allowRichMediaAnnotation: true
24
+ allowScreenAnnotation: true
25
+ allowSoundAnnotation: true
26
+
27
+ #
28
+ # PDF Actions.
29
+ #
30
+ allowChainedActions: true
31
+ allowOpenAction: true
32
+ allowGoTo3DAction: true
33
+ allowGoToAction: true
34
+ allowGoToEAction: true
35
+ allowGoToRAction: true
36
+ allowImportDataAction: true
37
+ allowJSAction: true
38
+ allowLaunchAction: true
39
+ allowMovieAction: true
40
+ allowNamedAction: true
41
+ allowRenditionAction: true
42
+ allowRichMediaAction: true
43
+ allowSoundAction: true
44
+ allowSubmitFormAction: true
45
+ allowURIAction: true
46
+
47
+ #
48
+ # Stream filters.
49
+ #
50
+ allowASCII85Filter: true
51
+ allowASCIIHexFilter: true
52
+ allowCCITTFaxFilter: true
53
+ allowCryptFilter: true
54
+ allowDCTFilter: true
55
+ allowFlateFilter: true
56
+ allowJBIG2Filter: true
57
+ allowJPXFilter: true
58
+ allowLZWFilter: true
59
+ allowRunLengthFilter: true
60
+
61
+ POLICY_STANDARD:
62
+
63
+ #
64
+ # General features.
65
+ #
66
+ allowParserErrors: false
67
+ allowAttachments: false
68
+ allowAcroForms: true
69
+ allowEncryption: true
70
+ allowFormCalc: true
71
+ allowJS: true
72
+ allowJSAtOpening: false
73
+ allowXFAForms: true
74
+
75
+ #
76
+ # Page annotations.
77
+ #
78
+ allowAnnotations: true
79
+ allow3DAnnotation: false
80
+ allowFileAttachmentAnnotation: false
81
+ allowMovieAnnotation: false
82
+ allowRichMediaAnnotation: false
83
+ allowScreenAnnotation: false
84
+ allowSoundAnnotation: false
85
+
86
+ #
87
+ # PDF Actions.
88
+ #
89
+ allowChainedActions: true
90
+ allowOpenAction: true
91
+ allowGoTo3DAction: false
92
+ allowGoToAction: true
93
+ allowGoToEAction: false
94
+ allowGoToRAction: false
95
+ allowImportDataAction: false
96
+ allowJSAction: true
97
+ allowLaunchAction: false
98
+ allowMovieAction: false
99
+ allowNamedAction: false
100
+ allowRenditionAction: false
101
+ allowRichMediaAction: false
102
+ allowSoundAction: false
103
+ allowSubmitFormAction: true
104
+ allowURIAction: true
105
+
106
+ #
107
+ # Stream filters.
108
+ #
109
+ allowASCII85Filter: false
110
+ allowASCIIHexFilter: false
111
+ allowCCITTFaxFilter: true
112
+ allowCryptFilter: true
113
+ allowDCTFilter: true
114
+ allowFlateFilter: true
115
+ allowJBIG2Filter: false
116
+ allowJPXFilter: false
117
+ allowLZWFilter: false
118
+ allowRunLengthFilter: false
119
+
120
+ POLICY_STRONG:
121
+
122
+ #
123
+ # General features.
124
+ #
125
+ allowParserErrors: false
126
+ allowAttachments: false
127
+ allowAcroForms: false
128
+ allowEncryption: true
129
+ allowFormCalc: true
130
+ allowJS: false
131
+ allowJSAtOpening: false
132
+ allowXFAForms: false
133
+
134
+ #
135
+ # Page annotations.
136
+ #
137
+ allowAnnotations: true
138
+ allow3DAnnotation: false
139
+ allowFileAttachmentAnnotation: false
140
+ allowMovieAnnotation: false
141
+ allowRichMediaAnnotation: false
142
+ allowScreenAnnotation: false
143
+ allowSoundAnnotation: false
144
+
145
+ #
146
+ # PDF Actions.
147
+ #
148
+ allowChainedActions: false
149
+ allowOpenAction: true
150
+ allowGoTo3DAction: false
151
+ allowGoToAction: true
152
+ allowGoToEAction: false
153
+ allowGoToRAction: false
154
+ allowImportDataAction: false
155
+ allowJSAction: false
156
+ allowLaunchAction: false
157
+ allowMovieAction: false
158
+ allowNamedAction: false
159
+ allowRenditionAction: false
160
+ allowRichMediaAction: false
161
+ allowSoundAction: false
162
+ allowSubmitFormAction: false
163
+ allowURIAction: true
164
+
165
+ #
166
+ # Stream filters.
167
+ #
168
+ allowASCII85Filter: false
169
+ allowASCIIHexFilter: false
170
+ allowCCITTFaxFilter: false
171
+ allowCryptFilter: true
172
+ allowDCTFilter: true
173
+ allowFlateFilter: true
174
+ allowJBIG2Filter: false
175
+ allowJPXFilter: false
176
+ allowLZWFilter: false
177
+ allowRunLengthFilter: false
178
+
179
+ POLICY_PARANOID:
180
+
181
+ #
182
+ # General features.
183
+ #
184
+ allowParserErrors: false
185
+ allowAttachments: false
186
+ allowAcroForms: false
187
+ allowEncryption: false
188
+ allowFormCalc: false
189
+ allowJS: false
190
+ allowJSAtOpening: false
191
+ allowXFAForms: false
192
+
193
+ #
194
+ # Page annotations.
195
+ #
196
+ allowAnnotations: true
197
+ allow3DAnnotation: false
198
+ allowFileAttachmentAnnotation: false
199
+ allowMovieAnnotation: false
200
+ allowRichMediaAnnotation: false
201
+ allowScreenAnnotation: false
202
+ allowSoundAnnotation: false
203
+
204
+ #
205
+ # PDF Actions.
206
+ #
207
+ allowChainedActions: false
208
+ allowOpenAction: false
209
+ allowGoTo3DAction: false
210
+ allowGoToAction: true
211
+ allowGoToEAction: false
212
+ allowGoToRAction: false
213
+ allowImportDataAction: false
214
+ allowJSAction: false
215
+ allowLaunchAction: false
216
+ allowMovieAction: false
217
+ allowNamedAction: false
218
+ allowRenditionAction: false
219
+ allowRichMediaAction: false
220
+ allowSoundAction: false
221
+ allowSubmitFormAction: false
222
+ allowURIAction: false
223
+
224
+ #
225
+ # Stream filters.
226
+ #
227
+ allowASCII85Filter: false
228
+ allowASCIIHexFilter: false
229
+ allowCCITTFaxFilter: false
230
+ allowCryptFilter: false
231
+ allowDCTFilter: true
232
+ allowFlateFilter: true
233
+ allowJBIG2Filter: false
234
+ allowJPXFilter: false
235
+ allowLZWFilter: false
236
+ allowRunLengthFilter: false
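The policy file above maps each policy name to a set of allow flags. A minimal sketch of how such a lookup could work, assuming a two-policy excerpt with the YAML indentation restored; `POLICY_YAML` and `denied_rights` are hypothetical names used for illustration and are not part of the gem:

```ruby
require "yaml"

# Two-policy excerpt of the configuration, nested under each policy name.
POLICY_YAML = <<~YAML
  POLICY_STANDARD:
    allowJS: true
    allowLaunchAction: false
  POLICY_PARANOID:
    allowJS: false
    allowLaunchAction: false
YAML

# Hypothetical helper: returns the rights that the selected policy sets to an
# explicit `false`. A right that is absent from the policy is not reported.
def denied_rights(policies, policy_name, *rights)
  current = policies.fetch("POLICY_#{policy_name.upcase}")
  rights.select { |right| current[right.to_s] == false }
end

policies = YAML.safe_load(POLICY_YAML)
p denied_rights(policies, "standard", :allowJS, :allowLaunchAction)  # => [:allowLaunchAction]
p denied_rights(policies, "paranoid", :allowJS, :allowUnknown)       # => [:allowJS]
```

Because only an explicit `false` counts as a denial in this sketch, a right whose name does not appear in the policy is treated as allowed, so a misspelled right name would never trigger a rejection.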
data/lib/pdf_scanner/scanner.rb ADDED
@@ -0,0 +1,334 @@
1
+ require 'optparse'
2
+ require 'yaml'
3
+ require 'rexml/document'
4
+ require 'digest/sha2'
5
+ require 'fileutils'
6
+ require 'origami'
7
+
8
+ module PdfScanner
9
+ class Scanner
10
+ DEFAULT_CONFIG_FILE = "#{File.dirname(__FILE__)}/config/pdfcop.conf.yml"
11
+ DEFAULT_POLICY = "standard"
12
+ SECURITY_POLICIES = {}
13
+ ANNOTATION_RIGHTS = {
14
+ FileAttachment: %i[allowAttachments allowFileAttachmentAnnotation],
15
+ Sound: %i[allowSoundAnnotation],
16
+ Movie: %i[allowMovieAnnotation],
17
+ Screen: %i[allowScreenAnnotation],
18
+ Widget: %i[allowAcroForms],
19
+ RichMedia: %i[allowRichMediaAnnotation],
20
+ :"3D" => %i[allow3DAnnotation]
21
+ }
22
+
23
+ def initialize(params = {})
24
+ @options = {}
25
+ @options[:output_log] = params[:output_log] unless params[:output_log].to_s.empty?
26
+ @options[:target_file] = params[:target_file] unless params[:target_file].to_s.empty?
27
+ @options[:config_file] = params[:config_file] unless params[:config_file].to_s.empty?
28
+ @options[:policy] = params[:policy] unless params[:policy].to_s.empty?
29
+ @options[:move_dir] = params[:dir] unless params[:dir].to_s.empty?
30
+ @options[:password] = params[:passwd] unless params[:passwd].to_s.empty? # for encrypted file
31
+ @errors = { rejected_policies: [], analysis_failure: [] }
32
+ end
33
+
34
+ def scan
35
+ begin
36
+ @errors = { rejected_policies: [], analysis_failure: [] }
37
+ if !@options.key?(:policy)
38
+ @options[:policy] = DEFAULT_POLICY
39
+ end
40
+
41
+ if @options.key?(:move_dir) and !File.directory?(@options[:move_dir])
42
+ abort "Error: #{@options[:move_dir]} is not a valid directory."
43
+ end
44
+
45
+ load_config_file(@options[:config_file] || DEFAULT_CONFIG_FILE)
46
+
47
+ unless SECURITY_POLICIES.key?("POLICY_#{@options[:policy].upcase}")
48
+ return "Undeclared policy `#{@options[:policy]}'"
49
+ end
50
+
51
+ @pdf = Origami::PDF.read(@options[:target_file],
52
+ verbosity: Origami::Parser::VERBOSE_QUIET,
53
+ ignore_errors: SECURITY_POLICIES["POLICY_#{@options[:policy].upcase}"]['allowParserErrors'],
54
+ decrypt: SECURITY_POLICIES["POLICY_#{@options[:policy].upcase}"]['allowEncryption'],
55
+ prompt_password: lambda { '' },
56
+ password: @options[:password]
57
+ )
58
+
59
+ if @pdf.encrypted?
60
+ check_rights(:allowEncryption)
61
+ end
62
+
63
+ catalog = @pdf.Catalog
64
+ reject("Invalid document catalog") unless catalog.is_a?(Origami::Catalog)
65
+
66
+ if catalog.key?(:OpenAction)
67
+ check_rights(:allowOpenAction)
68
+ action = catalog.OpenAction
69
+ analyze_action(action, true, 1)
70
+ end
71
+
72
+ if catalog.key?(:AA)
73
+ if catalog.AA.is_a?(Origami::Dictionary)
74
+ aa = Origami::CatalogAdditionalActions.new(catalog.AA); aa.parent = catalog;
75
+ analyze_action(aa.WC, false, 1) if aa.key?(:WC)
76
+ analyze_action(aa.WS, false, 1) if aa.key?(:WS)
77
+ analyze_action(aa.DS, false, 1) if aa.key?(:DS)
78
+ analyze_action(aa.WP, false, 1) if aa.key?(:WP)
79
+ analyze_action(aa.DP, false, 1) if aa.key?(:DP)
80
+ end
81
+ end
82
+
83
+ if catalog.key?(:AcroForm)
84
+ acroform = catalog.AcroForm
85
+ if acroform.is_a?(Origami::Dictionary)
86
+ check_rights(:allowAcroForms)
87
+ if acroform.key?(:XFA)
88
+ check_rights(:allowXFAForms)
89
+
90
+ analyze_xfa_forms(acroform[:XFA].solve)
91
+ end
92
+ end
93
+ end
94
+
95
+ if @pdf.each_named_script.any?
96
+ check_rights(:allowJS)
97
+ check_rights(:allowJSAtOpening)
98
+ end
99
+
100
+ if @pdf.each_attachment.any?
101
+ check_rights(:allowAttachments)
102
+ end
103
+
104
+ @pdf.each_page do |page|
105
+ analyze_page(page, 1)
106
+ end
107
+
108
+ @pdf.each_object.select{|obj| obj.is_a?(Origami::Stream)}.each do |stream|
109
+ if stream.dictionary.key?(:Filter)
110
+ filters = stream.Filter
111
+ filters = [ filters ] if filters.is_a?(Origami::Name)
112
+
113
+ if filters.is_a?(Origami::Array)
114
+ filters.each do |filter|
115
+ case filter.value
116
+ when :ASCIIHexDecode
117
+ check_rights(:allowASCIIHexFilter)
118
+ when :ASCII85Decode
119
+ check_rights(:allowASCII85Filter)
120
+ when :LZWDecode
121
+ check_rights(:allowLZWFilter)
122
+ when :FlateDecode
123
+ check_rights(:allowFlateFilter)
124
+ when :RunLengthDecode
125
+ check_rights(:allowRunLengthFilter)
126
+ when :CCITTFaxDecode
127
+ check_rights(:allowCCITTFaxFilter)
128
+ when :JBIG2Decode
129
+ check_rights(:allowJBIG2Filter)
130
+ when :DCTDecode
131
+ check_rights(:allowDCTFilter)
132
+ when :JPXDecode
133
+ check_rights(:allowJPXFilter)
134
+ when :Crypt
135
+ check_rights(:allowCryptFilter)
136
+ end
137
+ end
138
+ end
139
+ end
140
+ end
141
+ rescue StandardError => ex
142
+ reject("Analysis failure", ex)
143
+ end
144
+
145
+ @errors
146
+ end
147
+
148
+ def load_config_file(path)
149
+ SECURITY_POLICIES.update(Hash.new(false).update(YAML.safe_load(File.read(path))))
150
+ end
151
+
152
+ def reject(cause, exception_message = nil)
153
+ if @options.key?(:move_dir)
154
+ quarantine(@options[:target_file], @options[:move_dir])
155
+ end
156
+
157
+ if exception_message.nil?
158
+ @errors[:rejected_policies] << { policy: @options[:policy], message: cause.inspect }
159
+ else
160
+ @errors[:analysis_failure] << { error: exception_message.to_s, message: cause.inspect }
161
+ end
162
+ end
163
+
164
+ def quarantine(file, quarantine_folder)
165
+ digest = Digest::SHA256.file(file)
166
+ ext = File.extname(file)
167
+ dest_name = "#{File.basename(file, ext)}_#{digest}#{ext}"
168
+ dest_path = File.join(quarantine_folder, dest_name)
169
+
170
+ FileUtils.move(file, dest_path)
171
+ end
172
+
173
+ def check_rights(*required_rights)
174
+ current_rights = SECURITY_POLICIES["POLICY_#{@options[:policy].upcase}"]
175
+
176
+ reject(required_rights) if required_rights.any?{|right| current_rights[right.to_s] == false}
177
+ end
178
+
179
+ def analyze_xfa_forms(xfa)
180
+ case xfa
181
+ when Origami::Array then
182
+ xml = ""
183
+ i = 0
184
+ xfa.each do |packet|
185
+ if i % 2 == 1
186
+ xml << packet.solve.data
187
+ end
188
+
189
+ i = i + 1
190
+ end
191
+ when Origami::Stream then
192
+ xml = xfa.data
193
+ else
194
+ reject("Malformed XFA dictionary")
195
+ end
196
+
197
+ xfadoc = REXML::Document.new(xml)
198
+ REXML::XPath.match(xfadoc, "//script").each do |script|
199
+ case script.attributes["contentType"]
200
+ when "application/x-formcalc" then
201
+ check_rights(:allowFormCalc)
202
+ else
203
+ check_rights(:allowJS)
204
+ end
205
+ end
206
+ end
207
+
208
+ def check_annotation_rights(annot)
209
+ subtype = annot.Subtype.value
210
+
211
+ check_rights(*ANNOTATION_RIGHTS[subtype]) if ANNOTATION_RIGHTS.include?(subtype)
212
+ end
213
+
214
+ def analyze_annotation(annot, _level = 0)
215
+ check_rights(:allowAnnotations)
216
+
217
+ if annot.is_a?(Origami::Dictionary) and annot.key?(:Subtype)
218
+ check_annotation_rights(annot)
219
+
220
+ analyze_3d_annotation(annot) if annot.Subtype.value == :"3D"
221
+ end
222
+ end
223
+
224
+ def analyze_3d_annotation(annot)
225
+ # 3D annotation might pull in JavaScript for real-time driven behavior.
226
+ return unless annot.key?(:"3DD")
227
+
228
+ dd = annot[:"3DD"].solve
229
+ u3dstream = nil
230
+
231
+ case dd
232
+ when Origami::Stream
233
+ u3dstream = dd
234
+ when Origami::Dictionary
235
+ u3dstream = dd[:"3DD"].solve
236
+ end
237
+
238
+ if u3dstream.is_a?(Origami::Stream) and u3dstream.key?(:OnInstantiate)
239
+ check_rights(:allowJS)
240
+
241
+ if annot.key?(:"3DA") # is 3d view instantiated automatically?
242
+ u3dactiv = annot[:"3DA"].solve
243
+
244
+ check_rights(:allowJSAtOpening) if u3dactiv.is_a?(Origami::Dictionary) and (u3dactiv.A == :PO or u3dactiv.A == :PV)
245
+ end
246
+ end
247
+ end
248
+
249
+ def analyze_page(page, level = 0)
250
+ if page.is_a?(Origami::Dictionary)
251
+ #
252
+ # Checking page additional actions.
253
+ #
254
+ if page.key?(:AA)
255
+ if page.AA.is_a?(Origami::Dictionary)
256
+
257
+ aa = Origami::Page::AdditionalActions.new(page.AA); aa.parent = page.AA.parent
258
+ analyze_action(aa.O, true, level + 1) if aa.key?(:O)
259
+ analyze_action(aa.C, false, level + 1) if aa.key?(:C)
260
+ end
261
+ end
262
+
263
+ #
264
+ # Looking for page annotations.
265
+ #
266
+ page.each_annotation do |annot|
267
+ analyze_annotation(annot, level + 1)
268
+ end
269
+ end
270
+ end
271
+
272
+ def analyze_action(action, triggered_at_opening, level = 0)
273
+ if action.is_a?(Origami::Dictionary)
274
+ type = action[:S].is_a?(Origami::Reference) ? action[:S].solve : action[:S]
275
+
276
+ case type.value
277
+ when :JavaScript
278
+ check_rights(:allowJS)
279
+ check_rights(:allowJSAtOpening) if triggered_at_opening
280
+ when :Launch
281
+ check_rights(:allowLaunchAction)
282
+ when :Named
283
+ check_rights(:allowNamedAction)
284
+ when :GoTo
285
+ check_rights(:allowGoToAction)
286
+ dest = action[:D].is_a?(Origami::Reference) ? action[:D].solve : action[:D]
287
+ if dest.is_a?(Origami::Array) and dest.length > 0 and dest.first.is_a?(Origami::Reference)
288
+ dest_page = dest.first.solve
289
+ if dest_page.is_a?(Origami::Page)
290
+ analyze_page(dest_page, level + 1)
291
+ end
292
+ end
293
+ when :GoToE
294
+ check_rights(:allowAttachments,:allowGoToEAction)
295
+ when :GoToR
296
+ check_rights(:allowGoToRAction)
297
+ when :Thread
298
+ check_rights(:allowGoToRAction) if action.key?(:F)
299
+ when :URI
300
+ check_rights(:allowURIAction)
301
+ when :SubmitForm
302
+ check_rights(:allowAcroForms,:allowSubmitFormAction)
303
+ when :ImportData
304
+ check_rights(:allowAcroForms,:allowImportDataAction)
305
+ when :Rendition
306
+ check_rights(:allowScreenAnnotation,:allowRenditionAction)
307
+ when :Sound
308
+ check_rights(:allowSoundAnnotation,:allowSoundAction)
309
+ when :Movie
310
+ check_rights(:allowMovieAnnotation,:allowMovieAction)
311
+ when :RichMediaExecute
312
+ check_rights(:allowRichMediaAnnotation,:allowRichMediaAction)
313
+ when :GoTo3DView
314
+ check_rights(:allow3DAnnotation,:allowGoTo3DAction)
315
+ end
316
+
317
+ if action.key?(:Next)
318
+ check_rights(:allowChainedActions)
319
+ analyze_action(action.Next, triggered_at_opening, level + 1)
320
+ end
321
+
322
+ elsif action.is_a?(Origami::Array)
323
+ dest = action
324
+ if dest.length > 0 and dest.first.is_a?(Origami::Reference)
325
+ dest_page = dest.first.solve
326
+ if dest_page.is_a?(Origami::Page)
327
+ check_rights(:allowGoToAction)
328
+ analyze_page(dest_page, level + 1)
329
+ end
330
+ end
331
+ end
332
+ end
333
+ end
334
+ end
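The `quarantine` method above renames a rejected file to `<basename>_<sha256><ext>` before moving it. A minimal standalone sketch of that naming scheme, assuming a temporary directory and placeholder bytes instead of a real PDF; `quarantine_path` is a hypothetical helper name:

```ruby
require "digest"
require "fileutils"
require "tmpdir"

# Hypothetical helper: builds "<basename>_<sha256 hex><ext>" inside `folder`,
# following the naming pattern used by the quarantine method.
def quarantine_path(file, folder)
  digest = Digest::SHA256.file(file).hexdigest
  ext = File.extname(file)
  File.join(folder, "#{File.basename(file, ext)}_#{digest}#{ext}")
end

Dir.mktmpdir do |dir|
  source = File.join(dir, "sample.pdf")
  File.write(source, "%PDF-1.4 placeholder") # not a real PDF, just bytes to hash
  vault = File.join(dir, "quarantine")
  FileUtils.mkdir_p(vault)

  dest = quarantine_path(source, vault)
  FileUtils.move(source, dest)
  puts File.basename(dest)
  puts File.exist?(source) # the original path is gone after the move
end
```

Embedding the digest in the name keeps two different files with the same basename from overwriting each other in the quarantine folder.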
data/lib/pdf_scanner/version.rb ADDED
@@ -0,0 +1,5 @@
1
+ # frozen_string_literal: true
2
+
3
+ module PdfScanner
4
+ VERSION = "0.1.0"
5
+ end
data/lib/pdf_scanner.rb ADDED
@@ -0,0 +1,8 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "pdf_scanner/version"
4
+ require "pdf_scanner/scanner"
5
+ module PdfScanner
6
+ class Error < StandardError; end
7
+ # Your code goes here...
8
+ end
data/pdf_scanner.gemspec ADDED
@@ -0,0 +1,39 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "lib/pdf_scanner/version"
4
+
5
+ Gem::Specification.new do |spec|
6
+ spec.name = "pdf_scanner"
7
+ spec.version = PdfScanner::VERSION
8
+ spec.authors = ["shekhar-patil"]
9
+ spec.email = ["patilshekhar900@gmail.com"]
10
+
11
+ spec.summary = "Write a short summary, because RubyGems requires one."
12
+ spec.description = "Write a longer description or delete this line."
13
+ spec.homepage = "https://github.com/shekhar-patil/pdf_scanner"
14
+ spec.license = "MIT"
15
+ # spec.required_ruby_version = Gem::Requirement.new(">= 2.4.0")
16
+
17
+ spec.metadata["allowed_push_host"] = "https://rubygems.org"
18
+
19
+ spec.metadata["homepage_uri"] = spec.homepage
20
+ spec.metadata["source_code_uri"] = "https://github.com/shekhar-patil/pdf_scanner"
21
+ spec.metadata["changelog_uri"] = "https://github.com/shekhar-patil/pdf_scanner"
22
+
23
+ # Specify which files should be added to the gem when it is released.
24
+ # The `git ls-files -z` loads the files in the RubyGem that have been added into git.
25
+ spec.files = Dir.chdir(File.expand_path(__dir__)) do
26
+ `git ls-files -z`.split("\x0").reject { |f| f.match(%r{\A(?:test|spec|features)/}) }
27
+ end
28
+ spec.bindir = "exe"
29
+ spec.executables = spec.files.grep(%r{\Aexe/}) { |f| File.basename(f) }
30
+ spec.require_paths = ["lib"]
31
+ spec.required_ruby_version = '>= 2.1'
32
+ spec.add_runtime_dependency "origami", "~> 2.1.0"
33
+
34
+ # Uncomment to register a new dependency of your gem
35
+ # spec.add_dependency "example-gem", "~> 1.0"
36
+
37
+ # For more information and examples about making a new gem, checkout our
38
+ # guide at: https://bundler.io/guides/creating_gem.html
39
+ end
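The gemspec above declares `origami` with the pessimistic constraint `~> 2.1.0`, which RubyGems reads as `>= 2.1.0` and `< 2.2`. A short sketch that checks a few version strings against it; `satisfies?` is a hypothetical helper name and the version list is illustrative only:

```ruby
require "rubygems"

# Hypothetical helper: true when `version` satisfies the `constraint` string.
def satisfies?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

%w[2.1.0 2.1.9 2.2.0 3.0.0].each do |version|
  puts "#{version}: #{satisfies?('~> 2.1.0', version)}"
end
# 2.1.0: true
# 2.1.9: true
# 2.2.0: false
# 3.0.0: false
```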
metadata ADDED
@@ -0,0 +1,77 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: pdf_scanner
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - shekhar-patil
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2023-02-20 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: origami
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: 2.1.0
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: 2.1.0
27
+ description: Write a longer description or delete this line.
28
+ email:
29
+ - patilshekhar900@gmail.com
30
+ executables: []
31
+ extensions: []
32
+ extra_rdoc_files: []
33
+ files:
34
+ - ".gitignore"
35
+ - ".rubocop.yml"
36
+ - CHANGELOG.md
37
+ - CODE_OF_CONDUCT.md
38
+ - Gemfile
39
+ - Gemfile.lock
40
+ - LICENSE.txt
41
+ - README.md
42
+ - Rakefile
43
+ - bin/console
44
+ - bin/setup
45
+ - lib/pdf_scanner.rb
46
+ - lib/pdf_scanner/config/pdfcop.conf.yml
47
+ - lib/pdf_scanner/scanner.rb
48
+ - lib/pdf_scanner/version.rb
49
+ - pdf_scanner.gemspec
50
+ homepage: https://github.com/shekhar-patil/pdf_scanner
51
+ licenses:
52
+ - MIT
53
+ metadata:
54
+ allowed_push_host: https://rubygems.org
55
+ homepage_uri: https://github.com/shekhar-patil/pdf_scanner
56
+ source_code_uri: https://github.com/shekhar-patil/pdf_scanner
57
+ changelog_uri: https://github.com/shekhar-patil/pdf_scanner
58
+ post_install_message:
59
+ rdoc_options: []
60
+ require_paths:
61
+ - lib
62
+ required_ruby_version: !ruby/object:Gem::Requirement
63
+ requirements:
64
+ - - ">="
65
+ - !ruby/object:Gem::Version
66
+ version: '2.1'
67
+ required_rubygems_version: !ruby/object:Gem::Requirement
68
+ requirements:
69
+ - - ">="
70
+ - !ruby/object:Gem::Version
71
+ version: '0'
72
+ requirements: []
73
+ rubygems_version: 3.0.8
74
+ signing_key:
75
+ specification_version: 4
76
+ summary: Write a short summary, because RubyGems requires one.
77
+ test_files: []