pdfmd 1.9.1 → 2.0.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +22 -2
- data/README.md +2 -2
- data/TODO.mkd +26 -0
- data/bin/pdfmd +267 -1
- data/lib/pdfmd.rb +242 -634
- data/lib/pdfmd/explain.hiera.md +25 -4
- data/lib/pdfmd/long_desc.pdfmdconfig.txt +40 -0
- data/lib/pdfmd/long_desc.pdfmdedit.txt +166 -0
- data/lib/pdfmd/long_desc.pdfmdexplain.txt +16 -0
- data/lib/pdfmd/long_desc.pdfmdrename.txt +206 -0
- data/lib/pdfmd/long_desc.pdfmdshow.txt +92 -0
- data/lib/pdfmd/long_desc.pdfmdsort.txt +111 -0
- data/lib/pdfmd/long_desc.pdfmdstat.txt +23 -0
- data/lib/pdfmd/pdfmdconfig.rb +30 -0
- data/lib/pdfmd/pdfmdedit.rb +201 -0
- data/lib/pdfmd/pdfmdmethods.rb +125 -0
- data/lib/pdfmd/pdfmdrename.rb +243 -0
- data/lib/pdfmd/pdfmdshow.rb +88 -0
- data/lib/pdfmd/pdfmdsort.rb +115 -0
- data/lib/pdfmd/pdfmdstat.rb +117 -0
- data/lib/{string_extend.rb → pdfmd/string_extend.rb} +0 -0
- data/lib/run.rb +235 -0
- data/pdfmd.gemspec +3 -2
- metadata +23 -11
- data/lib/pdfmd/check.rb +0 -10
- data/lib/pdfmd/config.rb +0 -59
- data/lib/pdfmd/edit.rb +0 -144
- data/lib/pdfmd/rename.rb +0 -295
- data/lib/pdfmd/show.rb +0 -164
- data/lib/pdfmd/sort.rb +0 -199
data/lib/pdfmd/long_desc.pdfmdshow.txt
@@ -0,0 +1,92 @@
== General

Show metatags of a PDF document.

The following tags are shown:

* Author
* CreateDate
* Title
* Subject
* Keywords

== Parameter

--all, -a

Show all relevant metatags for a document.

Relevant tags are: Author, CreateDate, Title, Subject, Keywords.

This is the default action.



--tag, -t

Specify the metatag to show. The selected metatag must be one of the relevant tags. Other tags are ignored and nothing is returned.

The value of the parameter is case-insensitive: 'Author' == 'author'

Multiple tags can be specified, separated by commas.

If multiple tags are specified in an order different from the default order, the specified order is used. This affects, for example, the order of the fields when the output is exported in CSV format.



--format, -f

Specify a different output format. Default: yaml

Available formats are: json, yaml, csv, hash



--includepdf, -i

Include the filename of the PDF document in the output if this option is set to true. Default: false



--log, -l

Enable/Disable logging. Default: true



--logfile, -p

Specify the path to the logfile. Default: `./.pdfmd.log`
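The tag selection and output formats described for `--tag`, `--format` and `--includepdf` can be illustrated with a small Ruby sketch. It is hypothetical, not pdfmd's implementation: `RELEVANT_TAGS`, `select_tags` and `render` are illustrative names, the metadata is assumed to be a plain Hash keyed by tag name, `--includepdf` is assumed to add a leading "File" field, and the `hash` format is assumed to mean Ruby's `Hash#inspect` output. It shows the two documented behaviors: unknown tags are dropped, and the order the user gives determines the CSV column order.

```ruby
require 'csv'
require 'json'
require 'yaml'

# Hypothetical sketch, not pdfmd's code: the tags the help text calls "relevant".
RELEVANT_TAGS = %w[Author CreateDate Title Subject Keywords].freeze

# Select tags case-insensitively from a comma-separated option, keeping the
# order the user gave. Unknown tags are dropped; 'all' selects the default set.
def select_tags(metadata, tag_option = 'all')
  requested =
    if tag_option.casecmp?('all')
      RELEVANT_TAGS
    else
      tag_option.split(',').map(&:strip).filter_map { |name| RELEVANT_TAGS.find { |tag| tag.casecmp?(name) } }
    end
  requested.to_h { |tag| [tag, metadata[tag].to_s] }
end

# Render the selected tags in one of the documented output formats.
# Passing filename mimics --includepdf true (assumed: adds a leading field).
def render(tags, format: 'yaml', filename: nil)
  tags = { 'File' => filename }.merge(tags) if filename
  case format.downcase
  when 'yaml' then tags.to_yaml
  when 'json' then JSON.generate(tags)
  when 'csv'  then CSV.generate_line(tags.values)
  when 'hash' then tags.inspect # assumption: "hash" means a Ruby Hash literal
  else raise ArgumentError, "unknown format: #{format}"
  end
end
```

For example, `render(select_tags(meta, 'title,author'), format: 'csv')` emits the Title value before the Author value, regardless of the order of the keys in `meta`.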



== Example

# Show default metatags for a PDF document

$ pdfmd show <filename>

# Show default metatags for example.pdf

$ pdfmd show example.pdf

# Show the value of metatag 'Author' for the file example.pdf

$ pdfmd show -t author example.pdf

# Show the values of metatags 'Author' and 'Title' for the file example.pdf

$ pdfmd show -t author,title example.pdf



== Hiera

--- # YAML
pdfmd::config:
  show:
    format    : yaml|json|csv|hash
    tag       : author,subject,createdate,title,keywords
    includepdf: true|false
    log       : true|false

data/lib/pdfmd/long_desc.pdfmdsort.txt
@@ -0,0 +1,111 @@
== General

Sort PDF documents into subdirectories according to the value of the metatag 'author'.

If a document has no value in the metatag 'author', the file is not processed.

=== Parameter

--destination, -d

Specify the root output directory in which the folder structure is created.

This parameter is required if Hiera is not configured.

The command line parameter overrides the Hiera default.

Default: current working directory.



--dryrun, -n

If set to true, the command runs through all actions as usual but does not actually perform them. Log entries for all simulated actions are prefixed with 'DRYRUN: '. Default: false



--copy, -c

Copy the files instead of moving them. Default: false



--log, -l

Enable/Disable logging. Default: true



--logfile, -p

Set an alternate path for the logfile. If no path is given, the logfile is created in the current working directory as `.pdfmd.log`.



--interactive, -i

Enable/Disable interactive sorting. This asks for confirmation before each sorting action. Default: false



--overwrite, -o

If set to 'true', the command overwrites any existing file with the same name at the target destination without asking. Default: false
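How `--destination`, `--copy`, `--dryrun` and `--overwrite` interact for a single document can be sketched as below. The method `sort_document` and its keyword arguments are hypothetical, the author value is assumed to be already turned into a directory name, and `--interactive` and writing to a logfile are left out; the sketch only returns the log line it would write.

```ruby
require 'fileutils'

# Hypothetical sketch of one sorting step (not pdfmd's code). `author` is
# assumed to be normalized into a directory name already. Returns the log line.
def sort_document(path, author, destination:, copy: false, dryrun: false, overwrite: false)
  # Documents without an author are not processed.
  return "Skipping '#{path}': no author set." if author.to_s.strip.empty?

  target_dir = File.join(destination, author)
  target     = File.join(target_dir, File.basename(path))
  # Without overwrite, an existing file at the target is left alone.
  return "Skipping '#{path}': '#{target}' already exists." if File.exist?(target) && !overwrite

  action = copy ? 'Copying' : 'Moving'
  unless dryrun # a dry run only reports what would happen
    FileUtils.mkdir_p(target_dir)
    copy ? FileUtils.cp(path, target) : FileUtils.mv(path, target)
  end
  "#{'DRYRUN: ' if dryrun}#{action} '#{path}' to '#{target}'."
end
```

Returning the message instead of printing it keeps the dry-run behavior easy to check: with `dryrun: true` nothing on disk changes and only the 'DRYRUN: ' prefix distinguishes the log line.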



=== Replacement rules

The subdirectories for the documents are generated from the value of the
tag 'author' of each document.

To ensure a clean directory structure, the values are altered according
to the following rules:

1. Whitespace characters are replaced by underscores.

2. Dots are replaced by underscores.

3. All letters are converted to lowercase.

4. Special characters are serialized.
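Rules 1 to 3 translate directly into string operations. The following sketch uses a hypothetical method name and assumes each whitespace character is replaced individually; rule 4 is omitted because the text does not specify what "serialized" means for special characters.

```ruby
# Hypothetical helper implementing rules 1 to 3 only; rule 4 is not covered.
def author_to_dirname(author)
  name = author.gsub(/\s/, '_') # rule 1: each whitespace character becomes '_'
  name = name.tr('.', '_')      # rule 2: dots become '_'
  name.downcase                 # rule 3: convert letters to lowercase
end
```

With these rules, an author 'Jane Doe' ends up in the subdirectory `jane_doe`.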



=== Hiera

Set the default values listed below as a sub-hash of the main configuration:

--- #YAML
pdfmd::config:
  sort:
    copy        : true|false
    destination : /tmp
    dryrun      : true|false
    interactive : true|false
    log         : true|false
    logfile     : /var/log/pdfmd.log
    overwrite   : true|false

See the README file for an example of how to define the values in Hiera, or run `pdfmd explain hiera`.
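The help text states that the command line value of `--destination` overrides the Hiera default. Assuming the same precedence holds for every option, the lookup can be sketched as a chain of merges: built-in defaults, then the Hiera `sort` section, then command line options. `SORT_DEFAULTS` and `effective_sort_options` are hypothetical names, the defaults are copied from the parameter descriptions above, and the Hiera data is read from a YAML string shaped like the example.

```ruby
require 'yaml'

# Hypothetical built-in defaults, taken from the parameter descriptions above.
SORT_DEFAULTS = {
  'copy' => false, 'destination' => Dir.pwd, 'dryrun' => false,
  'interactive' => false, 'log' => true, 'overwrite' => false
}.freeze

# Later sources win: built-in defaults < Hiera 'sort' section < command line.
def effective_sort_options(hiera_yaml, cli_options = {})
  hiera   = YAML.safe_load(hiera_yaml) || {}         # empty input yields nil/false
  section = hiera.dig('pdfmd::config', 'sort') || {}
  SORT_DEFAULTS.merge(section).merge(cli_options)
end
```

Because `Hash#merge` lets the argument's values win, each later source only needs to contain the keys it actually sets.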



=== Example

This command does the following:

1. Take all PDF documents in the subdirectory ./documents.

2. Create the output folder structure in `/tmp/test/`.

3. Copy the files instead of moving them.

4. Disable logging.

$ pdfmd sort -d /tmp/test -c -l false ./documents

# Sort only a single file

$ pdfmd sort -d /tmp/test -c -l false ./documents/test.pdf

data/lib/pdfmd/long_desc.pdfmdstat.txt
@@ -0,0 +1,23 @@
Show statistics about the metadata of the PDF documents in a directory.

== Usage

Example: `pdfmd stat <directory>`


== Parameter

[<directory>]

Path to the directory containing PDF documents or subdirectories with PDF documents.

Example: `pdfmd stat ~/pdf`


--recursive, -r

If set to true, pdfmd also includes all PDF documents found in subdirectories of <directory>.

Default: false


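How `<directory>` and `--recursive` could select the documents to analyze is sketched below. The method name `pdf_files` is hypothetical, and matching the `.pdf` extension case-insensitively is an assumption; the help text does not say which statistics are computed, so the sketch stops at collecting the file list.

```ruby
# Hypothetical sketch: list the PDF files in a directory, optionally recursing.
def pdf_files(directory, recursive: false)
  pattern = recursive ? File.join(directory, '**', '*') : File.join(directory, '*')
  Dir.glob(pattern)
     .select { |path| File.file?(path) && File.extname(path).casecmp?('.pdf') } # skip dirs and non-PDFs
     .sort
end
```

Filtering on `File.extname` instead of a `*.pdf` glob avoids relying on `File::FNM_CASEFOLD`, which `Dir.glob` does not honor on every platform.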
data/lib/pdfmd/pdfmdconfig.rb
@@ -0,0 +1,30 @@
# == Class: Pdfmdconfig
#
# Show the current default configuration of pdfmd
#
class Pdfmdconfig < Pdfmd

  require 'yaml'

  def initialize(filename)
    super(filename)
    @filename = filename
  end

  def show_config(key = '')

    if key.empty?
      self.log('debug', 'Showing current configuration in yaml format.')
      @hieradata.to_yaml
    elsif @hieradata.has_key?(key)
      self.log('debug', "Showing current configuration in yaml format, section: #{key}.")
      @hieradata[key].to_yaml
    else
      self.log('error', "Unknown Hiera key used: '#{key}'.")
      # abort prints the message to stderr and exits with status 1
      abort 'Unknown hiera key. Abort.'
    end

  end

end
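The key lookup in `show_config` can be tried out in isolation with a plain Hash standing in for `@hieradata`. In this hypothetical sketch, `config_section_yaml` raises `KeyError` for an unknown key instead of calling `abort`; that is an assumed design choice which lets a caller or a test handle the error, while a command line wrapper could still rescue it and abort with the same message.

```ruby
require 'yaml'

# Hypothetical standalone version of the show_config lookup (not pdfmd's API).
# An empty key dumps everything; an unknown key raises KeyError.
def config_section_yaml(hieradata, key = '')
  return hieradata.to_yaml if key.empty?
  raise KeyError, "Unknown hiera key: '#{key}'" unless hieradata.key?(key)

  hieradata[key].to_yaml
end
```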
data/lib/pdfmd/pdfmdedit.rb
@@ -0,0 +1,201 @@
# == Class: pdfmdedit
#
# Edit metadata of PDF documents
#
class Pdfmdedit < Pdfmd

  attr_accessor :filename, :opendoc, :pdfviewer

  @@edit_tags = Hash.new

  def initialize(filename)
    super(filename)
    self.set_tags(@@default_tags)
  end


  # Start a viewer
  def start_viewer(filename = '', viewer = '')
    if File.exist?(filename) and !viewer.empty?
      # Array form bypasses the shell, so file names with spaces are passed intact
      pid = IO.popen([*viewer.split, filename])
      self.log('debug', "Application '#{viewer}' with PID #{pid.pid} started to show file '#{filename}'.")
      pid.pid

    elsif viewer.empty?
      self.log('error', 'No viewer specified. Aborting document view.')
    else
      self.log('error', "Could not find file '#{filename}' for viewing.")
    end

  end


  #
  # Setting the tags to edit
  def set_tags(tags = Array.new)

    if tags.is_a?(String) and tags.downcase == 'all'
      @@default_tags.each do |value|
        @@edit_tags[value] = ''
      end
    elsif tags.is_a?(Array)
      tags.each do |value|
        @@edit_tags[value] = ''
      end
    elsif tags.is_a?(Hash)
      # NOTE: might need some adjustment here
      # Not sure this is used at all
      @@edit_tags = tags
    else


      # Try to match tags
      if tags.is_a?(String)

        @@edit_tags = {}
        tagsForEditing = tags.split(',')
        tagsForEditing.each do |value|

          if value.match(/:/)

            self.log('debug', 'Found tag value assignment.')
            tagmatching = value.split(':', 2) # split at the first colon only; values may contain colons

            # Check date for validity
            if tagmatching[0] == 'createdate'
              validatedDate = validateDate(tagmatching[1])
              if !validatedDate
                self.log('error', "Date not recognized: '#{tagmatching[1]}'.")
                abort 'Date format not recognized. Abort.'
              else
                self.log('debug', "Identified date: #{validatedDate} ")
                @@edit_tags[tagmatching[0]] = validatedDate
              end
            else
              self.log('debug', "Identified key #{tagmatching[0]} with value '#{tagmatching[1]}'.")
              @@edit_tags[tagmatching[0]] = tagmatching[1]
            end
          else
            @@edit_tags[value] = ''
          end

        end

      end

    end


  end


  #
  # Update the tags
  # Reads @@edit_tags and asks the user for a new value for every tag
  # that has no value in @@edit_tags yet
  def update_tags()

    # Empty String for possible viewer Process PID
    viewerPID = ''

    # Iterate through all tags and request information from user
    # if necessary
    @@edit_tags.each do |key,value|
      if value.empty?

        # At this point:
        # 1. If @opendoc is set and
        # 2. viewerPID.empty? (no viewer started yet)
        # => Start the viewer
        if @opendoc and viewerPID.to_s.empty?
          viewerPID = start_viewer(@filename, @pdfviewer)
          self.log('debug', "Started external viewer '#{@pdfviewer}' with file '#{@filename}' and PID: #{viewerPID}")
        end

        puts 'Changing ' + key.capitalize + ', current value: ' + @@metadata[key].to_s
        if key.downcase == 'createdate'

          # Repeat asking for a valid date
          validatedDate = false
          while !validatedDate
            validatedDate = validateDate(readUserInput('New date value: '))
          end
          @@metadata[key] = validatedDate

        else

          @@metadata[key] = readUserInput('New value: ')

        end

      else

        # Setting the new metadata
        @@metadata[key] = value

      end
    end

    # Close the external PDF viewer if one has been started.
    if viewerPID.is_a?(Integer)
      Process.kill('TERM', viewerPID)
      self.log('debug', "Viewer process with PID #{viewerPID} killed.")
    end

  end

  #
  # Function to validate and interpret date information
  def validateDate(date)

    year = '19[0-9][0-9]|20[0-9][0-9]'
    month = '0[1-9]|10|11|12'
    day = '[1-9]|0[1-9]|1[0-9]|2[0-9]|3[0-1]'
    hour = '[0-1][0-9]|2[0-3]|[1-9]'
    minute = '[0-5][0-9]'
    second = '[0-5][0-9]'
    case date
    when /^(#{year})(#{month})(#{day})$/
      identifiedDate = $1 + ':' + $2 + ':' + $3 + ' 00:00:00'
    when /^(#{year})(#{month})(#{day})(#{hour})(#{minute})(#{second})$/
      identifiedDate = $1 + ':' + $2 + ':' + $3 + ' ' + $4 + ':' + $5 + ':' + $6
    when /^(#{year})[.:\-](#{month})[.:\-](#{day})\s(#{hour}):(#{minute}):(#{second})$/
      identifiedDate = $1 + ':' + $2 + ':' + $3 + ' ' + $4 + ':' + $5 + ':' + $6
    when /^(#{year})[.:\-](#{month})[.:\-](#{day})$/
      day = "%02d" % $3.to_i   # to_i first: "%02d" % "09" raises (parsed as octal)
      month = "%02d" % $2.to_i

      # Return the identified string
      $1 + ':' + month + ':' + day + ' 00:00:00'

    else

      # This wasn't a date we recognize
      false

    end
  end

  #
  # Write tags from the @@metadata back into the file
  def write_tags(filename = '')

    filename = @filename if filename.empty?

    commandparameter = '-overwrite_original'
    @@metadata.each do |key,value|
      commandparameter = commandparameter + " -#{key}='#{value}'"
    end

    if !@@documentPassword.to_s.empty?
      commandparameter = commandparameter + " -password '#{@@documentPassword}'"
    end

    command = "exiftool #{commandparameter} '#{filename}'"
    `#{command}`
    self.log('info', "Updating '#{filename}' with " + commandparameter.gsub(/\s\-password\s\'.*\'/,'').gsub(/\-overwrite\_original\s/,'').gsub(/\'\s\-/,"', ").gsub(/\-/,' ') )

  end

end
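The date formats accepted by `validateDate` can be checked in isolation with the standalone sketch below. It is not a drop-in replacement: `normalize_date` and the pattern constants are hypothetical names, it uses `\A`/`\z` anchors, restricts years to 1900 to 2099, and zero-pads every component via integers. The last point matters because Ruby's `format('%02d', '09')` parses the string with base prefixes, so '09' is rejected as an invalid octal number and raises `ArgumentError`.

```ruby
# Hypothetical standalone sketch of the formats validateDate accepts.
# Output form: 'YYYY:MM:DD hh:mm:ss'; unrecognized input returns false.
YEAR   = '(?:19|20)\d{2}' # assumption: only 1900-2099
MONTH  = '0[1-9]|1[0-2]'
DAY    = '0[1-9]|[12]\d|3[01]|[1-9]'
HOUR   = '[01]\d|2[0-3]|\d'
MINSEC = '[0-5]\d'
SEP    = '[.:\-]' # '.', ':' or '-' between date parts

DATE_PATTERNS = [
  /\A(#{YEAR})(#{MONTH})(#{DAY})\z/,                                   # 20150105
  /\A(#{YEAR})(#{MONTH})(#{DAY})(#{HOUR})(#{MINSEC})(#{MINSEC})\z/,    # 20150105134510
  /\A(#{YEAR})#{SEP}(#{MONTH})#{SEP}(#{DAY})\z/,                       # 2015-01-05
  /\A(#{YEAR})#{SEP}(#{MONTH})#{SEP}(#{DAY})\s(#{HOUR}):(#{MINSEC}):(#{MINSEC})\z/
].freeze

def normalize_date(input)
  DATE_PATTERNS.each do |pattern|
    match = pattern.match(input.to_s.strip)
    next unless match

    # Convert captures to integers before padding; "%02d" % "09" would raise.
    year, month, day, hour, min, sec = match.captures.map(&:to_i)
    return format('%04d:%02d:%02d %02d:%02d:%02d', year, month, day, hour || 0, min || 0, sec || 0)
  end
  false
end
```

Date-only inputs capture three groups, so `hour`, `min` and `sec` are `nil` and default to midnight, matching the ' 00:00:00' suffix in `validateDate`.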
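`write_tags` builds one shell command string and wraps each value in single quotes, so a value containing an apostrophe (for example a title like "O'Brien report") breaks the command. A common alternative is to build an argument array and pass it to `system` without a shell. The sketch below is hypothetical (`exiftool_argv` is not a pdfmd method) and only builds the array; it uses the same exiftool options `write_tags` already passes: `-overwrite_original`, `-TAG=VALUE` and `-password`.

```ruby
# Hypothetical helper: the exiftool argument list for write_tags as an array,
# so values with quotes or spaces reach exiftool unchanged via system(*argv).
def exiftool_argv(filename, metadata, password = nil)
  argv = ['exiftool', '-overwrite_original']
  metadata.each { |tag, value| argv << "-#{tag}=#{value}" }
  argv.push('-password', password) unless password.to_s.empty?
  argv << filename
end
```

`system(*exiftool_argv(file, tags, password))` then runs exiftool directly, assuming it is installed and on the `PATH`.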