pdfmd 1.9.1 → 2.0.0

@@ -0,0 +1,92 @@
1
+ == General
2
+
3
+ Show metatags of a PDF document.
4
+
5
+ The following tags are shown:
6
+
7
+ * Author
8
+ * CreateDate
9
+ * Title
10
+ * Subject
11
+ * Keywords
12
+
13
+ == Parameter
14
+
15
+ --all, -a
16
+
17
+ Show all relevant metatags for a document.
18
+
19
+ Relevant tags are: Author, CreateDate, Title, Subject, Keywords.
20
+
21
+ This is the default action.
22
+
23
+
24
+
25
+ --tag, -t
26
+
27
+ Specify the metatag to show. The selected metatag must be one of the relevant tags. Other tags are ignored and nothing is returned.
28
+
29
+ The value for the parameter is case-insensitive: 'Author' == 'author'
30
+
31
+ Multiple tags can be specified, separated by commas.
32
+
33
+ If multiple tags are specified in an order different from the default, the specified order is used. This affects the order of the fields, e.g. when the output is exported in CSV format.
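
The tag selection rules described for `--tag` can be illustrated with a minimal sketch. `RELEVANT_TAGS` and `select_tags` are hypothetical names chosen for this example and are not taken from the pdfmd source; the sketch only assumes that the metadata is available as a hash with lowercase keys.

```ruby
# Hypothetical helper, not part of pdfmd: pick the requested tags from a
# metadata hash, case-insensitively and in the order given by the user.
RELEVANT_TAGS = %w[author createdate title subject keywords].freeze

def select_tags(metadata, requested = nil)
  # Without a request, fall back to all relevant tags in the default order.
  names = if requested.to_s.empty?
            RELEVANT_TAGS
          else
            requested.downcase.split(',').map(&:strip)
          end

  # Unknown tags are dropped silently ("Other tags are ignored").
  names.select { |name| RELEVANT_TAGS.include?(name) }
       .map { |name| [name, metadata[name]] }
       .to_h
end

metadata = { 'author' => 'Jane Doe', 'title' => 'Invoice', 'subject' => 'Billing' }
selected = select_tags(metadata, 'Title,AUTHOR,foo')
puts selected.keys.join(',')
```

Because Ruby hashes keep insertion order, the keys come back in the order the user asked for, which is what determines the column order in a CSV export.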
34
+
35
+
36
+
37
+ --format, -f
38
+
39
+ Specify a different output format. Default: yaml
40
+
41
+ Available formats are: json,yaml,csv,hash
42
+
43
+
44
+
45
+ --includepdf, -i
46
+
47
+ Include the filename of the PDF document in the output if this option is set to true. Default: false
48
+
49
+
50
+
51
+ --log, -l
52
+
53
+ Enable/Disable logging. Default: true
54
+
55
+
56
+
57
+ --logfile, -p
58
+
59
+ Specify path to logfile. Default: `./.pdfmd.log`
60
+
61
+
62
+
63
+ == Example
64
+
65
+ # Show default metatags for a pdf document
66
+
67
+ $ pdfmd show <filename>
68
+
69
+ # Show default metatags for example.pdf
70
+
71
+ $ pdfmd show example.pdf
72
+
73
+ # Show value for metatag 'Author' for the file example.pdf
74
+
75
+ $ pdfmd show -t author example.pdf
76
+
77
+ # Show value for metatags 'Author','Title' for the file example.pdf
78
+
79
+ $ pdfmd show -t author,title example.pdf
80
+
81
+
82
+
83
+ == Hiera
84
+
85
+ --- # YAML
86
+ pdfmd::config:
87
+ show:
88
+ format : yaml|json|csv|hash
89
+ tag : author,subject,createdate,title,keywords
90
+ includepdf: true|false
91
+ log : true|false
92
+
@@ -0,0 +1,111 @@
1
+ == General
2
+
3
+ Sorts PDF documents into subdirectories according to the value of the meta tag 'author'.
4
+
5
+ If a document does not have an entry in the meta tag 'author', the file will not be processed.
6
+
7
+ === Parameter
8
+
9
+ --destination, -d
10
+
11
+ Specify the root output directory in which the folder structure is created.
12
+
13
+ This parameter is required if hiera is not configured.
14
+
15
+ The command line parameter overrides the Hiera defaults.
16
+
17
+ Default: current working directory.
18
+
19
+
20
+
21
+ --dryrun, -n
22
+
23
+ If set to true, the command simulates all actions without changing any files. Log entries for simulated actions are prefixed with 'DRYRUN: '. Default: false
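
A dry run of this kind can be sketched with the `noop:` option that Ruby's `FileUtils` methods accept. `place_file` is a hypothetical helper written for this illustration; it is an assumption about how such a switch could be wired, not the way pdfmd implements `--dryrun` or `--copy`.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: move (or copy) a file into target_dir.
# With dryrun: true nothing on disk changes and the message is prefixed.
def place_file(source, target_dir, copy: false, dryrun: false)
  action = copy ? :cp : :mv
  # noop: true makes FileUtils skip the actual file system operation.
  FileUtils.mkdir_p(target_dir, noop: dryrun)
  FileUtils.public_send(action, source, target_dir, noop: dryrun)
  "#{dryrun ? 'DRYRUN: ' : ''}#{action} #{source} -> #{target_dir}"
end

Dir.mktmpdir do |dir|
  source = File.join(dir, 'example.pdf')
  File.write(source, '')
  puts place_file(source, File.join(dir, 'jane_doe'), dryrun: true)
  puts File.exist?(source) # the source file is still in place after a dry run
end
```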
24
+
25
+
26
+
27
+ --copy, -c
28
+
29
+ Copy the files instead of moving them. Default: false
30
+
31
+
32
+
33
+ --log, -l
34
+
35
+ Disable/Enable the logging. Default: true
36
+
37
+
38
+
39
+ --logfile, -p
40
+
41
+ Set an alternate path for the logfile. If no path is chosen, the logfile gets created in the current working directory as `.pdfmd.log`.
42
+
43
+
44
+
45
+ --interactive, -i
46
+
47
+ Disable/Enable interactive sorting. This will ask for confirmation for each sorting action. Default: false
48
+
49
+
50
+
51
+ --overwrite, -o
52
+
53
+ If set to 'true' the command will overwrite any existing file at the target destination with the same name without asking. Default: false
54
+
55
+
56
+
57
+ === Replacement rules
58
+
59
+ The subdirectories for the documents are generated from the values in the
60
+ tag 'author' of each document.
61
+
62
+ In order to ensure a clean directory structure, there are certain rules
63
+ for altering the values.
64
+
65
+ 1. Whitespaces are replaced by underscores.
66
+
67
+ 2. Dots are replaced by underscores.
68
+
69
+ 3. All letters are converted to their lowercase version.
70
+
71
+ 4. Special characters are serialized.
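
Rules 1 to 3 above can be sketched as a small string transformation. `author_to_dirname` is a hypothetical name, and rule 4 is deliberately left out because the text does not define how special characters are serialized.

```ruby
# Hypothetical helper, not part of pdfmd: derive a directory name from the
# value of the 'author' tag using rules 1 to 3 only.
def author_to_dirname(author)
  author.to_s
        .gsub(/\s/, '_') # rule 1: whitespace becomes an underscore
        .gsub('.', '_')  # rule 2: dots become underscores
        .downcase        # rule 3: lowercase everything
end

puts author_to_dirname('Jane Doe')
```

Each whitespace character and each dot is replaced individually, so a value such as 'J. Doe' yields two consecutive underscores. Whether pdfmd collapses such runs is not stated in this section.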
72
+
73
+
74
+
75
+ === Hiera
76
+
77
+ Set the default values mentioned below as sub-hash of the main configuration:
78
+
79
+ --- #YAML
80
+ pdfmd::config:
81
+ sort:
82
+ copy : true|false
83
+ destination : /tmp
84
+ dryrun : true|false
85
+ interactive : true|false
86
+ log : true|false
87
+ logfile : /var/log/pdfmd.log
88
+ overwrite : true|false
89
+
90
+ See the README file for an example of how to define the values in Hiera, or run `pdfmd explain hiera`.
91
+
92
+
93
+
94
+ === Example
95
+
96
+ This command does the following:
97
+
98
+ 1. Take all pdf documents in the subdirectory ./documents.
99
+
100
+ 2. Create the output folder structure in `/tmp/test/`.
101
+
102
+ 3. Copy the files instead of moving them.
103
+
104
+ 4. Disable the logging.
105
+
106
+ $ pdfmd sort -d /tmp/test -c -l false ./documents
107
+
108
+ # Sort only a single file
109
+
110
+ $ pdfmd sort -d /tmp/test -c -l false ./documents/test.pdf
111
+
@@ -0,0 +1,23 @@
1
+ Show statistics about the metadata of the PDF documents in a directory.
2
+
3
+ == Usage
4
+
5
+ Example: `pdfmd stat <directory>`
6
+
7
+
8
+ == Parameter
9
+
10
+ [<directory>]
11
+
12
+ Path to the directory containing PDF documents or subdirectories with PDF documents.
13
+
14
+ Example: `pdfmd stat ~/pdf`
15
+
16
+
17
+ --recursive, -r
18
+
19
+ If set to true, pdfmd includes all PDF documents found in subdirectories of <directory> as well.
20
+
21
+ Default: false
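
The difference between the flat and the recursive lookup can be sketched with `Dir.glob`. `pdf_files` is a hypothetical helper for illustration only; matching on the lowercase `.pdf` extension is an assumption, and pdfmd may identify PDF documents differently.

```ruby
# Hypothetical helper: list PDF files below a directory.
# recursive: false looks only at the directory itself,
# recursive: true also descends into all subdirectories.
def pdf_files(directory, recursive: false)
  pattern = recursive ? File.join('**', '*.pdf') : '*.pdf'
  Dir.glob(pattern, base: directory)
     .sort
     .map { |path| File.join(directory, path) }
end

puts pdf_files('.', recursive: true)
```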
22
+
23
+
@@ -0,0 +1,30 @@
1
+ # == Class: Pdfmdconfig
2
+ #
3
+ # Show current default configuration of pdfmd
4
+ #
5
+ class Pdfmdconfig < Pdfmd
6
+
7
+ require 'yaml'
8
+
9
+ def initialize(filename)
10
+ super(filename)
11
+ @filename = filename
12
+ end
13
+
14
+ def show_config(key = '')
15
+
16
+ if key.empty?
17
+ self.log('debug','Showing current configuration in yaml format.')
18
+ @hieradata.to_yaml
19
+ elsif @hieradata.has_key?(key)
20
+ self.log('debug',"Showing current configuration in yaml format, section: #{key}.")
21
+ @hieradata[key].to_yaml
22
+ else
23
+ self.log('error',"Unknown Hiera Key used: '#{key}'.")
24
+ puts 'Unknown hiera key. Abort.'
25
+ abort
26
+ end
27
+
28
+ end
29
+
30
+ end
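
The lookup in `show_config` can be tried outside the class with a plain hash. The sketch below is an assumption-laden variant, not the gem's code: `config_section` is a hypothetical function, the sample hash stands in for `@hieradata`, and an unknown key raises `KeyError` instead of calling `abort` so that callers and tests can handle it.

```ruby
require 'yaml'

# Hypothetical stand-alone variant of the lookup: return the whole
# configuration, or one section of it, as a YAML string.
def config_section(hieradata, key = '')
  return hieradata.to_yaml if key.empty?
  raise KeyError, "Unknown Hiera key: '#{key}'" unless hieradata.key?(key)

  hieradata[key].to_yaml
end

# Sample data standing in for the values read from Hiera.
hieradata = { 'sort' => { 'copy' => true, 'destination' => '/tmp' } }
puts config_section(hieradata, 'sort')
```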
@@ -0,0 +1,201 @@
1
+ # == Class: Pdfmdedit
2
+ #
3
+ # Edit metadata of PDF documents
4
+ #
5
+ class Pdfmdedit < Pdfmd
6
+
7
+ attr_accessor :filename, :opendoc, :pdfviewer
8
+
9
+ @@edit_tags = Hash.new
10
+
11
+ def initialize(filename)
12
+ super(filename)
13
+ self.set_tags(@@default_tags)
14
+ end
15
+
16
+
17
+ # Start a viewer
18
+ def start_viewer(filename = '', viewer = '')
19
+ if File.exist?(filename) and !viewer.empty?
20
+
21
+ pid = IO.popen("#{viewer} #{filename}")
22
+ self.log('debug', "Application '#{viewer}' with PID #{pid.pid} started to show file '#{filename}'.")
23
+ pid.pid
24
+
25
+ elsif viewer.empty?
26
+ self.log('error', 'No viewer specified. Aborting document view.')
27
+ else
28
+ self.log('error', "Could not find file '#{filename}' for viewing.")
29
+ end
30
+
31
+ end
32
+
33
+
34
+ #
35
+ # Setting the tags to edit
36
+ def set_tags(tags = Array.new)
37
+
38
+ if tags.is_a?(String) and tags.downcase == 'all'
39
+ @@default_tags.each do |value|
40
+ @@edit_tags[value] = ''
41
+ end
42
+ elsif tags.is_a?(Array)
43
+ tags.each do |value|
44
+ @@edit_tags[value] = ''
45
+ end
46
+ elsif tags.is_a?(Hash)
47
+ # NOTE: might need some adjustment here
48
+ # Not sure this is used at all
49
+ @@edit_tags = tags
50
+ else
51
+
52
+
53
+ # Try to match tags
54
+ if tags.is_a?(String)
55
+
56
+ @@edit_tags = {}
57
+ tagsForEditing = tags.split(',')
58
+ tagsForEditing.each do |value|
59
+
60
+ if value.match(/:/)
61
+
62
+ self.log('debug', 'Found tag value assignment.')
63
+ tagmatching = value.split(':')
64
+
65
+ # Check date for validity
66
+ if tagmatching[0] == 'createdate'
67
+ validatedDate = validateDate(tagmatching[1])
68
+ if !validatedDate
69
+ self.log('error',"Date not recognized: '#{tagmatching[1]}'.")
70
+ abort 'Date format not recognized. Abort.'
71
+ else
72
+ self.log('debug',"Identified date: #{validatedDate} ")
73
+ @@edit_tags[tagmatching[0]] = validatedDate
74
+ end
75
+ else
76
+ self.log('debug', "Identified key #{tagmatching[0]} with value '#{tagmatching[1]}'.")
77
+ @@edit_tags[tagmatching[0]] = tagmatching[1]
78
+ end
79
+ else
80
+ @@edit_tags[value] = ''
81
+ end
82
+
83
+ end
84
+
85
+ end
86
+
87
+ end
88
+
89
+
90
+ end
91
+
92
+
93
+ #
94
+ # Update the tags
95
+ # Reads @@edit_tags and asks for updates from the user if no value in
96
+ # @@edit_tags is provided
97
+ def update_tags()
98
+
99
+ # Empty String for possible viewer Process PID
100
+ viewerPID = ''
101
+
102
+ # Iterate through all tags and request information from user
103
+ # if necessary
104
+ @@edit_tags.each do |key,value|
105
+ if value.empty?
106
+
107
+ # At this point:
108
+ # 1. If @opendoc
109
+ # 2. viewerPID.empty? (no viewer started)
110
+ # => Start the viewer
111
+ if @opendoc and viewerPID.to_s.empty?
112
+ viewerPID = start_viewer(@filename, @pdfviewer)
113
+ self.log('debug', "Started external viewer '#{@pdfviewer}' with file '#{@filename}' and PID: #{viewerPID}")
114
+ end
115
+
116
+ puts 'Changing ' + key.capitalize + ', current value: ' + @@metadata[key].to_s
117
+ if key.downcase == 'createdate'
118
+
119
+ # Repeat asking for a valid date
120
+ validatedDate = false
121
+ while !validatedDate
122
+ validatedDate = validateDate(readUserInput('New date value: '))
123
+ end
124
+ @@metadata[key] = validatedDate
125
+
126
+ else
127
+
128
+ @@metadata[key] = readUserInput('New value: ')
129
+
130
+ end
131
+
132
+ else
133
+
134
+ # Setting the new metadata
135
+ @@metadata[key] = value
136
+
137
+ end
138
+ end
139
+
140
+ # Close the external PDF viewer if a PID has been set.
141
+ if !viewerPID.to_s.empty?
142
+ `kill #{viewerPID}`
143
+ self.log('debug', "Viewer process with PID #{viewerPID} killed.")
144
+ end
145
+
146
+ end
147
+
148
+ #
149
+ # Function to validate and interpret date information
150
+ def validateDate(date)
151
+
152
+ year = '[1-2][90][0-9][0-9]'
153
+ month = '0[1-9]|10|11|12'
154
+ day = '[1-9]|0[1-9]|1[0-9]|2[0-9]|3[0-1]'
155
+ hour = '[0-1][0-9]|2[0-3]|[1-9]'
156
+ minute = '[0-5][0-9]'
157
+ second = '[0-5][0-9]'
158
+ case date
159
+ when /^(#{year})(#{month})(#{day})$/
160
+ identifiedDate = $1 + ':' + $2 + ':' + $3 + ' 00:00:00'
161
+ when /^(#{year})(#{month})(#{day})(#{hour})(#{minute})(#{second})$/
162
+ identifiedDate = $1 + ':' + $2 + ':' + $3 + ' ' + $4 + ':' + $5 + ':' + $6
163
+ when /^(#{year})[:.\-](#{month})[:.\-](#{day})\s(#{hour}):(#{minute}):(#{second})$/
164
+ identifiedDate = $1 + ':' + $2 + ':' + $3 + ' ' + $4 + ':' + $5 + ':' + $6
165
+ when /^(#{year})[:.\-](#{month})[:.\-](#{day})$/
166
+ day = "%02d" % $3.to_i # to_i avoids an ArgumentError for values like '08'
167
+ month = "%02d" % $2.to_i # to_i avoids an ArgumentError for values like '08'
168
+
169
+ # Return the identified string
170
+ $1 + ':' + month + ':' + day + ' 00:00:00'
171
+
172
+ else
173
+
174
+ # This wasn't a date we recognize
175
+ false
176
+
177
+ end
178
+ end
179
+
180
+ #
181
+ # Write tags from the @@metadata back into the file
182
+ def write_tags(filename)
183
+
184
+ filename.empty? ? filename = @filename : ''
185
+
186
+ commandparameter = '-overwrite_original'
187
+ @@metadata.each do |key,value|
188
+ commandparameter = commandparameter + " -#{key}='#{value}'"
189
+ end
190
+
191
+ if !@@documentPassword.to_s.empty?
192
+ commandparameter = commandparameter + " -password '#{@@documentPassword}'"
193
+ end
194
+
195
+ command = "exiftool #{commandparameter} '#{filename}'"
196
+ `#{command}`
197
+ self.log('info',"Updating '#{filename}' with " + commandparameter.gsub(/\s\-password\s\'.*\'/,'').gsub(/\-overwrite\_original\s/,'').gsub(/\'\s\-/,"', ").gsub(/\-/,' ') )
198
+
199
+ end
200
+
201
+ end
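
`write_tags` builds a single shell string, so a metadata value or filename that contains a single quote would break the quoting. One possible alternative, shown here only as a sketch, is to build an argument list and hand it to `system`, which bypasses the shell when given separate arguments. `exiftool_args` is a hypothetical helper and not part of pdfmd; the exiftool options used are the ones already present in `write_tags`.

```ruby
# Hypothetical helper: build the exiftool arguments as an array so that no
# shell quoting is needed for values, password, or filename.
def exiftool_args(metadata, filename, password = nil)
  args = ['-overwrite_original']
  metadata.each { |key, value| args << "-#{key}=#{value}" }
  args.push('-password', password) unless password.to_s.empty?
  args << filename
end

args = exiftool_args({ 'author' => "O'Brien", 'title' => 'Q1 report' }, 'example.pdf')
puts args.inspect
# To run it (requires exiftool to be installed):
# system('exiftool', *args)
```

The call to `system` is left commented out so that the example runs without exiftool installed and without touching any file.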