opex-manifest-generator 1.3.5__py3-none-any.whl → 1.3.7__py3-none-any.whl
- opex_manifest_generator/__init__.py +3 -3
- opex_manifest_generator/cli.py +204 -123
- opex_manifest_generator/common.py +7 -7
- opex_manifest_generator/hash.py +4 -4
- opex_manifest_generator/metadata/EAD Template.xml +1 -1
- opex_manifest_generator/metadata/GDPR Template.xml +1 -1
- opex_manifest_generator/metadata/MODS Template.xml +1 -1
- opex_manifest_generator/opex_manifest.py +59 -59
- opex_manifest_generator/options/options.properties +1 -1
- opex_manifest_generator-1.3.7.dist-info/METADATA +619 -0
- opex_manifest_generator-1.3.7.dist-info/RECORD +16 -0
- opex_manifest_generator-1.3.7.dist-info/entry_points.txt +2 -0
- {opex_manifest_generator-1.3.5.dist-info → opex_manifest_generator-1.3.7.dist-info}/licenses/LICENSE.md +1 -1
- opex_manifest_generator-1.3.5.dist-info/METADATA +0 -557
- opex_manifest_generator-1.3.5.dist-info/RECORD +0 -16
- opex_manifest_generator-1.3.5.dist-info/entry_points.txt +0 -2
- {opex_manifest_generator-1.3.5.dist-info → opex_manifest_generator-1.3.7.dist-info}/WHEEL +0 -0
- {opex_manifest_generator-1.3.5.dist-info → opex_manifest_generator-1.3.7.dist-info}/top_level.txt +0 -0
|
@@ -0,0 +1,619 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: opex_manifest_generator
|
|
3
|
+
Version: 1.3.7
|
|
4
|
+
Summary: An Opex Manifest Generator tool for use with OPEX Files, as designed by Preservica
|
|
5
|
+
Author-email: Christopher Prince <c.pj.prince@gmail.com>
|
|
6
|
+
License-Expression: Apache-2.0
|
|
7
|
+
Project-URL: Homepage, https://github.com/CPJPRINCE/opex_manifest_generator
|
|
8
|
+
Project-URL: Issues, https://github.com/CPJPRINCE/opex_manifest_generator/issues
|
|
9
|
+
Keywords: archiving,archives,digital archiving,opex,Preservica,opex generator
|
|
10
|
+
Classifier: Programming Language :: Python :: 3
|
|
11
|
+
Classifier: Operating System :: OS Independent
|
|
12
|
+
Classifier: Topic :: System :: Archiving
|
|
13
|
+
Description-Content-Type: text/markdown
|
|
14
|
+
License-File: LICENSE.md
|
|
15
|
+
Requires-Dist: auto_reference_generator
|
|
16
|
+
Requires-Dist: pandas
|
|
17
|
+
Requires-Dist: openpyxl
|
|
18
|
+
Requires-Dist: lxml
|
|
19
|
+
Provides-Extra: addex
|
|
20
|
+
Requires-Dist: odfpy; extra == "addex"
|
|
21
|
+
Provides-Extra: dev
|
|
22
|
+
Requires-Dist: pytest>=7.0; extra == "dev"
|
|
23
|
+
Dynamic: license-file
|
|
24
|
+
|
|
25
|
+
# Opex Manifest Generator Tool
|
|
26
|
+
|
|
27
|
+
[](https://pypi.org/project/opex_manifest_generator)
|
|
28
|
+
[](https://github.com/CPJPRINCE/opex_manifest_generator/actions/workflows/codeql.yml)
|
|
29
|
+
|
|
30
|
+
A small Python program for generating OPEX manifest files, used for the safe transfer of files and metadata into OPEX-compatible systems such as Preservica. The program recurses through a given hierarchy and generates manifests for all folders and, depending on the options used, files.
|
|
31
|
+
|
|
32
|
+
## Table of Contents
|
|
33
|
+
|
|
34
|
+
- [Quick Start](#quick-start)
|
|
35
|
+
- [Version & Package Info](#version--package-info)
|
|
36
|
+
- [Why Use This Tool?](#why-use-this-tool)
|
|
37
|
+
- [Additional Features](#additional-features)
|
|
38
|
+
- [Expected Output](#expected-output)
|
|
39
|
+
- [Advanced Usage](#advanced-usage)
|
|
40
|
+
- [Fixity Generation](#fixity-generation)
|
|
41
|
+
- [Continuous Operation](#continuous-operation)
|
|
42
|
+
- [Clearing Opex Files](#clearing-opex-files)
|
|
43
|
+
- [Zipping](#zipping)
|
|
44
|
+
- [Removing Empty Directories](#removing-empty-directories)
|
|
45
|
+
- [Hidden Directories](#hidden-directories)
|
|
46
|
+
- [Auto Reference Usage](#auto-reference-usage)
|
|
47
|
+
- [Input Option](#input-option)
|
|
48
|
+
- [XIP Metadata - Title, Description and Security Tags](#xip-metadata---title-description-and-security-tags)
|
|
49
|
+
- [XIP Metadata - Identifiers](#xip-metadata---identifiers)
|
|
50
|
+
- [Samples](#samples)
|
|
51
|
+
- [Custom Spreadsheets](#custom-spreadsheets)
|
|
52
|
+
- [XML Metadata - Basic Templates](#xml-metadata---basic-templates)
|
|
53
|
+
- [XML Metadata - Quick Notes](#xml-metadata---quick-notes)
|
|
54
|
+
- [XML Metadata Templates - Custom Templates](#xml-metadata-templates---custom-templates)
|
|
55
|
+
- [Input Hashes](#input-hashes)
|
|
56
|
+
- [Removals & Ignore](#removals--ignore)
|
|
57
|
+
- [Options File](#options-file)
|
|
58
|
+
- [Full Options](#full-options)
|
|
59
|
+
- [Future Developments](#future-developments)
|
|
60
|
+
- [Troubleshooting](#troubleshooting)
|
|
61
|
+
- [Developers](#developers)
|
|
62
|
+
- [Contributing](#contributing)
|
|
63
|
+
|
|
64
|
+
## Quick Start
|
|
65
|
+
|
|
66
|
+
### Option 1: Using pip (Recommended for Python users / long-term use)
|
|
67
|
+
```bash
|
|
68
|
+
pip install -U opex_manifest_generator
|
|
69
|
+
opex_generate /path/to/root
|
|
70
|
+
```
|
|
71
|
+
|
|
72
|
+
### Option 2: Using Portable Executable (No Python Required)
|
|
73
|
+
|
|
74
|
+
Download the latest portable executable for your platform from [Releases](https://github.com/CPJPRINCE/opex_manifest_generator/releases)
|
|
75
|
+
|
|
76
|
+
Extract and run:
|
|
77
|
+
```bash
|
|
78
|
+
# Windows
|
|
79
|
+
cd opex_generate\bin
|
|
80
|
+
.\opex_generate.cmd .\path\to\root -fx SHA-256
|
|
81
|
+
|
|
82
|
+
# Linux/macOS
|
|
83
|
+
./opex_generate /path/to/root -fx SHA-256
|
|
84
|
+
```
|
|
85
|
+
|
|
86
|
+
On Windows you can also use the install.cmd with admin privileges to install and run the command without navigating to the bin folder (see Option 1 for use).
|
|
87
|
+
|
|
88
|
+
## Version & Package Info
|
|
89
|
+
|
|
90
|
+
**Python Version:**
|
|
91
|
+
|
|
92
|
+
Python Version 3.10+ is recommended. Earlier versions may work but are not tested.
|
|
93
|
+
|
|
94
|
+
**Additional Packages:**
|
|
95
|
+
- auto_reference_generator (required)
|
|
96
|
+
- pandas (required)
|
|
97
|
+
- tqdm (required)
|
|
98
|
+
- openpyxl (required)
|
|
99
|
+
- lxml (required)
|
|
100
|
+
- odfpy (optional - ods export)
|
|
101
|
+
|
|
102
|
+
To install using Python:
|
|
103
|
+
|
|
104
|
+
```bash
|
|
105
|
+
pip install auto_reference_generator pandas openpyxl odfpy lxml tqdm
|
|
106
|
+
```
|
|
107
|
+
|
|
108
|
+
If using Python, ensure it is added to your Environment variables.
|
|
109
|
+
|
|
110
|
+
### Output
|
|
111
|
+
|
|
112
|
+
The tool generates an `.opex` manifest file for each of your directories. Each manifest lists all files and folders in that directory.
|
|
113
|
+
|
|
114
|
+
File manifests may also be generated if using additional options. When file manifests are active, the folder manifest automatically accounts for additional opexes.
|
|
115
|
+
|
|
116
|
+
## Why Use This Tool?
|
|
117
|
+
|
|
118
|
+
This tool was primarily intended to allow users to undertake larger uploads safely using bulk ingests.
|
|
119
|
+
|
|
120
|
+
It functions with all methods of Opex Ingests. For Preservica this includes:
|
|
121
|
+
- **Opex Incremental Workflow**
|
|
122
|
+
- **PUT Tool**
|
|
123
|
+
- **Starter Drag 'n' Drop**
|
|
124
|
+
- **Manual Ingest**
|
|
125
|
+
|
|
126
|
+
## Additional Features
|
|
127
|
+
|
|
128
|
+
- **Hash generation (MD5, SHA-1, SHA-256, SHA-512) - for additional security checks.**
|
|
129
|
+
- **Generate multiple algorithm hashes**
|
|
130
|
+
- **Generate hashes for PAX files**
|
|
131
|
+
- **Continuous Operation - allowing closure/crashes of the program to occur and then picking up where you left off**
|
|
132
|
+
- **Opex removal**
|
|
133
|
+
- **Zip functionality**
|
|
134
|
+
|
|
135
|
+
The program also has the [Auto Reference Generator](https://github.com/CPJPRINCE/auto_reference_generator) built in, allowing for:
|
|
136
|
+
- **Automated Reference generation straight to Opex files**
|
|
137
|
+
- **Clearing and logging empty folders**
|
|
138
|
+
- **A Removal mode to delete and log files/folders**
|
|
139
|
+
- **Sorting - alphabetically or 'folders first'**
|
|
140
|
+
- **Keyword assignment - replacing numerals with specified keywords (initials, first letter, JSON map)**
|
|
141
|
+
- **And more! See the GitHub page for details**
|
|
142
|
+
|
|
143
|
+
A key function built on ARG is the `--input` mode, allowing you to use a spreadsheet to assign XIP/XML metadata to your files and folders. Currently this allows:
|
|
144
|
+
- **Assignment of XIP title, description, and security status fields**
|
|
145
|
+
- **Assignment of standard and custom XML metadata templates**
|
|
146
|
+
- **'Drop-in/drop-out' operations, so only needed columns are added**
|
|
147
|
+
|
|
148
|
+
All these options can be combined to create extensive and robust Opex files for file transfers.
|
|
149
|
+
|
|
150
|
+
## Expected Output
|
|
151
|
+
|
|
152
|
+
At a basic level, using `opex_generate`, the program will only generate folder manifests.
|
|
153
|
+
|
|
154
|
+

|
|
155
|
+
|
|
156
|
+
Which will contain a simple list of files/folders in that folder:
|
|
157
|
+
|
|
158
|
+

|
|
159
|
+
|
|
160
|
+
When using an option that affects files, you will generate individual Opexes for files:
|
|
161
|
+
|
|
162
|
+

|
|
163
|
+
|
|
164
|
+
These will contain the data about the files (which will vary based on selected options).
|
|
165
|
+
|
|
166
|
+

|
|
167
|
+
|
|
168
|
+
When individual opex files are generated, the folder manifest will include these as **metadata** files.
|
|
169
|
+
|
|
170
|
+

|
|
171
|
+
|
|
172
|
+
## Advanced Usage
|
|
173
|
+
|
|
174
|
+
**Important Notes**
|
|
175
|
+
|
|
176
|
+
- The term `meta` is hard-coded to always be ignored. This is case-sensitive.
|
|
177
|
+
- A `meta` folder is only created when using the `--fixity`, `--remove-empty` or `-rm` options. You can prevent its creation with `--disable-meta-dir`, or relocate its contents with `-o`.
|
|
178
|
+
|
|
179
|
+
### Fixity Generation
|
|
180
|
+
|
|
181
|
+
```bash
|
|
182
|
+
# Generate with SHA-256 Hash
|
|
183
|
+
opex_generate "/path/to/folder" -fx SHA-256
|
|
184
|
+
|
|
185
|
+
# Generate with MD5 and SHA-256 Hash
|
|
186
|
+
opex_generate "/path/to/folder" -fx MD5 SHA-256
|
|
187
|
+
|
|
188
|
+
# Generate with SHA-1 for PAX - PAXes can be zipped or a folder titled '.pax'
|
|
189
|
+
opex_generate "/path/to/paxfolders" -fx SHA-1 --pax-fixity
|
|
190
|
+
|
|
191
|
+
# Generate with MD5 and SHA1 for PAX
|
|
192
|
+
opex_generate "/path/to/paxfolders" -fx MD5 SHA-1 --pax-fixity
|
|
193
|
+
|
|
194
|
+
# Using -fx without specifying will default to SHA-1
|
|
195
|
+
```
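For readers curious what generating several hashes per file involves, here is a minimal sketch using only Python's standard `hashlib`. It is not taken from the tool's source: the `file_hashes` function and the `HASHLIB_NAMES` mapping are hypothetical names, and reading each file once in chunks is an assumed design choice.

```python
import hashlib

# Assumed mapping from the CLI's algorithm labels to hashlib's names.
HASHLIB_NAMES = {
    "MD5": "md5",
    "SHA-1": "sha1",
    "SHA-256": "sha256",
    "SHA-512": "sha512",
}


def file_hashes(path, algorithms=("SHA-1",), chunk_size=1024 * 1024):
    """Return {label: hex digest} for one file, reading it once in chunks."""
    hashers = {label: hashlib.new(HASHLIB_NAMES[label]) for label in algorithms}
    with open(path, "rb") as handle:
        # Feed every chunk to every hasher so large files are only read once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {label: hasher.hexdigest() for label, hasher in hashers.items()}
```

Calling `file_hashes("/path/to/file", ("MD5", "SHA-256"))` would return a dictionary with one hex digest per requested algorithm. With no algorithms given, the sketch falls back to SHA-1, mirroring the default described above.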
|
|
196
|
+
|
|
197
|
+
### Continuous Operation
|
|
198
|
+
|
|
199
|
+
The program won't overwrite an existing Opex when generating a new one. If an Opex is already present it will state:
|
|
200
|
+
|
|
201
|
+
```
|
|
202
|
+
Avoiding override, Opex exists at: /path/to/opex
|
|
203
|
+
```
|
|
204
|
+
|
|
205
|
+
This allows for continuous operation: long generations, particularly those involving large files, can be cancelled at any point and picked up later. To halt the program, press `Ctrl + C` in the console.
|
|
206
|
+
|
|
207
|
+
There is no way to force an overwrite. If you need to rerun a generation, use the `-clr` option.
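The skip-if-present behaviour can be pictured with the following sketch. It is an illustration only, not the tool's code: `pending_folders` is a hypothetical helper, and it assumes each folder manifest is stored inside its folder as `<folder name>.opex`.

```python
from pathlib import Path


def pending_folders(root):
    """Yield folders under root (root included) that have no manifest yet."""
    root = Path(root).resolve()
    folders = [root, *sorted(p for p in root.rglob("*") if p.is_dir())]
    for folder in folders:
        # Assumption: the folder manifest lives at <folder>/<folder name>.opex
        manifest = folder / f"{folder.name}.opex"
        if manifest.exists():
            print(f"Avoiding override, Opex exists at: {manifest}")
            continue
        yield folder
```

Because completed folders are skipped rather than regenerated, rerunning after an interruption only does the remaining work.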
|
|
208
|
+
|
|
209
|
+
### Clearing Opex Files
|
|
210
|
+
|
|
211
|
+
```bash
|
|
212
|
+
# Will clear existing opexes recursively then end
|
|
213
|
+
opex_generate /path/to/folder -clr
|
|
214
|
+
|
|
215
|
+
# If other options are enabled will clear and rerun generation
|
|
216
|
+
opex_generate /path/to/folder -clr -fx SHA-1
|
|
217
|
+
```
|
|
218
|
+
|
|
219
|
+
### Zipping
|
|
220
|
+
|
|
221
|
+
```bash
|
|
222
|
+
# Will zip opex and file into a zip file
|
|
223
|
+
opex_generate /path/to/folder -fx SHA-1 -z
|
|
224
|
+
|
|
225
|
+
# Will zip opex and file and remove the original files
|
|
226
|
+
opex_generate /path/to/folder -fx SHA-1 -z --remove-zipped-files
|
|
227
|
+
```
|
|
228
|
+
|
|
229
|
+
**Use zipping with caution: repeated use can get messy fast.**
|
|
230
|
+
|
|
231
|
+
### Removing Empty Directories
|
|
232
|
+
|
|
233
|
+
```bash
|
|
234
|
+
# Remove and generate a text log to the 'meta' folder of removed directories
|
|
235
|
+
opex_generate /path/to/folder --remove-empty
|
|
236
|
+
|
|
237
|
+
# You will be asked to give confirmation that you want to proceed
|
|
238
|
+
```
|
|
239
|
+
|
|
240
|
+
### Hidden Directories
|
|
241
|
+
|
|
242
|
+
```bash
|
|
243
|
+
# By default hidden directories/files are not included. Adding --hidden will include hidden files
|
|
244
|
+
opex_generate /path/to/folder --hidden
|
|
245
|
+
```
|
|
246
|
+
|
|
247
|
+
## Auto Reference Usage
|
|
248
|
+
|
|
249
|
+
As mentioned, built into the OMG is the Auto Reference Generator, allowing archival references to be assigned directly to Opexes. By default, codes generated using this method are hard-coded to the identifier `code`.
|
|
250
|
+
|
|
251
|
+
If you want to understand what these References will look like, please see [here](https://github.com/CPJPRINCE/auto_reference_generator?tab=readme-ov-file#structure-of-references).
|
|
252
|
+
|
|
253
|
+
```bash
|
|
254
|
+
# Will generate a reference code for the hierarchy with the prefix "ARCH"
|
|
255
|
+
opex_generate /path/to/folder -r catalog -p ARCH
|
|
256
|
+
|
|
257
|
+
# Will generate a reference code with prefix "ARCH-1-2-3", suffix "Z" and delimiter "-"
|
|
258
|
+
opex_generate /path/to/folder -r catalog -p "ARCH-1-2-3" -s Z -dlm "-"
|
|
259
|
+
|
|
260
|
+
# Will generate a reference code without a prefix - this will only be the numerals
|
|
261
|
+
opex_generate /path/to/folder -r catalog
|
|
262
|
+
|
|
263
|
+
# Will generate an accession code / 'running number' with the prefix "2026-X"
|
|
264
|
+
opex_generate /path/to/folder -r accession -p 2026-X
|
|
265
|
+
|
|
266
|
+
# Will fill in the title and description from file and folder names, and set the default security tag 'open'
|
|
267
|
+
opex_generate -r generic /path/to/folder
|
|
268
|
+
```
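As a rough picture of how a prefix, the hierarchy numerals and the delimiter combine, consider the sketch below. `build_reference` is a hypothetical helper written for this illustration; the real logic lives in the Auto Reference Generator and may differ, for example in how suffixes and keywords are applied.

```python
def build_reference(positions, prefix=None, delimiter="/"):
    """Join an optional prefix and per-level positions into one reference."""
    parts = [str(position) for position in positions]
    if prefix:
        # The prefix goes in front of the numerals, separated by the delimiter.
        parts.insert(0, prefix)
    return delimiter.join(parts)


# Third item of the second folder of the first top-level folder:
print(build_reference([1, 2, 3], prefix="ARCH"))  # ARCH/1/2/3
# Without a prefix, only the numerals remain:
print(build_reference([1, 2]))  # 1/2
```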
|
|
269
|
+
|
|
270
|
+
## Input Option
|
|
271
|
+
|
|
272
|
+
This program also supports using a spreadsheet as an `input`. This allows the data to be pre-filled and set on ingest. The following XIP Metadata fields can be set:
|
|
273
|
+
|
|
274
|
+
- Title
|
|
275
|
+
- Description
|
|
276
|
+
- Security Status
|
|
277
|
+
- Identifiers
|
|
278
|
+
- SourceID
|
|
279
|
+
|
|
280
|
+
XML metadata is also supported for both default and custom XMLs.
|
|
281
|
+
|
|
282
|
+
### XIP Metadata - Title, Description and Security Tags
|
|
283
|
+
|
|
284
|
+
To use an input override, you first need a spreadsheet listing of your folder hierarchy. You can build this by hand, but for convenience I'd recommend generating it with the `auto_ref` tool, like so:
|
|
285
|
+
|
|
286
|
+
```bash
|
|
287
|
+
auto_ref -p "ARCH" /path/to/root
|
|
288
|
+
```
|
|
289
|
+
|
|
290
|
+
The column headers are all 'drop-in/drop-out': simply add a column for each field you'd like to set. The headers are case-sensitive and have to match exactly. They are:
|
|
291
|
+
|
|
292
|
+
```
|
|
293
|
+
Title
|
|
294
|
+
Description
|
|
295
|
+
Security
|
|
296
|
+
```
|
|
297
|
+
|
|
298
|
+
These fields would then be filled in with the relevant data. **For Security Tags**, ensure each value exactly matches the tag on your system; tags are also case-sensitive.
|
|
299
|
+
|
|
300
|
+

|
|
301
|
+
|
|
302
|
+
Once the cells are filled in with the respective data, run a generation using the `-i` option and input the full path to your spreadsheet. Ensure that the `/path/to/root` is the same root as you generated the spreadsheet for.
|
|
303
|
+
|
|
304
|
+
```bash
|
|
305
|
+
# Will use the 'spreadsheet.xlsx' as an input
|
|
306
|
+
opex_generate -i /path/to/your/spreadsheet.xlsx /path/to/root
|
|
307
|
+
|
|
308
|
+
# These can still be combined with the above options
|
|
309
|
+
opex_generate -i /path/to/your/spreadsheet.xlsx -fx SHA-1 /path/to/root
|
|
310
|
+
```
|
|
311
|
+
|
|
312
|
+
**To Note:**
|
|
313
|
+
- If you leave blank cells, it will simply skip those details.
|
|
314
|
+
- If you rearrange the hierarchy after your spreadsheet generation, you may receive errors or mismatches due to folders/files being incorrectly looked up. In these cases, you may need to regenerate your list and migrate the data to it.
|
|
315
|
+
- Assignment works the same way for folders and files; it is not specific to either.
|
|
316
|
+
|
|
317
|
+
### XIP Metadata - Identifiers
|
|
318
|
+
|
|
319
|
+
Identifiers are also supported and can be added to the column header following this convention:
|
|
320
|
+
|
|
321
|
+
```
|
|
322
|
+
Identifier:Key
|
|
323
|
+
```
|
|
324
|
+
|
|
325
|
+
The `Key` will determine the identifier name and the cells will contain the value.
|
|
326
|
+
|
|
327
|
+

|
|
328
|
+
|
|
329
|
+
You can also use the following column headers:
|
|
330
|
+
|
|
331
|
+
```
|
|
332
|
+
# Defaults to 'code' key
|
|
333
|
+
- Identifier
|
|
334
|
+
- Archive_Reference
|
|
335
|
+
|
|
336
|
+
# Defaults to 'accref' key
|
|
337
|
+
- Accession_Reference
|
|
338
|
+
```
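The header convention above can be summarised in a few lines of Python. This is a sketch of the mapping as described here, not the tool's implementation; `identifier_keys` is a hypothetical function, and its defaults are taken from the `code` and `accref` keys mentioned above.

```python
def identifier_keys(headers, default_key="code", accref_key="accref"):
    """Map identifier column headers to the identifier key they would set."""
    fixed = {
        "Identifier": default_key,
        "Archive_Reference": default_key,
        "Accession_Reference": accref_key,
    }
    keys = {}
    for header in headers:
        if header in fixed:
            keys[header] = fixed[header]
        elif header.startswith("Identifier:"):
            # Everything after the first colon is used as the key name.
            keys[header] = header.split(":", 1)[1]
    return keys


headers = ["FullName", "Title", "Identifier:ISBN", "Accession_Reference"]
print(identifier_keys(headers))
# {'Identifier:ISBN': 'ISBN', 'Accession_Reference': 'accref'}
```

Columns that are not identifier headers, such as `FullName` and `Title`, are simply left out of the result.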
|
|
339
|
+
|
|
340
|
+
### Samples
|
|
341
|
+
|
|
342
|
+
A completed Opex based on this data:
|
|
343
|
+
|
|
344
|
+

|
|
345
|
+
|
|
346
|
+
Using the command: `opex_generate /home/chris/dev/opex_manifest_generator -i /home/chris/Dev/opex_manifest_generator/meta/opex_manifest_generator_AutoRef.xlsx`
|
|
347
|
+
|
|
348
|
+
Will generate the following for folder manifest:
|
|
349
|
+
|
|
350
|
+

|
|
351
|
+
|
|
352
|
+
For file manifest:
|
|
353
|
+
|
|
354
|
+

|
|
355
|
+
|
|
356
|
+
### Custom Spreadsheets
|
|
357
|
+
|
|
358
|
+
The OMG is only dependent on the `FullName` header being present for correct functionality. You can use any spreadsheet as long as the `FullName` header is present and correctly matches the hierarchy. Additional headers can be dropped in/out without interfering.
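Before running with a hand-built spreadsheet, it can help to confirm that the `FullName` column is present and still matches the hierarchy. The sketch below does this for a CSV export using only the standard library. `check_spreadsheet` is a hypothetical helper, and it assumes `FullName` holds the full path of each file or folder.

```python
import csv
from pathlib import Path


def check_spreadsheet(csv_path, index_field="FullName"):
    """Return index values from a CSV listing that no longer exist on disk."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if index_field not in (reader.fieldnames or []):
            raise ValueError(f"Missing required column: {index_field}")
        # Assumption: each index value is a full path to a file or folder.
        return [
            row[index_field]
            for row in reader
            if not Path(row[index_field]).exists()
        ]
```

An empty list means every row still points at something on disk. Any returned paths are rows that would likely fail to match during generation.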
|
|
359
|
+
|
|
360
|
+

|
|
361
|
+
|
|
362
|
+
### XML Metadata - Basic Templates
|
|
363
|
+
|
|
364
|
+
DC, MODS, GDPR, and EAD templates are supported out of the box. The column headers are also 'drop-in/drop-out'.
|
|
365
|
+
|
|
366
|
+
XML Column Headers need to be written as: `ns:tagname` with `ns` being the XML's namespace and `tagname` the tag name.
|
|
367
|
+
|
|
368
|
+

|
|
369
|
+
|
|
370
|
+
There are two ways to enter the column header: `exactly` or `flatly` (the `exact` and `flat` modes of the `-m` option, also known as 'nested' vs 'flat'). In `exact` mode, you must enter every parent of the tag, separated by `/`. In `flat` mode, only the final tag is needed. Both modes are case-sensitive, and `exact` is the default.
|
|
371
|
+
|
|
372
|
+
If you enter a non-matching header (such as a misspelling), it won't match to the field.
|
|
373
|
+
|
|
374
|
+
```
|
|
375
|
+
# Exactly:
|
|
376
|
+
mods:recordInfo/mods:recordIdentifier
|
|
377
|
+
|
|
378
|
+
# Flatly:
|
|
379
|
+
mods:recordIdentifier
|
|
380
|
+
```
|
|
381
|
+
|
|
382
|
+
In both cases, these match to the same `recordIdentifier` field.
|
|
383
|
+
|
|
384
|
+
While using the `flatly` method is easier, if non-unique tags are present, such as `mods:note`, it will match to the first occurrence in the XML, which might not be its intended destination. For complex XMLs, I'd recommend sticking with the `exact` method.
|
|
385
|
+
|
|
386
|
+
Once you have added your headers and data, you can run like so:
|
|
387
|
+
|
|
388
|
+
```bash
|
|
389
|
+
# Run with flat method
|
|
390
|
+
opex_generate -i "/path/to/your/spreadsheet.xlsx" "/path/to/root/dir" -m flat
|
|
391
|
+
|
|
392
|
+
# Run with exact method
|
|
393
|
+
opex_generate -i "/path/to/your/spreadsheet.xlsx" "/path/to/root/dir" -m exact
|
|
394
|
+
```
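The difference between the two lookups can be demonstrated with Python's standard `xml.etree.ElementTree`. This is a sketch under stated assumptions: `TEMPLATE` is a cut-down, hypothetical MODS fragment and `find_field` is an illustrative helper. The tool itself depends on `lxml` and may resolve headers differently.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical, heavily simplified template with a non-unique mods:note tag.
TEMPLATE = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:note/>
  <mods:recordInfo>
    <mods:recordIdentifier/>
    <mods:note/>
  </mods:recordInfo>
</mods:mods>"""


def find_field(root, header, mode="exact"):
    """Resolve a column header to an element in the template."""
    if mode == "exact":
        # The header is the full path of parents below the root element.
        return root.find(header, NS)
    # Flat: first element with that tag anywhere in the tree.
    return root.find(f".//{header}", NS)


root = ET.fromstring(TEMPLATE)
exact = find_field(root, "mods:recordInfo/mods:recordIdentifier")
flat = find_field(root, "mods:recordIdentifier", mode="flat")
print(exact is flat)  # True: both headers reach the same element

# With a non-unique tag, flat mode stops at the first occurrence.
first_note = find_field(root, "mods:note", mode="flat")
nested_note = find_field(root, "mods:recordInfo/mods:note")
print(first_note is nested_note)  # False
```

The second comparison shows why `exact` is the safer choice for templates with repeated tags.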
|
|
395
|
+
|
|
396
|
+
### XML Metadata - Quick Notes
|
|
397
|
+
|
|
398
|
+
- You can use `--print-xmls` and `--convert-xmls` to return XMLs to the console or generate spreadsheet templates.
|
|
399
|
+
|
|
400
|
+
```bash
|
|
401
|
+
# You can use `--print-xmls` to display the correct header names of your XMLs to the console
|
|
402
|
+
opex_generate /path/to/root --print-xmls
|
|
403
|
+
|
|
404
|
+
# You can also use `--convert-xmls` to create spreadsheets with all the right headers. Will be output to the cwd of your terminal
|
|
405
|
+
opex_generate /path/to/root --convert-xmls
|
|
406
|
+
```
|
|
407
|
+
|
|
408
|
+
- When you have multiple non-unique tags, such as `mods:note`, you will need to add an index in square brackets to each header, like so: `mods:note[1] mods:note[2] ...` The number should correspond to the order in which the tags appear in the XML tree.
|
|
409
|
+
- If you use `-m` option without adding any data, a blank XML template will be added to the opex.
|
|
410
|
+
- I've also included sample spreadsheets for DC, MODS, GDPR and EAD templates with the `exact` headers [here](https://github.com/CPJPRINCE/opex_manifest_generator/tree/master/samples/spreads).
|
|
411
|
+
|
|
412
|
+
### XML Metadata Templates - Custom Templates
|
|
413
|
+
|
|
414
|
+
Any custom XML template that is functioning in your system will work!
|
|
415
|
+
|
|
416
|
+
To use custom XMLs, place them in a dedicated folder, then pass that folder to the `-mdir` option, e.g. `/path/to/metadata`. You can also combine this with `--print-xmls` and `--convert-xmls` to print the header names or generate spreadsheet templates for your custom XMLs.
|
|
417
|
+
|
|
418
|
+
```bash
|
|
419
|
+
# Will use /path/to/metadata as source for files
|
|
420
|
+
opex_generate /path/to/root -mdir /path/to/metadata
|
|
421
|
+
```
|
|
422
|
+
|
|
423
|
+
### Input Hashes
|
|
424
|
+
|
|
425
|
+
If you use the column headers `Hash` and `Algorithm` with hash data, when using the `-fx` option in combination with `-i`, the program will read the hashes from the spreadsheet instead of generating them.
|
|
426
|
+
|
|
427
|
+

|
|
428
|
+
|
|
429
|
+
**Does not currently support multiple hashes**
|
|
430
|
+
|
|
431
|
+
### Removals & Ignore
|
|
432
|
+
|
|
433
|
+
You can set the column header `Removals`, and when the cell is marked `TRUE`, the specified folder/file will be deleted. To activate, use the `-rm` option and confirm when prompted. A text log of the deleted files will be generated in the `meta` folder.
|
|
434
|
+
|
|
435
|
+
Similarly, you can set the column header `Ignore`, and when the cell is marked `TRUE` it will skip the generation of an Opex for the specified file/folder.
|
|
436
|
+
|
|
437
|
+
### Options File
|
|
438
|
+
|
|
439
|
+
You can use your own `options.properties` file to change the default column headers and some other defaults. Like so:
|
|
440
|
+
|
|
441
|
+
```bash
|
|
442
|
+
opex_generate --options-file path/to/options.properties /path/to/root
|
|
443
|
+
```
|
|
444
|
+
|
|
445
|
+
The default options look like:
|
|
446
|
+
|
|
447
|
+
```
|
|
448
|
+
[options]
|
|
449
|
+
|
|
450
|
+
INDEX_FIELD = FullName
|
|
451
|
+
TITLE_FIELD = Title
|
|
452
|
+
DESCRIPTION_FIELD = Description
|
|
453
|
+
SECURITY_FIELD = Security
|
|
454
|
+
IDENTIFIER_FIELD = Identifier
|
|
455
|
+
IDENTIFIER_DEFAULT = code
|
|
456
|
+
REMOVAL_FIELD = Removals
|
|
457
|
+
IGNORE_FIELD = Ignore
|
|
458
|
+
SOURCEID_FIELD = SourceID
|
|
459
|
+
HASH_FIELD = Hash
|
|
460
|
+
ALGORITHM_FIELD = Algorithm
|
|
461
|
+
|
|
462
|
+
ACCREF_CODE = accref
|
|
463
|
+
ARCREF_FIELD = Archive_Reference
|
|
464
|
+
ACCREF_FIELD = Accession_Reference
|
|
465
|
+
|
|
466
|
+
METAFOLDER = meta
|
|
467
|
+
FIXITY_SUFFIX = _Fixity
|
|
468
|
+
REMOVALS_SUFFIX = _Removals
|
|
469
|
+
GENERIC_DEFAULT_SECURITY = open
|
|
470
|
+
```
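Since the file follows an INI-style layout with a single `[options]` section, it can be inspected with Python's standard `configparser`. The sketch below assumes that layout. `load_options` is a hypothetical helper, and whether the tool itself parses the file this way is not confirmed here.

```python
import configparser


def load_options(path):
    """Read the [options] section of an options.properties file into a dict."""
    parser = configparser.ConfigParser()
    # configparser lower-cases keys by default; keep INDEX_FIELD etc. as written.
    parser.optionxform = str
    with open(path, encoding="utf-8") as handle:
        parser.read_file(handle)
    return dict(parser["options"])
```

For the defaults shown above, `load_options("options.properties")["INDEX_FIELD"]` would give `FullName`, which is a quick way to check a custom file before passing it to `--options-file`.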
|
|
471
|
+
|
|
472
|
+
## Full Options
|
|
473
|
+
|
|
474
|
+
The below covers the full range of options. Use `-h` option to show this dialog.
|
|
475
|
+
|
|
476
|
+
<!-- argparse_to_md:opex_manifest_generator:create_parser -->
|
|
477
|
+
Usage:
|
|
478
|
+
```
|
|
479
|
+
Opex_Manifest_Generator [-h] [-v] [-fx [{SHA-1,MD5,SHA-256,SHA-512} ...]] [--pax-fixity]
|
|
480
|
+
[-z] [--remove-zipped-files] [--remove-empty] [--hidden] [-clr]
|
|
481
|
+
[-opt OPTIONS_FILE] [-i [INPUT]] [-mdir [METADATA_DIR]]
|
|
482
|
+
[-m [{exact,flat}]] [-rm] [--print-xmls] [--convert-xmls]
|
|
483
|
+
[--autoref-options AUTOREF_OPTIONS]
|
|
484
|
+
[-r {catalog,accession,both,generic,catalog-generic,accession-generic,both-generic}]
|
|
485
|
+
[-p PREFIX [PREFIX ...]] [-s [SUFFIX]]
|
|
486
|
+
[--suffix-option {file,directory,both}]
|
|
487
|
+
[--accession-mode [{file,directory,both}]] [-str [START_REF]]
|
|
488
|
+
[-dlm [DELIMITER]] [--sort-by [{folders_first,alphabetical}]]
|
|
489
|
+
[-key [KEYWORDS ...]]
|
|
490
|
+
[-keym [{initialise,firstletters,from_json}]]
|
|
491
|
+
[--keywords-case-sensitivity] [--keywords-retain-order]
|
|
492
|
+
[--keywords-abbreviation-number KEYWORDS_ABBREVIATION_NUMBER [KEYWORDS_ABBREVIATION_NUMBER ...]]
|
|
493
|
+
[--log-level [{DEBUG,INFO,WARNING,ERROR}]]
|
|
494
|
+
[--log-file [LOG_FILE]] [-o [OUTPUT]] [--disable-meta-dir]
|
|
495
|
+
[--disable-all-exports] [--disable-fixity-export]
|
|
496
|
+
[--disable-empty-export] [--disable-removal-export] [-ex]
|
|
497
|
+
[-fmt {xlsx,csv,json,ods,xml}]
|
|
498
|
+
[root]
|
|
499
|
+
```
|
|
500
|
+
OPEX Manifest Generator for Preservica Uploads
|
|
501
|
+
|
|
502
|
+
Positional arguments:
|
|
503
|
+
- `root`: The root path to generate Opexes for, will recursively traverse all sub-directories.
|
|
504
|
+
Generates an Opex for each folder & (depending on options) file in the directory tree.
|
|
505
|
+
|
|
506
|
+
Optional arguments:
|
|
507
|
+
- `-v`, `--version`: show program's version number and exit
|
|
508
|
+
|
|
509
|
+
Opex Options:
|
|
510
|
+
Options that control the generation of Opex Manifests
|
|
511
|
+
|
|
512
|
+
- `-fx [{SHA-1`, `MD5`, `SHA-256`, `SHA-512} ...]`, `--fixity [{SHA-1`, `MD5`, `SHA-256`, `SHA-512} ...]`: Generates a hash for each file and adds it to the opex.
|
|
513
|
+
Can select one or more algorithms to utilise: {-fx MD5 SHA-1}
|
|
514
|
+
If no algorithm is specified defaults to SHA-1.
|
|
515
|
+
|
|
516
|
+
- `--pax-fixity`: Enables use of PAX fixity generation, in line with Preservica's Recommendation.
|
|
517
|
+
"Files / folders ending in .pax or .pax.zip will have individual files in folder / zip added to Opex.
|
|
518
|
+
- `-z`, `--zip`: Set to zip files
|
|
519
|
+
- `--remove-zipped-files`: Set to remove the original files that have been zipped
|
|
520
|
+
- `--remove-empty`: Remove and log empty directories from root. Log will be exported to 'meta' / output folder.
|
|
521
|
+
- `--hidden`: Set whether to include hidden files and folders
|
|
522
|
+
- `-clr`, `--clear-opex`: Clears existing opex files from a directory. If set with no further options will only clear opexes;
|
|
523
|
+
if multiple options are set will clear opexes and then run the program
|
|
524
|
+
- `-opt OPTIONS_FILE`, `--options-file OPTIONS_FILE`: Specify a custom Options file, changing the set presets for column headers (Title,Description,etc)
|
|
525
|
+
|
|
526
|
+
Input Override Options:
|
|
527
|
+
Options that control the Input Override features
|
|
528
|
+
|
|
529
|
+
- `-i [INPUT]`, `--input [INPUT]`: Set to utilise a CSV / XLSX spreadsheet to import data from
|
|
530
|
+
- `-mdir [METADATA_DIR]`, `--metadata-dir [METADATA_DIR]`: Specify the metadata directory to pull XML files from
|
|
531
|
+
- `-m [{exact`, `flat}]`, `--metadata [{exact`, `flat}]`: Set whether to include xml metadata fields in the generation of the Opex
|
|
532
|
+
- `-rm`, `--remove`: Set whether to enable removals of files and folders from a directory. ***Currently in testing
|
|
533
|
+
- `--print-xmls`: Prints the elements from your XMLs to the console
|
|
534
|
+
- `--convert-xmls`: Converts XML template files in mdir to spreadsheet/CSV files
|
|
535
|
+
- `--autoref-options AUTOREF_OPTIONS`: Specify a custom Auto Reference Options file, changing the set presets for Input Override / Auto Reference Generator
|
|
536
|
+
|
|
537
|
+
Auto Reference Generator Options:
|
|
538
|
+
Options that control the Auto Reference Generator features
|
|
539
|
+
|
|
540
|
+
- `-r {catalog`, `accession`, `both`, `generic`, `catalog-generic`, `accession-generic`, `both-generic}`, `--autoref {catalog`, `accession`, `both`, `generic`, `catalog-generic`, `accession-generic`, `both-generic}`: Toggles whether to utilise the auto_reference_generator
|
|
541
|
+
to generate an on the fly Reference listing.
|
|
542
|
+
|
|
543
|
+
There are several options, {catalog} will generate
|
|
544
|
+
an Archival Reference following an ISAD(G) structure.
|
|
545
|
+
|
|
546
|
+
{accession} will create a running number of files.
|
|
547
|
+
{both} will do both at the same time!
|
|
548
|
+
{generic} will populate the title and description fields with the folder/file's name,
|
|
549
|
+
if used in conjunction with one of the above options:
|
|
550
|
+
{catalog-generic, accession-generic, both-generic} it will do both simultaneously.
|
|
551
|
+
|
|
552
|
+
- `-p PREFIX [PREFIX ...]`, `--prefix PREFIX [PREFIX ...]`: Assign a prefix when utilising the --autoref option. The prefix is prepended to all generated references.
|
|
553
|
+
When utilising the {both} option fill in like: [catalog-prefix, accession-prefix] without square brackets.
|
|
554
|
+
|
|
555
|
+
- `-s [SUFFIX]`, `--suffix [SUFFIX]`: Assign a suffix when utilising the --autoref option. The suffix is appended to all generated references.
|
|
556
|
+
- `--suffix-option {file`, `directory`, `both}`: Set whether to apply the suffix to files, folders or both when utilising the --autoref option.
|
|
557
|
+
- `--accession-mode [{file`, `directory`, `both}]`: Set the mode when utilising the Accession option in autoref.
|
|
558
|
+
file - only adds on files, directory - only adds on folders, both - adds on files and folders
|
|
559
|
+
- `-str [START_REF]`, `--start-ref [START_REF]`: Set a custom Starting reference for the Auto Reference Generator. The generated reference will
|
|
560
|
+
- `-dlm [DELIMITER]`, `--delimiter [DELIMITER]`: Set a custom delimiter for generated references, default is '/'
|
|
561
|
+
- `--sort-by [{folders_first`, `alphabetical}]`: Set the sorting method: 'folders_first' sorts folders first, then files alphabetically; 'alphabetical' sorts alphabetically (ignoring the folder distinction)
|
|
562
|
+
|
|
563
|
+
Keyword Options:
|
|
564
|
+
Options that control the Keyword features for Auto Reference Generation
|
|
565
|
+
|
|
566
|
+
- `-key [KEYWORDS ...]`, `--keywords [KEYWORDS ...]`: Set to replace reference numbers with given keywords (currently folders only). Can be a list of keywords or a JSON file mapping folder names to keywords.
|
|
567
|
+
- `-keym [{initialise`, `firstletters`, `from_json}]`, `--keywords-mode [{initialise`, `firstletters`, `from_json}]`: Set to alternate keyword mode: 'initialise' will use initials of words; 'firstletters' will use the first letters of the string; 'from_json' will use a JSON file mapping names to keywords
|
|
568
|
+
- `--keywords-case-sensitivity`: Set to make keyword matching case-sensitive. By default keyword matching is case-insensitive
|
|
569
|
+
- `--keywords-retain-order`: Set when using keywords to continue reference numbering. If not used keywords don't 'count' to reference numbering, e.g. if using initials 'Project Alpha' -> 'PA' then the next folder/file will still be '001' not '003'
|
|
570
|
+
- `--keywords-abbreviation-number KEYWORDS_ABBREVIATION_NUMBER [KEYWORDS_ABBREVIATION_NUMBER ...]`: Sets the number of letters to abbreviate to in 'firstletters' mode; does not impact 'initialise' mode.
|
|
571
|
+
|
|
572
|
+
Export Options:
|
|
573
|
+
Options that control various export features
|
|
574
|
+
|
|
575
|
+
- `--log-level [{DEBUG`, `INFO`, `WARNING`, `ERROR}]`: Set the logging level (default: INFO)
|
|
576
|
+
- `--log-file [LOG_FILE]`: Optional path to write logs to a file (default: stdout)
|
|
577
|
+
- `-o [OUTPUT]`, `--output [OUTPUT]`: Sets the output of the meta folder to send any generated files (Remove Empty, Fixity List, Autoref Export) to. Can be used in conjunction with --disable-meta-dir to set output location without generating meta directory.
|
|
578
|
+
- `--disable-meta-dir`: Set whether to disable the creation of a 'meta' directory for generated files,
|
|
579
|
+
default behaviour is to always generate this directory
|
|
580
|
+
- `--disable-all-exports`: Set to prevent all exports (Fixity, Removal, Empty) from being created in the meta directory.
|
|
581
|
+
- `--disable-fixity-export`: Set whether to export the generated fixity list to a text file in the meta directory.
|
|
582
|
+
Enabled by default, disable with this flag.
|
|
583
|
+
- `--disable-empty-export`: Set whether to export the generated empty list to a text file in the meta directory.
|
|
584
|
+
Enabled by default, disable with this flag.
|
|
585
|
+
- `--disable-removal-export`: Set whether to export the generated removals list to a text file in the meta directory.
|
|
586
|
+
Enabled by default, disable with this flag.
|
|
587
|
+
- `-ex`, `--export-autoref`: Set whether to export the generated references to an AutoRef spreadsheet
|
|
588
|
+
- `-fmt {xlsx`, `csv`, `json`, `ods`, `xml}`, `--output-format {xlsx`, `csv`, `json`, `ods`, `xml}`: Set the format of the exported AutoRef spreadsheet: xlsx, csv, json, ods or xml
|
|
589
|
+
<!-- argparse_to_md_end -->
|
|
590
|
+
|
|
591
|
+
## Future Developments
|
|
592
|
+
|
|
593
|
+
- ~~Customizable Filtering~~ *Added!*
|
|
594
|
+
- ~~Adjust Accession so the different modes can use from Opex~~ *Added!*
|
|
595
|
+
- ~~Add SourceID as option for use with Auto Ref Spreadsheets~~ *Added!*
|
|
596
|
+
- ~~Allow for multiple Identifiers to be added with Auto Ref Spreadsheets. Currently only 1 or 2 identifiers can be added at a time, under "Archive_Reference" or "Accession_Reference". These are also tied to be either "code" or "accref". An Option needs to be added to allow custom setting of identifier~~ *Added!*
|
|
597
|
+
- ~~Add an option/make it a default for Metadata XMLs to be located in a specified directory rather than in the package~~ *Added!*
|
|
598
|
+
- Zipping to conform to PAX - Last on the check list; it technically does...
|
|
599
|
+
- In theory, this tool should be compatible with any system that uses the OPEX standard... But in theory Communism works, in theory...
|
|
600
|
+
|
|
601
|
+
## Troubleshooting
|
|
602
|
+
|
|
603
|
+
- On Windows, ensure that the root folder path you enter does not end in a `\`. This is slightly annoying, as tab completion adds one by default.
|
|
604
|
+
- In the examples above, I've used Linux paths. If you're on Windows, don't forget to change the forward slashes to backslashes `\`
|
|
605
|
+
- There are a number of helpers when entering options: use SHA1 instead of SHA-1, c for catalog, acc for accession.
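
If the root path is built in a script rather than typed by hand, the trailing-backslash issue from the first point can be sidestepped by normalising the path first. A minimal sketch, assuming a hypothetical helper named `strip_trailing_separator` that is not part of the tool:

```python
def strip_trailing_separator(path: str) -> str:
    """Remove trailing slashes or backslashes from a path string.

    Drive roots such as "C:\\" and the POSIX root "/" are returned
    unchanged, since stripping them would alter their meaning.
    """
    stripped = path.rstrip("\\/")
    if not stripped or stripped.endswith(":"):
        return path
    return stripped


print(strip_trailing_separator("C:\\data\\root\\"))
print(strip_trailing_separator("/home/user/root/"))
```

The cleaned string can then be passed as the root folder in the usual way.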
|
|
606
|
+
|
|
607
|
+
## Developers
|
|
608
|
+
|
|
609
|
+
Developers can also use the tool as a module:
|
|
610
|
+
|
|
611
|
+
```python
|
|
612
|
+
from opex_manifest_generator import OpexManifestGenerator
|
|
613
|
+
|
|
614
|
+
omg = OpexManifestGenerator(root="/path/to/root", algorithm="SHA-256").main()
|
|
615
|
+
```
|
|
616
|
+
|
|
617
|
+
## Contributing
|
|
618
|
+
|
|
619
|
+
I welcome further contributions and feedback! If there are any issues, please raise them [here](https://github.com/CPJPRINCE/opex_manifest_generator/issues).
|
|
@@ -0,0 +1,16 @@
|
|
|
1
|
+
opex_manifest_generator/__init__.py,sha256=HsSQLRVsUMOzvT1Cqb3K_J_f5jUOTOu1LkKHJCgwOGY,460
|
|
2
|
+
opex_manifest_generator/cli.py,sha256=fq8BRMqoZ9cm9OZ_tdKPbM4OfszoPuxymkFYTtFrLrM,24011
|
|
3
|
+
opex_manifest_generator/common.py,sha256=uCwyca2cppo4-xemF1Evaouxh9D5VJhPfVAuevOAv7s,3055
|
|
4
|
+
opex_manifest_generator/hash.py,sha256=KcVP96J6zaRacFhsyuGC48CqES3JiytYlZe5Kc3aMdQ,2833
|
|
5
|
+
opex_manifest_generator/opex_manifest.py,sha256=rvXcTNFSzX79HbfINmojuPPqY-trdXX5NM6O73IYx-s,55202
|
|
6
|
+
opex_manifest_generator/metadata/DublinCore Template.xml,sha256=csNGXzSH27Whs4BQNuwMZl8nLSdDq7Y_OblTfzeBqWQ,775
|
|
7
|
+
opex_manifest_generator/metadata/EAD Template.xml,sha256=qr_kaBdt4Klb9IzCrgPN8fZwdS614U4fXHvI2sZQ1Ok,2168
|
|
8
|
+
opex_manifest_generator/metadata/GDPR Template.xml,sha256=-lbX2cp8ubqU21grkcrr4y5rCDdam4h53lOV8gYM2wM,476
|
|
9
|
+
opex_manifest_generator/metadata/MODS Template.xml,sha256=j9KE3f6WuuDyvscLKVF01172q58DCaistku18I_oCO8,2636
|
|
10
|
+
opex_manifest_generator/options/options.properties,sha256=X-svlvQ-mM5AkJLprwRnqQlYjRX6qMIoHKZmdKY5JTk,480
|
|
11
|
+
opex_manifest_generator-1.3.7.dist-info/licenses/LICENSE.md,sha256=z8d0m5b2O9McPEK1xHG_dWgUBT6EfBDz6wA0F7xSPTA,11358
|
|
12
|
+
opex_manifest_generator-1.3.7.dist-info/METADATA,sha256=eTj_KQH9j-21i21yGMLqGSkfpsRvgGq9NrXWSS5MgoQ,28845
|
|
13
|
+
opex_manifest_generator-1.3.7.dist-info/WHEEL,sha256=wUyA8OaulRlbfwMtmQsvNngGrxQHAvkKcvRmdizlJi0,92
|
|
14
|
+
opex_manifest_generator-1.3.7.dist-info/entry_points.txt,sha256=6IZhtmfD045LUtJcitYNWzE9hLu_IePjQBm8gan2krw,67
|
|
15
|
+
opex_manifest_generator-1.3.7.dist-info/top_level.txt,sha256=K48eGnaDLVO6YDJdAZLqbeoZvJHBGX25cvYT-i8gWt0,24
|
|
16
|
+
opex_manifest_generator-1.3.7.dist-info/RECORD,,
|
|
@@ -199,4 +199,4 @@
|
|
|
199
199
|
distributed under the License is distributed on an "AS IS" BASIS,
|
|
200
200
|
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
201
201
|
See the License for the specific language governing permissions and
|
|
202
|
-
limitations under the License.
|
|
202
|
+
limitations under the License.
|