longleaf 0.2.0.pre.1 → 0.3.0

Files changed (165)
  1. checksums.yaml +4 -4
  2. data/.circleci/config.yml +84 -0
  3. data/.gitignore +4 -2
  4. data/.rubocop.yml +42 -2
  5. data/.rubocop_todo.yml +390 -311
  6. data/.yardopts +1 -0
  7. data/Gemfile +16 -1
  8. data/README.md +67 -13
  9. data/Rakefile +6 -0
  10. data/bin/setup +16 -1
  11. data/docs/aboutlongleaf.md +28 -0
  12. data/docs/extra.css +32 -0
  13. data/docs/img/change-file.png +0 -0
  14. data/docs/img/ll-example-preserved.png +0 -0
  15. data/docs/index.md +19 -0
  16. data/docs/install.md +66 -0
  17. data/docs/ll-example/config-example-relative.yml +33 -0
  18. data/docs/ll-example/files-dir/LLexample-PDF.pdf +0 -0
  19. data/docs/ll-example/files-dir/LLexample-TOCHANGE.txt +15 -0
  20. data/docs/ll-example/files-dir/LLexample-tokeep.txt +10 -0
  21. data/docs/ll-example/metadata-dir/.gitkeep +0 -0
  22. data/docs/ll-example/replica-files/.gitkeep +0 -0
  23. data/docs/ll-example/replica-metadata/.gitkeep +0 -0
  24. data/docs/quickstart.md +270 -0
  25. data/docs/rdocs/Longleaf.html +135 -0
  26. data/docs/rdocs/Longleaf/AppFields.html +178 -0
  27. data/docs/rdocs/Longleaf/ApplicationConfigDeserializer.html +631 -0
  28. data/docs/rdocs/Longleaf/ApplicationConfigManager.html +610 -0
  29. data/docs/rdocs/Longleaf/ApplicationConfigValidator.html +238 -0
  30. data/docs/rdocs/Longleaf/CLI.html +909 -0
  31. data/docs/rdocs/Longleaf/ChecksumMismatchError.html +151 -0
  32. data/docs/rdocs/Longleaf/ConfigBuilder.html +1339 -0
  33. data/docs/rdocs/Longleaf/ConfigurationError.html +143 -0
  34. data/docs/rdocs/Longleaf/ConfigurationValidator.html +227 -0
  35. data/docs/rdocs/Longleaf/DeregisterCommand.html +420 -0
  36. data/docs/rdocs/Longleaf/DeregisterEvent.html +453 -0
  37. data/docs/rdocs/Longleaf/DeregistrationError.html +151 -0
  38. data/docs/rdocs/Longleaf/DigestHelper.html +419 -0
  39. data/docs/rdocs/Longleaf/EventError.html +147 -0
  40. data/docs/rdocs/Longleaf/EventNames.html +163 -0
  41. data/docs/rdocs/Longleaf/EventStatusTracking.html +656 -0
  42. data/docs/rdocs/Longleaf/FileCheckService.html +540 -0
  43. data/docs/rdocs/Longleaf/FileHelpers.html +520 -0
  44. data/docs/rdocs/Longleaf/FileRecord.html +716 -0
  45. data/docs/rdocs/Longleaf/FileSelector.html +901 -0
  46. data/docs/rdocs/Longleaf/FixityCheckService.html +691 -0
  47. data/docs/rdocs/Longleaf/IndexManager.html +1155 -0
  48. data/docs/rdocs/Longleaf/InvalidDigestAlgorithmError.html +143 -0
  49. data/docs/rdocs/Longleaf/InvalidStoragePathError.html +143 -0
  50. data/docs/rdocs/Longleaf/Logging.html +405 -0
  51. data/docs/rdocs/Longleaf/Logging/RedirectingLogger.html +1213 -0
  52. data/docs/rdocs/Longleaf/LongleafError.html +139 -0
  53. data/docs/rdocs/Longleaf/MDFields.html +193 -0
  54. data/docs/rdocs/Longleaf/MetadataBuilder.html +787 -0
  55. data/docs/rdocs/Longleaf/MetadataDeserializer.html +537 -0
  56. data/docs/rdocs/Longleaf/MetadataError.html +143 -0
  57. data/docs/rdocs/Longleaf/MetadataPersistenceManager.html +539 -0
  58. data/docs/rdocs/Longleaf/MetadataRecord.html +1411 -0
  59. data/docs/rdocs/Longleaf/MetadataSerializer.html +786 -0
  60. data/docs/rdocs/Longleaf/PreservationServiceError.html +147 -0
  61. data/docs/rdocs/Longleaf/PreserveCommand.html +410 -0
  62. data/docs/rdocs/Longleaf/PreserveEvent.html +491 -0
  63. data/docs/rdocs/Longleaf/RegisterCommand.html +428 -0
  64. data/docs/rdocs/Longleaf/RegisterEvent.html +628 -0
  65. data/docs/rdocs/Longleaf/RegisteredFileSelector.html +446 -0
  66. data/docs/rdocs/Longleaf/RegistrationError.html +151 -0
  67. data/docs/rdocs/Longleaf/ReindexCommand.html +576 -0
  68. data/docs/rdocs/Longleaf/RsyncReplicationService.html +1180 -0
  69. data/docs/rdocs/Longleaf/SequelIndexDriver.html +1978 -0
  70. data/docs/rdocs/Longleaf/ServiceCandidateFilesystemIterator.html +572 -0
  71. data/docs/rdocs/Longleaf/ServiceCandidateIndexIterator.html +532 -0
  72. data/docs/rdocs/Longleaf/ServiceCandidateLocator.html +333 -0
  73. data/docs/rdocs/Longleaf/ServiceClassCache.html +725 -0
  74. data/docs/rdocs/Longleaf/ServiceDateHelper.html +425 -0
  75. data/docs/rdocs/Longleaf/ServiceDefinition.html +683 -0
  76. data/docs/rdocs/Longleaf/ServiceDefinitionManager.html +371 -0
  77. data/docs/rdocs/Longleaf/ServiceDefinitionValidator.html +269 -0
  78. data/docs/rdocs/Longleaf/ServiceFields.html +173 -0
  79. data/docs/rdocs/Longleaf/ServiceManager.html +1229 -0
  80. data/docs/rdocs/Longleaf/ServiceMappingManager.html +410 -0
  81. data/docs/rdocs/Longleaf/ServiceMappingValidator.html +347 -0
  82. data/docs/rdocs/Longleaf/ServiceRecord.html +821 -0
  83. data/docs/rdocs/Longleaf/StorageLocation.html +985 -0
  84. data/docs/rdocs/Longleaf/StorageLocationManager.html +729 -0
  85. data/docs/rdocs/Longleaf/StorageLocationUnavailableError.html +143 -0
  86. data/docs/rdocs/Longleaf/StorageLocationValidator.html +373 -0
  87. data/docs/rdocs/Longleaf/StoragePathValidator.html +253 -0
  88. data/docs/rdocs/Longleaf/SystemConfigBuilder.html +441 -0
  89. data/docs/rdocs/Longleaf/SystemConfigFields.html +163 -0
  90. data/docs/rdocs/Longleaf/ValidateConfigCommand.html +451 -0
  91. data/docs/rdocs/Longleaf/ValidateMetadataCommand.html +408 -0
  92. data/docs/rdocs/_index.html +660 -0
  93. data/docs/rdocs/class_list.html +51 -0
  94. data/docs/rdocs/css/common.css +1 -0
  95. data/docs/rdocs/css/full_list.css +58 -0
  96. data/docs/rdocs/css/style.css +496 -0
  97. data/docs/rdocs/file.README.html +165 -0
  98. data/docs/rdocs/file_list.html +56 -0
  99. data/docs/rdocs/frames.html +17 -0
  100. data/docs/rdocs/index.html +165 -0
  101. data/docs/rdocs/js/app.js +303 -0
  102. data/docs/rdocs/js/full_list.js +216 -0
  103. data/docs/rdocs/js/jquery.js +4 -0
  104. data/docs/rdocs/method_list.html +2051 -0
  105. data/docs/rdocs/top-level-namespace.html +110 -0
  106. data/lib/longleaf/candidates/file_selector.rb +47 -15
  107. data/lib/longleaf/candidates/registered_file_selector.rb +67 -0
  108. data/lib/longleaf/candidates/service_candidate_filesystem_iterator.rb +29 -35
  109. data/lib/longleaf/candidates/service_candidate_index_iterator.rb +84 -0
  110. data/lib/longleaf/candidates/service_candidate_locator.rb +9 -4
  111. data/lib/longleaf/cli.rb +162 -80
  112. data/lib/longleaf/commands/deregister_command.rb +12 -11
  113. data/lib/longleaf/commands/preserve_command.rb +13 -8
  114. data/lib/longleaf/commands/register_command.rb +9 -6
  115. data/lib/longleaf/commands/reindex_command.rb +92 -0
  116. data/lib/longleaf/commands/validate_config_command.rb +27 -6
  117. data/lib/longleaf/commands/validate_metadata_command.rb +11 -9
  118. data/lib/longleaf/errors.rb +12 -12
  119. data/lib/longleaf/events/deregister_event.rb +13 -15
  120. data/lib/longleaf/events/event_status_tracking.rb +7 -7
  121. data/lib/longleaf/events/preserve_event.rb +24 -14
  122. data/lib/longleaf/events/register_event.rb +21 -35
  123. data/lib/longleaf/helpers/digest_helper.rb +4 -4
  124. data/lib/longleaf/helpers/service_date_helper.rb +5 -6
  125. data/lib/longleaf/indexing/index_manager.rb +101 -0
  126. data/lib/longleaf/indexing/sequel_index_driver.rb +324 -0
  127. data/lib/longleaf/logging.rb +4 -4
  128. data/lib/longleaf/logging/redirecting_logger.rb +20 -20
  129. data/lib/longleaf/models/app_fields.rb +2 -1
  130. data/lib/longleaf/models/file_record.rb +10 -6
  131. data/lib/longleaf/models/md_fields.rb +1 -1
  132. data/lib/longleaf/models/metadata_record.rb +22 -12
  133. data/lib/longleaf/models/service_definition.rb +3 -3
  134. data/lib/longleaf/models/service_fields.rb +1 -1
  135. data/lib/longleaf/models/service_record.rb +6 -5
  136. data/lib/longleaf/models/storage_location.rb +26 -7
  137. data/lib/longleaf/models/system_config_fields.rb +9 -0
  138. data/lib/longleaf/preservation_services/file_check_service.rb +58 -0
  139. data/lib/longleaf/preservation_services/fixity_check_service.rb +16 -14
  140. data/lib/longleaf/preservation_services/rsync_replication_service.rb +32 -31
  141. data/lib/longleaf/services/application_config_deserializer.rb +55 -18
  142. data/lib/longleaf/services/application_config_manager.rb +16 -4
  143. data/lib/longleaf/services/application_config_validator.rb +1 -2
  144. data/lib/longleaf/services/configuration_validator.rb +6 -4
  145. data/lib/longleaf/services/metadata_deserializer.rb +40 -38
  146. data/lib/longleaf/services/metadata_persistence_manager.rb +46 -0
  147. data/lib/longleaf/services/metadata_serializer.rb +23 -22
  148. data/lib/longleaf/services/service_class_cache.rb +15 -15
  149. data/lib/longleaf/services/service_definition_manager.rb +5 -6
  150. data/lib/longleaf/services/service_definition_validator.rb +5 -6
  151. data/lib/longleaf/services/service_manager.rb +37 -17
  152. data/lib/longleaf/services/service_mapping_manager.rb +9 -9
  153. data/lib/longleaf/services/service_mapping_validator.rb +9 -10
  154. data/lib/longleaf/services/storage_location_manager.rb +22 -8
  155. data/lib/longleaf/services/storage_location_validator.rb +11 -8
  156. data/lib/longleaf/services/storage_path_validator.rb +1 -1
  157. data/lib/longleaf/specs/config_builder.rb +30 -17
  158. data/lib/longleaf/specs/custom_matchers.rb +1 -1
  159. data/lib/longleaf/specs/file_helpers.rb +15 -14
  160. data/lib/longleaf/specs/metadata_builder.rb +91 -0
  161. data/lib/longleaf/specs/system_config_builder.rb +27 -0
  162. data/lib/longleaf/version.rb +1 -1
  163. data/longleaf.gemspec +17 -7
  164. data/mkdocs.yml +20 -0
  165. metadata +233 -22
data/.yardopts ADDED
@@ -0,0 +1 @@
1
+ --output-dir docs/rdocs
data/Gemfile CHANGED
@@ -1,4 +1,19 @@
1
1
  source 'https://rubygems.org'
2
2
 
3
- # Specify your gem's dependencies in longleaf.gemspec
4
3
  gemspec
4
+
5
+ group :postgres, optional: true do
6
+ gem 'pg', '>= 1.0.0'
7
+ end
8
+
9
+ group :sqlite, optional: true do
10
+ gem 'sqlite3'
11
+ end
12
+
13
+ group :mysql2, optional: true do
14
+ gem 'mysql2', ">= 0.5.0"
15
+ end
16
+
17
+ group :mysql, optional: true do
18
+ gem 'mysql'
19
+ end
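The optional groups above are only installed when selected (for example with `bundle install --with postgres`), so an index driver may or may not be present at runtime. As a minimal sketch, assuming a hypothetical helper named `installed_index_drivers` that is not part of Longleaf, the following checks which of the driver gems named in this Gemfile RubyGems can find in the current environment:

```ruby
require 'rubygems'

# Gem names taken from the optional groups in the Gemfile diff above.
INDEX_DRIVER_GEMS = %w[pg sqlite3 mysql2 mysql].freeze

# Hypothetical helper (an assumption for illustration, not Longleaf API):
# returns the subset of candidate gems that RubyGems can locate.
def installed_index_drivers(candidates = INDEX_DRIVER_GEMS)
  candidates.select do |name|
    # find_all_by_name returns an empty array rather than raising
    # when no matching gem is installed.
    Gem::Specification.find_all_by_name(name).any?
  end
end

puts "Installed index drivers: #{installed_index_drivers.inspect}"
```

When run under Bundler, the result reflects only the gems in the current bundle, which is usually the behavior wanted when deciding whether an index can be configured.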
data/README.md CHANGED
@@ -1,4 +1,6 @@
1
1
  # Longleaf
2
+ Code: [![CircleCI](https://circleci.com/gh/UNC-Libraries/longleaf-preservation.svg?style=svg)](https://circleci.com/gh/UNC-Libraries/longleaf-preservation)
3
+
2
4
  Longleaf is a command-line tool which allows users to configure a set of storage locations and define custom sets of preservation services to run on their contents. These services are executed in response to applicable preservation events issued by clients. Its primary goal is to provide tools to create a simple and customizable preservation environment. Longleaf:
3
5
 
4
6
  * Offers a predictable command-line interface and integrates with standard command-line tools.
@@ -9,7 +11,29 @@ Longleaf is a command-line tool which allows users to configure a set of storage
9
11
 
10
12
  ## Installation
11
13
 
12
- Add this line to your application's Gemfile:
14
+ There are two primary ways to install Longleaf, depending on how you intend to use it:
15
+
16
+ #### Standalone gem
17
+
18
+ To use Longleaf as a command-line application, the gem can be installed using:
19
+
20
+ ```
21
+ $ gem install longleaf
22
+ ```
23
+
24
+ Or it may be built from source:
25
+
26
+ ```
27
+ $ git clone git@github.com:UNC-Libraries/longleaf-preservation.git
28
+ $ cd longleaf-preservation
29
+ $ bin/setup
30
+ $ bundle exec rake install # builds the gem
31
+ $ gem install --local pkg/longleaf* # installs gem
32
+ ```
33
+
34
+ #### Application dependency
35
+
36
+ To make use of longleaf as a dependency of your application, add this line to your application's Gemfile:
13
37
 
14
38
  ```ruby
15
39
  gem 'longleaf'
@@ -17,23 +41,27 @@ gem 'longleaf'
17
41
 
18
42
  And then execute:
19
43
 
20
- $ bundle
21
-
22
- Or install it yourself as:
23
-
24
- $ gem install longleaf
44
+ ```
45
+ $ bundle
46
+ ```
25
47
 
26
48
  ## Usage
27
49
 
28
50
  #### Register a file
29
51
  In order to register a new file with Longleaf, use the register command:
30
- `longleaf register -c <config.yml> -f <path to file>`
52
+
53
+ ```
54
+ longleaf register -c <config.yml> -f <path to file>
55
+ ```
31
56
 
32
57
  In the case that a file's content is replaced, the file can be re-registered by providing the `--force` flag.
33
58
 
34
59
  #### Validate configuration files
35
60
  Application configuration files can be validated prior to usage with the following command:
36
- `longleaf validate_config -c <config.yml>`
61
+
62
+ ```
63
+ longleaf validate_config -c <config.yml>
64
+ ```
37
65
 
38
66
  #### Output and logging
39
67
 
@@ -42,21 +70,47 @@ The primary output from Longleaf is directed to STDOUT, and contains both succes
42
70
  Additional logging is sent to STDERR. To control the level of logging, you may provide the `--log-level` parameter, which expects the standard [Ruby Logger levels](https://ruby-doc.org/stdlib-2.4.0/libdoc/logger/rdoc/Logger.html). The default log level is 'WARN'.
43
71
 
44
72
  Messages sent to STDOUT are duplicated to STDERR at 'INFO' level, so they are excluded by default. In order to store an ongoing log of activity and errors, you would perform the following:
45
- `longleaf <command> --log-level 'INFO' 2> /logs/longleaf.log`
73
+
74
+ ```
75
+ longleaf <command> --log-level 'INFO' 2> /logs/longleaf.log
76
+ ```
46
77
 
47
78
  ## Development
48
79
 
49
- After checking out the repo, run `bin/setup` to install dependencies. Then, run `bundle exec rspec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
80
+ After checking out the repo, run `bin/setup` to install dependencies.
50
81
 
51
- To run Longleaf with local changes without needing to do a local install, you may run `bundle exec exe/longleaf <command>`.
82
+ To perform the tests, run:
83
+ ```
84
+ bundle exec rspec
85
+ ```
52
86
 
53
- To install this gem onto your local machine, run `bundle exec rake install`. This will allow you to run `longleaf <command>` and places the gem into `pkg/`. Note: Only files committed to git will be included in the installed gem.
87
+ To run Longleaf with local changes without needing to do a local install, you may run:
88
+ ```
89
+ bundle exec exe/longleaf <command>
90
+ ```
91
+
92
+ To install this gem onto your local machine, run:
93
+ ```
94
+ bundle exec rake install
95
+ ```
96
+
97
+ This places a newly built gem into the `pkg/` directory. This gem may then be installed in order to run commands in the `longleaf <command>` form.
98
+ _Note:_ Only files committed to git will be included in the installed gem.
54
99
 
55
100
  To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
56
101
 
102
+ ## Indexing
103
+ To use an index to improve performance, you will need to install the database drivers separately or bundle longleaf with the driver you wish to use:
104
+
105
+ ```
106
+ bundle install --with postgres
107
+ ```
108
+
109
+ Options include: postgres, mysql2, mysql, sqlite, amalgalite
110
+
57
111
  ## Contributing
58
112
 
59
- Bug reports and pull requests are welcome on GitHub at https://gitlab.lib.unc.edu/cdr/longleaf.
113
+ Bug reports and pull requests are welcome on GitHub at https://github.com/UNC-Libraries/longleaf-preservation.
60
114
 
61
115
 
62
116
  ## License
data/Rakefile CHANGED
@@ -1,6 +1,12 @@
1
1
  require "bundler/gem_tasks"
2
2
  require "rspec/core/rake_task"
3
+ require "yard"
3
4
 
4
5
  RSpec::Core::RakeTask.new(:spec)
5
6
 
6
7
  task :default => :spec
8
+
9
+ YARD::Rake::YardocTask.new do |t|
10
+ t.options = ['--private', '--protected']
11
+ t.stats_options = ['--list-undoc']
12
+ end
data/bin/setup CHANGED
@@ -1,8 +1,23 @@
1
1
  #!/usr/bin/env bash
2
2
  set -euo pipefail
3
3
  IFS=$'\n\t'
4
+
5
+ INSTALL_DEST=(--path '.bundle/')
6
+
7
+ while [[ $# -gt 0 ]]
8
+ do
9
+ key="$1"
10
+
11
+ case $key in
12
+ -s|--system)
13
+ INSTALL_DEST=(--system)
14
+ shift # past argument
15
+ ;;
16
+ esac
17
+ done
18
+
4
19
  set -vx
5
20
 
6
- bundle install
21
+ bundle install ${INSTALL_DEST[@]} --without ""
7
22
 
8
23
  # Do any other automated setup that you need to do here
data/docs/aboutlongleaf.md ADDED
@@ -0,0 +1,28 @@
1
+ ## About Longleaf
2
+
3
+ Longleaf, developed at the University of North Carolina at Chapel Hill University Libraries, is an open-source, portable digital preservation tool, designed to enable the creation and application of highly configurable preservation plans for large and varied collections of digital content across multiple systems.
4
+
5
+ Longleaf addresses challenges we have encountered over the past 20 years of growing and managing our digital collections infrastructure, which we feel are shared by other institutions:
6
+
7
+ 1. Preservation activities being applied to files based on system affiliation (i.e. repository platform or lack thereof) rather than the needs of the content.
8
+
9
+ 1. Difficulty maintaining an ideal schedule of fixity checks as the sizes of our collections grow.
10
+
11
+ 1. Physical and computational costs to servers and storage devices imposed by ongoing cryptographic checksum algorithms (Altman et al., 2013).
12
+
13
+ 1. Difficulty gradually introducing cloud storage services into our replication strategy for vulnerable files.
14
+
15
+ The goal for Longleaf is to provide organizations with a flexible tool for creating and implementing tailored preservation practices across the scope of their content, based on appropriate levels of digital preservation need (Phillips, Bailey, Goethals, & Owens, 2013). To this end, we have designed Longleaf according to the principles of high “software availability” (Davidson & Casden, 2016) that prioritize ease of use by a broad set of users in a variety of environments.
16
+
17
+ The Longleaf application is a command-line utility that will run on any modern Linux operating system with only a ruby interpreter. Longleaf can be applied to any content storage system with a file system, and requires no repository, no external database, and no storage system other than the file system. It can be run completely from the command line or triggered by arbitrary external systems (e.g. initiated on file ingest). Preservation processes are targeted at the file and storage level rather than through a repository system intermediary, allowing files managed in temporary storage or non-preservation asset management systems to benefit from the same replication and verification processes as those ingested into preservation repositories. For example, in our own digital collections context, we will be applying Longleaf to content across a wide variety of systems, including files managed entirely on shared drives, files managed by Fedora-based repositories, as well as digitization masters managed by CONTENTdm.
18
+
19
+ Longleaf’s modular architecture and flexible configuration system reduce the interference of repository system constraints, to enable needs-based digital preservation planning processes such as evaluating and configuring specific preservation activities across subsets of larger collections. For our collections, we are using Longleaf to begin addressing specific challenges such as managing the physical and computational costs of large-scale fixity verification, and integrating storage endpoints with different access costs. For example, we are increasing coverage of ongoing and transactional fixity checks (Gallinger, Bailey, Cariani, Owens, & Altman, 2017) by providing typical cryptographic checksums alongside more scalable non-cryptographic checks and filesystem checks. We can also determine more appropriate replication and verification schedules and techniques overall, based on the characteristics of both the source and destination storage locations for content.
20
+
21
+ ## References
22
+ Altman, M., Bailey, J., Cariani, K., Gallinger, M., Mandelbaum, J., & Owens, T. (2013). NDSA Storage Report: Reflections on National Digital Stewardship Alliance Member Approaches to Preservation Storage Technologies. D-Lib Magazine, 19(5/6). https://doi.org/10.1045/may2013-altman
23
+
24
+ Davidson, B., & Casden, J. (2016). Beyond open source. Code4Lib Journal, Issue 31. Retrieved from http://journal.code4lib.org/articles/11148
25
+
26
+ Gallinger, M., Bailey, J., Cariani, K., Owens, T., & Altman, M. (2017). Trends in Digital Preservation Capacity and Practice: Results from the 2nd Bi-annual National Digital Stewardship Alliance Storage Survey. D-Lib Magazine, 23(7/8). https://doi.org/10.1045/july2017-gallinger
27
+
28
+ Phillips, M., Bailey, J., Goethals, A., & Owens, T. (2013). The NDSA levels of digital preservation: Explanation and uses. In Archiving Conference (Vol. 2013, pp. 216–222). Society for Imaging Science and Technology.
data/docs/extra.css ADDED
@@ -0,0 +1,32 @@
1
+ /* added link colors to text that's not in side nav menu */
2
+
3
+ .rst-content a:link { color:#0000FF; text-decoration:none; font-weight:normal; }
4
+
5
+ .rst-content a:visited { color:#800080; text-decoration:none; font-weight:normal; }
6
+
7
+ .rst-content a:hover { color:aqua; text-decoration:none; font-weight:normal; }
8
+
9
+ .rst-content a:active { color:seagreen; text-decoration:none; font-weight:normal; }
10
+
11
+
12
+ /* changed background colors in side nav menu */
13
+
14
+ .wy-side-nav-search {
15
+ background-color: mediumseagreen;
16
+ }
17
+
18
+ .wy-menu-vertical span {
19
+ /* line-height: 18px;*/
20
+ line-height: 20px;
21
+
22
+ padding: 0.4045em 1.618em;
23
+ display: block;
24
+ position: relative;
25
+ font-size: 90%;
26
+ color: #838383;
27
+ }
28
+
29
+ .wy-nav-side {
30
+ background: #404040;
31
+ }
32
+ /*
Binary file
Binary file
data/docs/index.md ADDED
@@ -0,0 +1,19 @@
1
+ ## Longleaf Project Overview
2
+
3
+ Welcome to the Longleaf Project Documentation site.
4
+
5
+ This site contains supporting documentation for the Longleaf software project, currently in development by The University of North Carolina at Chapel Hill's University Libraries.
6
+
7
+ Longleaf is free, open-source software for creating highly configurable digital preservation environments. Longleaf provides a repository-agnostic command-line utility that enables users to configure and apply preservation processes such as monitoring and replication at the file level. Users can define preservation services using Longleaf's base set of commands, and then apply these preservation scripts to a set of user-defined storage locations that are relevant to the user's specific context.
8
+
9
+ For a step-by-step tutorial of Longleaf's basic functionality, the ["Basic Usage" page](quickstart.md) in this site walks through Longleaf's core operations using a pre-configured example data directory.
10
+
11
+ ## Project URLs
12
+
13
+ Code repository: [longleaf-preservation](https://github.com/UNC-Libraries/longleaf-preservation)
14
+
15
+ ## Project Participants
16
+
17
+ * Ben Pennell, Technical lead and software developer
18
+ * Jason Casden, Product lead
19
+ * Morgan McKeehan, Documentation lead
data/docs/install.md ADDED
@@ -0,0 +1,66 @@
1
+ # Installing Longleaf
2
+
3
+ ### Ruby and other Prerequisites
4
+ Longleaf requires Ruby 2.3 or higher.
5
+
6
+ There are also optional gem dependencies, needed only if you wish to use an index to improve performance.
7
+
8
+ Additionally, Longleaf scripts rely on common Unix programs. On Mac OS X and Linux operating systems, these programs will likely already be installed, but some of these tools, such as `rsync`, may be missing from a Windows system unless you have installed them.
9
+
10
+ ### Download Longleaf
11
+ Download Longleaf from the UNC Chapel Hill University Libraries' [Longleaf GitHub repository](https://github.com/UNC-Libraries/longleaf-preservation).
12
+
13
+ ### Install Longleaf
14
+
15
+ There are two ways to install Longleaf, depending on how you intend to use it:
16
+
17
+ **1. Standalone gem**
18
+
19
+ To use Longleaf as a command-line application, the gem can be installed using:
20
+
21
+ $ gem install longleaf
22
+
23
+ Or it may be built from source:
24
+
25
+ $ git clone git@github.com:UNC-Libraries/longleaf-preservation.git
26
+ $ cd longleaf-preservation
27
+ $ bin/setup --system
28
+ $ bundle exec rake install # builds the gem
29
+ $ gem install --local pkg/longleaf* # installs gem
30
+
31
+ **2. Application dependency**
32
+
33
+ To include longleaf as a dependency in an application, add this line to your application's Gemfile:
34
+
35
+ ```ruby
36
+ gem 'longleaf'
37
+ ```
38
+
39
+ And then execute:
40
+
41
+ $ bundle
42
+
43
+ ### Confirm Longleaf Installation
44
+ If you have installed Longleaf using the "Standalone gem" approach, you can check to make sure that the installation succeeded by typing the following into your terminal:
45
+
46
+ ```
47
+ longleaf
48
+ ```
49
+
50
+ You should see the Longleaf help page:
51
+ ```
52
+ Commands:
53
+ longleaf --version # Prints the Longleaf version number.
54
+ longleaf deregister # Deregister files with Longleaf
55
+ longleaf help [COMMAND] # Describe available commands or one specific command
56
+ longleaf preserve # Perform preservation services on files with Longleaf
57
+ longleaf register # Register files with Longleaf
58
+ longleaf reindex # Perform a full reindex of file metadata stored within the configured storage locations.
59
+ longleaf setup_index # Sets up the structure of the metadata index, if one is configured using the system configuration file provide...
60
+ longleaf validate_config # Validate an application configuration file, provided using --config.
61
+ longleaf validate_metadata # Validate metadata files.
62
+
63
+
64
+ ```
65
+ ### Installation Success!
66
+ If the Longleaf Help page printed successfully, you are ready to proceed to the [Basic Usage tutorial](quickstart.md) to try out basic Longleaf functionality.
data/docs/ll-example/config-example-relative.yml ADDED
@@ -0,0 +1,33 @@
1
+ # Storage locations below use relative paths. When validating and using
2
+ # the configuration file, paths for the storage locations will be evaluated
3
+ # relative to the location of the configuration file. In this example tutorial,
4
+ # the storage locations are directories ('files-dir' and 'replica-files')
5
+ # located at the same level as the config file, within the ll-example
6
+ # directory.
7
+
8
+ locations:
9
+   data-directory:
10
+     path: files-dir
11
+     metadata_path: metadata-dir
12
+   backup-directory:
13
+     path: replica-files
14
+     metadata_path: replica-metadata
15
+ services:
16
+   example_replication:
17
+     work_script: 'rsync_replication_service'
18
+     to:
19
+       - backup-directory
20
+   example_fixity:
21
+     work_script: 'fixity_check_service'
22
+     frequency: 30 seconds
23
+     absent_digest: generate
24
+     algorithms:
25
+       - sha1
26
+ service_mappings:
27
+   - locations: data-directory
28
+     services:
29
+       - example_replication
30
+       - example_fixity
31
+   - locations: backup-directory
32
+     services:
33
+       - example_fixity
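The comment at the top of this configuration file says that relative storage paths are evaluated against the location of the configuration file. The following is a minimal sketch of that rule, not Longleaf's own loader: the helper name `resolve_location_paths` is hypothetical, and the sketch assumes every location defines both `path` and `metadata_path` as in the example above.

```ruby
require 'yaml'

# Hypothetical helper (an assumption for illustration, not Longleaf API):
# reads a configuration file and returns each location's paths expanded
# against the directory that contains the configuration file.
def resolve_location_paths(config_path)
  base_dir = File.dirname(File.expand_path(config_path))
  config = YAML.safe_load(File.read(config_path))

  config.fetch('locations', {}).each_with_object({}) do |(name, props), resolved|
    resolved[name] = {
      # File.expand_path leaves absolute paths untouched and resolves
      # relative ones against base_dir.
      'path' => File.expand_path(props['path'], base_dir),
      'metadata_path' => File.expand_path(props['metadata_path'], base_dir)
    }
  end
end
```

Calling `resolve_location_paths('config-example-relative.yml')` from inside the `ll-example` directory would, under these assumptions, map `data-directory` to the absolute paths of `files-dir` and `metadata-dir` next to the configuration file.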
data/docs/ll-example/files-dir/LLexample-TOCHANGE.txt ADDED
@@ -0,0 +1,15 @@
1
+ ## MAKE CHANGES TO THIS FILE
2
+
3
+ * make any changes here!
4
+ * ( insert changed text )
5
+
6
+ ## Longleaf
7
+ Longleaf is a command-line tool which allows users to configure a set of storage locations and define custom sets of preservation services to run on their contents. These services are executed in response to applicable preservation events issued by clients. Its primary goal is to provide tools to create a simple and customizable preservation environment.
8
+
9
+ Longleaf:
10
+
11
+ * Offers a predictable command-line interface and integrates with standard command-line tools.
12
+ * Offers configurable and customizable criteria based preservation workflows.
13
+ * Provides a base set of tools and a framework for building extensions.
14
+ * Provides activity logging and notifications.
15
+ * Performs preservation services only when required.
data/docs/ll-example/files-dir/LLexample-tokeep.txt ADDED
@@ -0,0 +1,10 @@
1
+ ## Longleaf
2
+ Longleaf is a command-line tool which allows users to configure a set of storage locations and define custom sets of preservation services to run on their contents. These services are executed in response to applicable preservation events issued by clients. Its primary goal is to provide tools to create a simple and customizable preservation environment.
3
+
4
+ Longleaf:
5
+
6
+ * Offers a predictable command-line interface and integrates with standard command-line tools.
7
+ * Offers configurable and customizable criteria based preservation workflows.
8
+ * Provides a base set of tools and a framework for building extensions.
9
+ * Provides activity logging and notifications.
10
+ * Performs preservation services only when required.
File without changes
File without changes
File without changes
data/docs/quickstart.md ADDED
@@ -0,0 +1,270 @@
1
+ ## Longleaf basic usage tutorial
2
+
3
+ This tutorial provides an introduction to the basic functionality of Longleaf by demonstrating how to configure and execute a small set of preservation tasks, using an example data directory as a local sandbox on your own computer.
4
+
5
+ **Longleaf basic usage tasks covered in this tutorial:**
6
+
7
+ * Validate the mandatory Longleaf configuration file that contains the storage locations and preservation activities you will use
8
+ * Register example data files
9
+ * Validate metadata for registered files
10
+ * Run a Preservation action on example data files
11
+ * Alter an example data file to cause an error; catch the error by re-running the Preservation action
12
+ * Re-register the altered file, and preserve the file in its new, altered state
13
+
14
+ ### System Requirements for this tutorial
15
+ Longleaf scripts rely on common UNIX programs. On Mac OS X and Linux operating systems, these programs will likely already be installed, but some of these tools, such as `rsync`, may be missing from a Windows system unless you have installed them.
16
+
17
+ ### Example data directory for this tutorial
18
+ The example data directory for this tutorial is named 'll-example'; it is located in the 'docs' folder of the longleaf-preservation repository.
19
+
20
+ The 'll-example' directory contains all the materials required for completing this tutorial:
21
+
22
+ * an example configuration file, 'config-example-relative.yml', that is pre-configured for tutorial tasks
23
+ * a folder of example data files to be preserved
24
+ * empty scaffolding folders that will be used for storing: 1) metadata files about the original data files, 2) the replicated data files created from the originals, and 3) metadata files about the replicated data files.
25
+
26
+ **folder structure of 'll-example' directory: materials for tutorial tasks**
27
+ ```tree
28
+ └───ll-example
29
+     │   config-example-relative.yml
30
+
31
+     ├───files-dir
32
+     │       LLexample-PDF.pdf
33
+     │       LLexample-TOCHANGE.txt
34
+     │       LLexample-tokeep.txt
35
+
36
+     ├───metadata-dir
37
+
38
+
39
+     ├───replica-files
40
+
41
+
42
+     └───replica-metadata
43
+ ```
44
+
45
+ Subdirectories in 'll-example' are used as follows:
46
+
47
+ **'files-dir'**
48
+ Contains three example data files that will be preserved and replicated.
49
+
50
+ **'metadata-dir'**
51
+ Empty at start of tutorial; metadata files about the data files in 'files-dir' will be stored and automatically updated in this directory during Register and Preserve actions.
52
+
53
+ **'replica-files'**
54
+ Empty at start of tutorial; replicated data files (copies) created during Preserve actions run on the data files in 'files-dir' will be stored in this directory.
55
+
56
+ **'replica-metadata'**
57
+ Empty at start of tutorial; metadata files about the replicated data files in 'replica-files' will be stored in this directory.
58
+
59
+ ### Set up your local sandbox: copy the example data directory to your Desktop
60
+ Copy the 'll-example' directory to a location where it will be easy to access and work with. For example, it might be convenient to copy 'll-example' to your 'Desktop' location.
61
+
62
+ Note: All paths in the commands in this tutorial are relative to the 'll-example' folder itself, so if you execute commands from within 'll-example,' you can use the paths as shown with no alterations.
63
+
64
+ ### Review the example Longleaf configuration file
65
+
66
+ **Locations**
67
+ The example configuration file references storage locations based on the subdirectory names in 'll-example':
68
+
69
+ * The 'data-directory' location points to 'files-dir' as the location of the original data files to be preserved, and 'metadata-dir' for storing metadata files created about these original data files.
70
+ * The 'backup-directory' location refers to the 'replica-files' folder as the storage location where the copies of the original files will be stored, and 'replica-metadata' for metadata files.
71
+
72
+ **Services**
73
+ The 'services' area in the configuration file defines characteristics of work scripts, such as the target storage location for the replication script, and the checksum algorithm for the fixity script.
74
+
75
+ **Service Mappings**
76
+ The 'service-mappings' area specifies which services will run at a particular storage location. In the example configuration file, the services 'example_replication' and 'example_fixity' are assigned to run at location 'data-directory'. At location 'backup-directory', however, only 'example_fixity' is assigned to run, so files there will have fixity checks, but will not get replicated.
77
+
78
+ ```yml
79
+ - locations: data-directory
80
+   services:
81
+     - example_replication
82
+     - example_fixity
83
+ - locations: backup-directory
84
+   services:
85
+     - example_fixity
86
+ ```
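To make the mapping concrete, here is a small illustrative sketch that mirrors the two entries above as plain Python data and looks up which services apply to a location. It is not Longleaf's implementation. The names `SERVICE_MAPPINGS` and `services_for` are hypothetical, and accepting a list of locations as well as a single name is an assumption made for the sketch.

```python
# The two service mappings from the example configuration, as plain data.
SERVICE_MAPPINGS = [
    {"locations": "data-directory",
     "services": ["example_replication", "example_fixity"]},
    {"locations": "backup-directory",
     "services": ["example_fixity"]},
]


def services_for(location, mappings=SERVICE_MAPPINGS):
    """Return the service names mapped to the given location name."""
    services = []
    for mapping in mappings:
        locations = mapping["locations"]
        # Assumption for this sketch: a mapping may name one location or a list.
        if isinstance(locations, str):
            locations = [locations]
        if location in locations:
            services.extend(mapping["services"])
    return services
```

With this data, `services_for("backup-directory")` contains only the fixity service, which matches the description above: replicas are checked but not replicated again.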
87
+
88
+ **Relative Paths in the Configuration File**
89
+ Storage locations in this configuration file are specified as paths relative to the configuration file itself, and they are evaluated relative to the configuration file's location when Longleaf commands are run. In this tutorial, the configuration file is located inside 'll-example', at the same level as the data and metadata folders.
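As an illustration of what "relative to the configuration file" means, the sketch below resolves a relative location against the directory that holds the configuration file. It only demonstrates the idea and is not Longleaf's own code; the function name `resolve_location` is hypothetical.

```python
from pathlib import Path


def resolve_location(config_file, relative_path):
    """Resolve relative_path against the directory containing config_file."""
    config_dir = Path(config_file).resolve().parent
    return (config_dir / relative_path).resolve()
```

For a configuration file at 'll-example/config-example-relative.yml', `resolve_location(..., "files-dir")` yields the absolute path of 'll-example/files-dir', regardless of the directory the function is called from.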
90
+
91
+ ### Check that you can run Longleaf, by viewing the help text page
92
+ Once you have copied the 'll-example' directory to an accessible working location, and have familiarized yourself with the YAML configuration file, you are ready to start using Longleaf!
93
+
94
+ First confirm that you can run Longleaf, as you did in the ["Installation Instructions" page](install.md) of this site. Open a terminal window and `cd` into the 'll-example' directory, then enter the command 'longleaf', with no arguments:
95
+ ```
96
+ longleaf
97
+ ```
98
+ This command will output the help page text, showing all available commands.
99
+
100
+ ### Validate the example configuration file
101
+ Next, you will validate the Longleaf configuration file, to ensure that storage locations and paths are correctly specified and accessible.
102
+
103
+ As noted above, all paths in this tutorial are relative to the 'll-example' directory, so make sure that you execute commands from within that directory, or amend paths accordingly.
104
+
105
+ Command for validating configuration file:
106
+
107
+ ```
108
+ longleaf validate_config -c config-example-relative.yml
109
+ ```
110
+
111
+ Successful validation of the configuration file outputs the following message in terminal:
112
+ ```
113
+ SUCCESS: Application configuration passed validation: config-example-relative.yml
114
+ ```
115
+
116
+ ### Register example data files
117
+ Once the configuration file has been validated, you can proceed to registering the example data files, so that they are ready for further preservation actions with Longleaf.
118
+
119
+ Note that Longleaf commands can be run on directories, or individual files. For example, when using the Register command, you can either:
120
+
121
+ 1. register a whole directory at once, which will individually register every file in that directory
122
+ **or**
123
+ 2. register individual files.
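The difference between the two options can be pictured as expanding a target into a list of individual files. The sketch below is only an illustration of that idea, not Longleaf's selection logic: the function name `files_to_register` is hypothetical, and descending into subdirectories is an assumption of the sketch.

```python
from pathlib import Path


def files_to_register(target):
    """Expand a file or directory target into a sorted list of file paths."""
    path = Path(target)
    if path.is_file():
        # A single file is its own one-item list.
        return [path]
    # Assumption for this sketch: a directory target includes nested files.
    return sorted(p for p in path.rglob("*") if p.is_file())
```

Passing 'files-dir' would list all three example files, while passing 'files-dir/LLexample-TOCHANGE.txt' would list only that file.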
124
+
125
+ In this example of using the Register command, we will register an entire directory of files by specifying the directory path.
126
+
127
+ Use command:
128
+ ```
129
+ longleaf register -c config-example-relative.yml -f files-dir
130
+ ```
131
+ If the files are successfully registered, Longleaf will list the registration outcome for each file within the directory:
132
+ ```
133
+ SUCCESS register /data/ll-example/files-dir/LLexample-PDF.pdf
134
+ SUCCESS register /data/ll-example/files-dir/LLexample-TOCHANGE.txt
135
+ SUCCESS register /data/ll-example/files-dir/LLexample-tokeep.txt
136
+
137
+ ```
138
+
139
+ ### Validate the new metadata files
140
+ Validate the metadata files that were created for the files you just registered.
141
+ Use command:
142
+ ```
143
+ longleaf validate_metadata -c config-example-relative.yml -f files-dir
144
+ ```
145
+ Confirmation of validation for the metadata files:
146
+ ```
147
+ SUCCESS: Metadata for file passed validation: /data/ll-example/files-dir/LLexample-PDF.pdf
148
+ SUCCESS: Metadata for file passed validation: /data/ll-example/files-dir/LLexample-TOCHANGE.txt
149
+ SUCCESS: Metadata for file passed validation: /data/ll-example/files-dir/LLexample-tokeep.txt
150
+
151
+ ```
152
+
153
+
154
+ ### Run the Preserve command, to replicate and check fixity on the registered files.
155
+ Use command:
156
+ ```
157
+ longleaf preserve -c config-example-relative.yml -f files-dir
158
+ ```
159
+ Output confirming that the Preserve action completed successfully on all files in the 'files-dir' directory:
160
+ ```
161
+ SUCCESS register /data/ll-example/replica-files/LLexample-PDF.pdf
162
+ SUCCESS preserve[example_replication] /data/ll-example/files-dir/LLexample-PDF.pdf
163
+ SUCCESS preserve[example_fixity] /data/ll-example/files-dir/LLexample-PDF.pdf
164
+ SUCCESS register /data/ll-example/replica-files/LLexample-TOCHANGE.txt
165
+ SUCCESS preserve[example_replication] /data/ll-example/files-dir/LLexample-TOCHANGE.txt
166
+ SUCCESS preserve[example_fixity] /data/ll-example/files-dir/LLexample-TOCHANGE.txt
167
+ SUCCESS register /data/ll-example/replica-files/LLexample-tokeep.txt
168
+ SUCCESS preserve[example_replication] /data/ll-example/files-dir/LLexample-tokeep.txt
169
+ SUCCESS preserve[example_fixity] /data/ll-example/files-dir/LLexample-tokeep.txt
170
+
171
+ ```
172
+
173
+ ### Look at the output of the Preserve command
174
+ Take a look at the SUCCESS output messages from the `preserve` command above.
175
+
176
+ Notice that for each file included in the Preservation action, Longleaf indicates "SUCCESS" for the "register" and "preserve" components of the Preservation action, as well as the individual scripts within the "preserve" component.
177
+
178
+ Also take a look inside the subfolders in your 'll-example' sandbox folder, using Finder or command-line tools.
179
+
180
+ You'll notice that there are newly-created data files and metadata files in the 'replica-' directories. The original data files from 'files-dir' have been copied to the 'replica-files' directory, and metadata files for those copies have been created in the 'replica-metadata' directory.
181
+
182
+ ![preserve files folders output](img/ll-example-preserved.png)
183
+
184
+ ### Re-run the Preserve command, to check the integrity of the files that are being preserved.
185
+
186
+ Since the 'example_fixity' script in our example configuration file is set to frequency: 30 seconds, we can check fixity on the preserved files once 30 seconds have elapsed since the last preservation action.
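The effect of a frequency setting can be sketched as a simple elapsed-time check: a service is due again once at least the configured interval has passed since it last ran. This is a simplified illustration under that assumption, not Longleaf's scheduling code; the function name `fixity_due` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def fixity_due(last_run, now, frequency_seconds=30):
    """Return True if a check is due, given when it last ran."""
    if last_run is None:
        # Never run before, so the check is due immediately.
        return True
    return now - last_run >= timedelta(seconds=frequency_seconds)
```

For example, with a last run at 12:00:00, the check is not yet due at 12:00:10 but is due at 12:00:30.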
187
+
188
+ Use command:
189
+ ```
190
+ longleaf preserve -c config-example-relative.yml -f files-dir
191
+ ```
192
+ Success output message:
193
+ ```
194
+ SUCCESS preserve[example_fixity] /data/ll-example/files-dir/LLexample-PDF.pdf
195
+ SUCCESS preserve[example_fixity] /data/ll-example/files-dir/LLexample-TOCHANGE.txt
196
+ SUCCESS preserve[example_fixity] /data/ll-example/files-dir/LLexample-tokeep.txt
197
+
198
+ ```
199
+ Note that the terminal output is much shorter than the output of the first `preserve` command. It shows only "SUCCESS" and preserve[example_fixity] for each file. Register and replication are not included this time because those actions already ran and do not need to run again. Only fixity checks ran in this Preserve action.
200
+
201
+ In the Finder window, notice that the Date Modified is updated on the metadata files in 'metadata-dir'. The metadata files for the original data files are updated to reflect these Preservation actions running.
202
+
203
+ ### Cause a change to a file, and catch the change with the Preserve command.
204
+ Now make a change to one of the files that is being preserved, which you will then catch by re-running `preserve` on the storage directory for that file.
205
+
206
+ Open up the file: 'll-example/files-dir/LLexample-TOCHANGE.txt'. On line 4 of the text file, you'll see "( insert changed text ) ". Insert your cursor near this line, and type something to make a change to the file. Then save the file and close it. In your 'll-example/files-dir' directory, notice that the Date Modified has changed for this file.
207
+
208
+ Run the `preserve` command again. This time it will fail, because one of the files under preservation has been altered since the file's last checksums were recorded.
209
+
210
+ Use command:
211
+ ```
212
+ longleaf preserve -c config-example-relative.yml -f files-dir
213
+ ```
214
+ Output showing failed fixity check on the file 'll-example/files-dir/LLexample-TOCHANGE.txt':
215
+ ```
216
+ SUCCESS preserve[example_fixity] /data/ll-example/files-dir/LLexample-PDF.pdf
217
+ FAILURE preserve[example_fixity] /data/ll-example/files-dir/LLexample-TOCHANGE.txt: Fixity check using algorithm 'sha1' failed for file /data/ll-example/files-dir/LLexample-TOCHANGE.txt: expected 'effa1bfc1b93f1260a36044bcd668240cf14738a', calculated 'cd5e53f6945ceda2d90b003a39af509d7acb068f.'
218
+ SUCCESS preserve[example_fixity] /data/ll-example/files-dir/LLexample-tokeep.txt
219
+
220
+ ```
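The failure message reports a 'sha1' digest mismatch: the digest recorded at registration no longer equals the digest calculated from the file's current contents. The sketch below shows how such a comparison can be made with Python's standard `hashlib` module. It illustrates the general technique only and is not Longleaf's code; the function names `sha1_of_file` and `fixity_matches` are hypothetical.

```python
import hashlib


def sha1_of_file(path, chunk_size=65536):
    """Compute the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_matches(path, expected_hex):
    """Return True if the file's current digest equals the expected digest."""
    return sha1_of_file(path) == expected_hex.lower()
```

Any edit to the file, even a single character, produces a different digest, which is why the altered 'LLexample-TOCHANGE.txt' fails while the other two files pass.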
221
+
222
+ ### Re-register the altered file.
223
+ Now you will re-register the altered file, using the `--force` option for the `register` command. Run this registration command at the file level so that only this file is re-registered, not the entire directory.
224
+
225
+ Re-registration will update the file's preservation metadata to the new state of 'LLexample-TOCHANGE.txt'.
226
+
227
+ Note, however, that the re-registered altered file (ll-example/files-dir/LLexample-TOCHANGE.txt) will still not match the copy of LLexample-TOCHANGE.txt that is stored in "replica-files" (ll-example/replica-files/LLexample-TOCHANGE.txt), because the replica copy was created earlier, from the unaltered original file.
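If you want to see this mismatch for yourself, you can compare the original and the replica byte for byte. The sketch below uses Python's standard `filecmp` module; the function name `replica_matches` is hypothetical, and this is an independent check rather than anything Longleaf runs.

```python
import filecmp


def replica_matches(original, replica):
    """Return True if the two files have identical contents."""
    # shallow=False forces a comparison of file contents, not just metadata.
    return filecmp.cmp(original, replica, shallow=False)
```

At this point in the tutorial, comparing 'files-dir/LLexample-TOCHANGE.txt' with 'replica-files/LLexample-TOCHANGE.txt' should report a difference, since the replica still holds the unaltered contents.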
228
+
229
+ Use command:
230
+ ```
231
+ longleaf register -c config-example-relative.yml -f files-dir/LLexample-TOCHANGE.txt --force
232
+ ```
233
+ Success output message:
234
+ ```
235
+ SUCCESS register /data/ll-example/files-dir/LLexample-TOCHANGE.txt
236
+ ```
237
+
238
+ ### Run Preserve on the re-registered file.
239
+ Running `preserve` on the re-registered altered file will replicate the new state of this file to the copied data location. As with `register --force`, run `preserve` at the file level, not the directory.
240
+
241
+ Use command:
242
+ ```
243
+ longleaf preserve -c config-example-relative.yml -f files-dir/LLexample-TOCHANGE.txt
244
+ ```
245
+
246
+ The terminal output shows the new registration of the file, and the replication and fixity scripts running on this file.
247
+ In Finder, you can see that the Date Modified of the 'LLexample-TOCHANGE.txt' file in the 'replica-files' directory now matches the original file. Open up 'replica-files/LLexample-TOCHANGE.txt' and you'll see the added line of text in this file.
248
+
249
+ Successful preservation of re-registered changed file:
250
+ ```
251
+ SUCCESS register /data/ll-example/replica-files/LLexample-TOCHANGE.txt
252
+ SUCCESS preserve[example_replication] /data/ll-example/files-dir/LLexample-TOCHANGE.txt
253
+ SUCCESS preserve[example_fixity] /data/ll-example/files-dir/LLexample-TOCHANGE.txt
254
+
255
+ ```
256
+
257
+ ### Re-run Preserve at the directory level to check all three files.
258
+ After the 30-second fixity check interval has elapsed, re-run the `preserve` command at the directory level as before. All three files now pass again.
259
+
260
+ Use command:
261
+ ```
262
+ longleaf preserve -c config-example-relative.yml -f files-dir
263
+ ```
264
+ Success output message:
265
+ ```
266
+ SUCCESS preserve[example_fixity] /data/ll-example/files-dir/LLexample-PDF.pdf
267
+ SUCCESS preserve[example_fixity] /data/ll-example/files-dir/LLexample-TOCHANGE.txt
268
+ SUCCESS preserve[example_fixity] /data/ll-example/files-dir/LLexample-tokeep.txt
269
+
270
+ ```