fingerprint 1.4.0 → 3.0.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
- SHA1:
-   metadata.gz: 2c6867e9566430e12dc2d913a013c469e3b2bc47
-   data.tar.gz: 58d9a1812dcd041743816328ac3e71fc79496014
+ SHA256:
+   metadata.gz: d80181117a0710868e55f5218a6dd6b9241d151878826bfeffc81a4d79b3d2bf
+   data.tar.gz: 2ec8a17ce51398454fcaa082b0f08d87a37ef3753e28a3bb9c1a7252bc92db95
  SHA512:
-   metadata.gz: ac0bfcca4b51a31797849aa039c795626f78d7a5d0b5278aadbb9872b635d7ca3cd369b3cd08de77c48545ca58af4aad83ede79e99749b56e896a03b8e238e5e
-   data.tar.gz: 76f51cf43136114a35cafb99958da8e7099c98ee68a931efa7a1b853e76dc9025a2bedd6df697ebf69b7c7f47004f3616865514beea9c980d8a9e4798f7eec58
+   metadata.gz: a075aad140130acd196e13ee9c265d3b722d5879e681ff4852b81ea9d3885d34e649daf821a0da33a4e9c9b370f67905bee573dfe380d88db6e0195e95c41ce1
+   data.tar.gz: 594c2a23c98b18cd76be232c184b0176f4f6ac1f1fd240183e35969633440dc7deceeef442a43a80fb1d0d705e9b829e5ce5e9b4167fa022f0ad5facc843bb16
@@ -0,0 +1,15 @@
+ .tags
+
+ /.bundle/
+ /.yardoc
+ /Gemfile.lock
+ /_yardoc/
+ /coverage/
+ /doc/
+ /pkg/
+ /spec/reports/
+ /tmp/
+
+ .rspec_status
+ .covered.db
+ index.fingerprint
data/.rspec ADDED
@@ -0,0 +1,3 @@
+ --format documentation
+ --warnings
+ --require spec_helper
@@ -0,0 +1,23 @@
+ language: ruby
+ dist: xenial
+ cache: bundler
+
+ matrix:
+   include:
+     - rvm: 2.3
+     - rvm: 2.4
+     - rvm: 2.5
+     - rvm: 2.6
+     - rvm: 2.6
+       env: COVERAGE=Summary,Coveralls
+     - rvm: truffleruby
+     - rvm: jruby-head
+       env: JRUBY_OPTS="--debug -X+O"
+     - rvm: ruby-head
+     - rvm: 2.6
+       os: osx
+   allow_failures:
+     - rvm: truffleruby
+     - rvm: ruby-head
+     - rvm: jruby-head
+     - rvm: truffleruby
@@ -0,0 +1,233 @@
+ # Fingerprint Guide
+
+ This guide gives an overview of the various ways in which fingerprint can be used.
+
+ ## Fingerprint Digests
+
+ The following digest algorithms can be used:
+
+ * `MD5`
+ * `SHA1`
+ * `SHA2.256`
+ * `SHA2.384`
+ * `SHA2.512`
+
+ `SHA2.256` is the default.
+
+ The use of `MD5` and `SHA1` is no longer recommended due to the
+ risk of hash collisions.
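These digest names correspond to standard hash functions. As an illustration only (not fingerprint's own code), Ruby's standard `digest` library can compute each of them. The `DIGESTS` table and the `file_digests` helper are hypothetical names, and the mapping from fingerprint's names to `Digest` classes is an assumption.

```ruby
require 'digest/md5'
require 'digest/sha1'
require 'digest/sha2' # provides SHA256, SHA384 and SHA512

# Hypothetical mapping from the digest names listed above to Ruby's
# standard-library implementations (an assumption, not fingerprint's API).
DIGESTS = {
  'MD5' => Digest::MD5,
  'SHA1' => Digest::SHA1,
  'SHA2.256' => Digest::SHA256,
  'SHA2.384' => Digest::SHA384,
  'SHA2.512' => Digest::SHA512
}.freeze

# Return a hash of digest name => hex digest for the file at +path+.
# Defaults to SHA2.256, the default described above.
def file_digests(path, names = ['SHA2.256'])
  names.map { |name| [name, DIGESTS.fetch(name).file(path).hexdigest] }.to_h
end
```

For an empty file, the `SHA2.256` entry is `e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855`, the same `key.SHA2.256` value that the `scan` example in the next section reports for its empty files.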
+
+ ## Generating fingerprints
+
+ Fingerprint is designed to index directories. The basic `scan` command indexes each of the given paths in order (or the current directory if none is given).
+
+ ``` text
+ $ fingerprint scan /tmp/test
+ C /tmp/test
+ fingerprint.version 2.1.5
+ options.checksums SHA2.256
+ options.extended false
+ summary.time.start 2016-12-09 16:42:41 -0800
+ D
+ F file-1.txt
+ file.size 0
+ key.SHA2.256 e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
+ F file-2.txt
+ file.size 0
+ key.SHA2.256 e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
+ F file-3.sh
+ file.size 0
+ key.SHA2.256 e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
+ S 3 files processed.
+ summary.directories 1
+ summary.excluded 0
+ summary.files 3
+ summary.size 0
+ summary.time.end 2016-12-09 16:42:41 -0800
+ ```
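To make the `F` records above more concrete, here is a minimal, hypothetical sketch that walks a directory and produces `F` records with `file.size` and `key.SHA2.256` attributes in the same shape. It is not fingerprint's implementation: it omits the `C`, `D` and `S` records, extended attributes and exclusions, and it assumes attributes are written as tab-indented `key value` lines. `scan_records` is an illustrative name.

```ruby
require 'digest'
require 'find'

# Minimal sketch: return transcript-style lines with one "F" record per
# regular file under +root+, sorted for stable output. Not fingerprint's code.
def scan_records(root)
  paths = []
  Find.find(root) do |path|
    paths << path if File.file?(path)
  end

  prefix = File.join(root, '') # root with a trailing separator
  paths.sort.flat_map do |path|
    [
      "F #{path.delete_prefix(prefix)}",
      "\tfile.size #{File.size(path)}",
      "\tkey.SHA2.256 #{Digest::SHA256.file(path).hexdigest}"
    ]
  end
end
```

For example, `puts scan_records('/tmp/test')` prints one `F` record per file, using paths relative to `/tmp/test`.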
+
+ ## Verifying Fingerprints
+
+ Files can be checked against a given fingerprint. With `verify`, only changes and deletions are reported; files added since the fingerprint was generated are not.
+
+ To generate a fingerprint for a given path:
+
+ fingerprint scan /tmp/test > /tmp/test.fingerprint
+
+ This fingerprint can then be used to verify that no files have changed:
+
+ $ fingerprint verify -n /tmp/test.fingerprint /tmp/test
+ S 0 error(s) detected.
+ error.count 0
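The `verify` behaviour described above (changes and deletions reported, additions ignored) can be modelled in a few lines of Ruby. This is an illustrative sketch rather than fingerprint's implementation; it assumes a fingerprint has already been loaded into a hash mapping relative paths to SHA-256 hex digests, and `verify_records` is a hypothetical name.

```ruby
require 'digest'

# Check recorded digests (relative path => SHA-256 hex) against files under
# +root+. Only paths named in +recorded+ are examined, so files added since
# the fingerprint was taken are never reported.
def verify_records(recorded, root)
  errors = {}
  recorded.each do |relative, expected|
    path = File.join(root, relative)
    if !File.file?(path)
      errors[relative] = :missing
    elsif Digest::SHA256.file(path).hexdigest != expected
      errors[relative] = :keys_different
    end
  end
  errors
end
```

Because the loop is driven by the recorded paths, the cost scales with the size of the fingerprint rather than with everything currently on disk.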
+
+ ## Comparing Fingerprints
+
+ The `verify` operation checks a given fingerprint against the current filesystem, and works efficiently because it only examines the files listed in the source fingerprint. If you want to compare two fingerprints instead, `compare` finds all differences, including additions. This is useful when comparing two backups to see which files changed (e.g. as a tripwire).
+
+ ```
+ /tmp$ fingerprint scan /tmp/test > /tmp/test1.fingerprint
+ /tmp$ vi test/file-1.txt (change this file)
+ /tmp$ fingerprint scan /tmp/test > /tmp/test2.fingerprint
+ /tmp$ fingerprint compare /tmp/test1.fingerprint /tmp/test2.fingerprint
+ W file-1.txt
+ changes.file.size.new 4
+ changes.file.size.old 0
+ changes.key.SHA2.256.new b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c
+ changes.key.SHA2.256.old e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
+ error.code keys_different
+ error.message Key SHA2.256 does not match
+ S 1 error(s) detected.
+ error.count 1
+ ```
+
+ Here we can see the file that was modified in `/tmp/test` between the two scans. Files added between the scans would be reported as well.
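Unlike `verify`, `compare` works on two fingerprints and also reports additions. The set logic behind that can be sketched as follows, assuming each fingerprint has already been reduced to a hash of relative path => digest. `compare_records` and the `:added`, `:removed` and `:changed` symbols are hypothetical, not fingerprint's API.

```ruby
# Classify every path that differs between two fingerprints, each given as a
# hash of relative path => digest. Paths present only in +latest+ are
# reported as additions, which a verify-style check cannot see.
def compare_records(previous, latest)
  changes = {}
  (previous.keys | latest.keys).sort.each do |path|
    if !latest.key?(path)
      changes[path] = :removed
    elsif !previous.key?(path)
      changes[path] = :added
    elsif previous[path] != latest[path]
      changes[path] = :changed
    end
  end
  changes
end
```

Taking the union of both key sets is what lets this detect additions; iterating over only `previous.keys` would reduce it to the `verify` behaviour.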
+
+ ## Transmission and Archival Usage
+
+ Fingerprint provides two high-level operations: `analyze`, which is roughly equivalent to `scan` but writes to the file `index.fingerprint` by default, and `verify`, which reads from `index.fingerprint` by default.
+
+ Typical use cases include analyzing a directory before copying it and verifying it after the copy, or analyzing a backup and verifying it before a restore.
+
+ -- Fingerprint the local data:
+ $ fingerprint --root /srv/http analyze
+
+ -- Copy it to the remote system:
+ $ rsync -a /srv/http/ production:/srv/http/
+
+ -- Run fingerprint on the remote system to verify the data copied correctly:
+ $ ssh production fingerprint --root /srv/http verify
+ S 0 error(s) detected.
+ error.count 0
+
+ This is equally useful for traditional backup media, e.g. write-once storage, offline backups, etc.
+
+ ## Data Preservation
+
+ Data preservation means that data is available in a useful form for as long as it is required (which could be indefinitely). Faulty hardware or software, user error, and malicious attacks are the primary threats to data preservation. Fingerprint can provide insight into data integrity problems, which can then be resolved appropriately.
+
+ In almost all cases of data corruption, the sooner the corruption is detected, the less damage occurs overall. For this reason, regular fingerprinting of critical files should be considered an important part of any comprehensive data preservation policy.
+
+ Data preservation almost always involves data replication and verification. If data becomes corrupt, it is essential that backups are available to recover the original data. However, it is also important that backup data can be verified; otherwise, there is no guarantee that the recovered data is correct.
+
+ End-to-end testing on a regular basis is the only sane policy if your data is important.
+
+ ### Non-malicious Bit-rot
+
+ Using the standard analyze/verify procedure, fingerprint can detect file corruption. Make sure you record extended information using `-x`. The primary indication of file corruption is a change in checksum without a corresponding change in modified time. Standard operating system functions update the modified time when a file is changed, so if a file's contents change without this, it may indicate a hardware or software fault and should be investigated further.
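The rule of thumb above (contents changed, modification time unchanged) can be expressed directly. This is a sketch assuming you kept each file's SHA-256 digest and mtime from an earlier run; fingerprint's own extended records may store different fields, and `classify_change` is a hypothetical helper.

```ruby
require 'digest'

# Classify a file against a previously recorded digest and modification time.
# A digest change with an unchanged mtime is the suspicious case: the
# contents changed without the operating system recording a write.
def classify_change(path, recorded_digest, recorded_mtime)
  digest_changed = Digest::SHA256.file(path).hexdigest != recorded_digest
  mtime_changed = File.mtime(path) != recorded_mtime

  if !digest_changed
    :unchanged
  elsif mtime_changed
    :modified # ordinary edit: contents and mtime both changed
  else
    :possible_corruption # contents changed silently; investigate further
  end
end
```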
+
+ Once data has been analyzed, you can store it on archival media (e.g. optical or tape storage). At a later date, you can verify it to ensure the data has not been damaged over time.
+
+ ### Malicious Changes
+
+ Malicious modification of files can be detected using fingerprint. This setup is typically referred to as a tripwire, because when an attacker modifies critical system files, the system administrator will be notified. To maintain the security of such a setup, the fingerprint should be stored on a separate server:
+
+ $ mv latest.fingerprint previous.fingerprint
+ $ ssh data.example.com fingerprint scan /etc > latest.fingerprint
+ $ fingerprint compare previous.fingerprint latest.fingerprint
+ S 0 error(s) detected.
+ error.count 0
+
+ This can be scripted and run in an hourly cron job:
+
+ #!/usr/bin/env ruby
+
+ require 'fileutils'
+
+ REMOTE = "server.example.com"
+ DIRECTORY = "/etc"
+ PREVIOUS = "previous.fingerprint"
+ LATEST = "latest.fingerprint"
+
+ # Keep the last fingerprint so it can be compared with the new one.
+ if File.exist? LATEST
+   FileUtils.mv LATEST, PREVIOUS
+ end
+
+ # Scan the remote directory; the transcript is written locally.
+ $stderr.puts "Generating fingerprint of #{REMOTE}:#{DIRECTORY}..."
+ system("ssh #{REMOTE} fingerprint scan #{DIRECTORY} > #{LATEST}")
+
+ # Report all differences, including additions, since the previous run.
+ if File.exist? PREVIOUS
+   $stderr.puts "Comparing fingerprints..."
+   system('fingerprint', 'compare', '-a', PREVIOUS, LATEST)
+ end
+
+ ## Backup integrity
+
+ Data backup is typically an important part of any storage system. However, without end-to-end verification of backup data, it may not be possible to ensure that a backup system is working correctly. In the event of a failure, data recovery may not be possible despite the existence of a backup, if that data was not backed up reliably or correctly.
+
+ If you are backing up online data, your backup tool may back up files at non-deterministic times. This means that if software (such as a database) is writing to a file at the time the backup occurs, the data may be transferred incorrectly. Fingerprint can help detect this: run fingerprint on the local storage before the backup, and then verify the backup data after it has been copied. Ideally, you'd expect to see minimal changes to critical files.
+
+ However, the real world is often not so simple. Some software doesn't provide facilities for direct synchronization; other software provides facilities for dumping data (which may not be an option if the dataset is large). In these cases, fingerprint can give you some visibility into the potential issues you may face during a restore. You may want to consider [Synco](https://github.com/ioquatix/synco), which can coordinate more complex backup tasks.
+
+ ### Ensuring Data Validity
+
+ To ensure that data has been backed up correctly, use fingerprint to analyze the data before it is backed up.
+
+ -- Perform the data analysis
+ $ sudo fingerprint analyze -f /etc
+
+ -- Back up the data to a remote system
+ $ sudo rsync --archive /etc/ backups.example.com:/mnt/backups/server.example.com/etc/
+
+ After the data has been copied to the remote backup device, restore the data to a temporary location and use fingerprint to verify the data. The exact procedure will depend on your backup system, e.g. if you use a tape you may need to restore from the tape to local storage first.
+
+ -- On the backup server
+ $ cd /mnt/backups/server.example.com/etc
+ $ fingerprint verify
+ S 0 error(s) detected.
+ error.count 0
+
+ ### Preserving Backups
+
+ If your primary concern is ensuring that backup data is consistent over time (e.g. files are not modified or damaged), fingerprint can be used directly on the backup data to check for corruption. After the data has been backed up successfully, simply analyze the data as above, but on the backup server. Once this is done, at any point in the future you can verify the correctness of the backup set.
+
+ ## Cryptographic Sealing
+
+ Fingerprint can be used to ensure that a set of files has been delivered without manipulation, by creating a fingerprint and signing it with a private key. The fingerprint and associated files can later be verified using the public key.
+
+ ### Generating Keys
+
+ To sign fingerprints, the first step is to create a private and public key pair. This is easily achieved using OpenSSL:
+
+ -- Create a private key, which you must keep secure.
+ $ openssl genrsa -out private.pem 2048
+ Generating RSA private key, 2048 bit long modulus
+ .............+++
+ ........+++
+ e is 65537 (0x10001)
+
+ -- Create a public key, which can be used to verify sealed fingerprints.
+ $ openssl rsa -in private.pem -pubout -out public.pem
+ writing RSA key
+
+ ### Signing Fingerprints
+
+ After you have generated a fingerprint, you can sign it using the private key:
+
+ -- We assume here that you are using fingerprint analyze to generate fingerprints.
+ $ openssl dgst -sha256 -sign private.pem -out index.signature index.fingerprint
+
+ ### Verifying Fingerprint Signature
+
+ You can then verify the signature of the fingerprint data:
+
+ $ openssl dgst -sha256 -verify public.pem -signature index.signature index.fingerprint
+ Verified OK
+ -- Fingerprint data has been cryptographically verified
+
+ $ fingerprint verify
+ S 0 error(s) detected.
+ error.count 0
+ Data verified, 0 errors found.
+ -- File list has been checked and no errors were found.
+
+ As long as the private key is kept secure, we can be sure that these files have not been tampered with.
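The same seal can be produced and checked from Ruby with the standard `openssl` library, which may be convenient when scripting. This sketch uses SHA-256, in line with the advice on `SHA1` at the top of this guide. `PKey#sign` should produce the same PKCS#1 v1.5 RSA signatures as `openssl dgst -sha256 -sign`, but treat that interoperability as an assumption and test it before relying on it. `seal` and `sealed?` are hypothetical helper names.

```ruby
require 'openssl'

# Sign the contents of a fingerprint file (e.g. index.fingerprint) with an
# RSA private key, returning the raw signature bytes.
def seal(private_key, fingerprint_data)
  private_key.sign(OpenSSL::Digest.new('SHA256'), fingerprint_data)
end

# Check a signature using only the public half of the key pair.
# Returns false if the data or signature has been altered.
def sealed?(public_key, signature, fingerprint_data)
  public_key.verify(OpenSSL::Digest.new('SHA256'), signature, fingerprint_data)
end
```

With the PEM files generated above, `OpenSSL::PKey.read(File.read('private.pem'))` loads the private key, and the fingerprint should be read with `File.binread('index.fingerprint')` so that the signed bytes are exactly those on disk.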
+
+ ### Notarizing
+
+ In many cases it is useful to be able to show that documents existed at a particular time. With modern document storage systems, it may be impossible to verify this by relying on databases and filesystems alone, since both can be manipulated.
+
+ Fingerprint can be used to produce printed documents which can be used to verify the existence of files at a given time. With appropriate physical signatures, these could be used to verify the existence of a given set of files at a specific time and place.
+
+ Simply follow the procedure to produce a cryptographic hash of a directory, print these documents out, and get them signed.
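A printed transcript of a large directory can run to many pages. One option (a suggestion, not something this guide prescribes) is to also print a short summary that includes a single SHA-256 digest of the whole fingerprint file, so the paper record can be checked against the stored transcript quickly. A minimal sketch, with `notary_summary` as a hypothetical name:

```ruby
require 'digest'
require 'time'

# Build a short, printable summary for a fingerprint file: its name, size,
# the time the summary was made (UTC), and a SHA-256 digest of the transcript.
def notary_summary(path, now: Time.now)
  [
    "File:    #{File.basename(path)}",
    "Size:    #{File.size(path)} bytes",
    "Date:    #{now.getutc.iso8601}",
    "SHA-256: #{Digest::SHA256.file(path).hexdigest}"
  ].join("\n")
end
```

For example, `puts notary_summary('index.fingerprint')` prints four lines that can be placed at the top of the printed transcript.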
+
+ N.B. Please consult a lawyer for the correct procedure and legal standing of such techniques. This document is not legal advice and we accept no responsibility for any use or misuse of this tool.
data/Gemfile CHANGED
@@ -1,4 +1,4 @@
  source 'https://rubygems.org'
 
- # Specify your gem's dependencies in fingerprint.gemspec
+ # Specify your gem's dependencies in ..gemspec
  gemspec
data/README.md CHANGED
@@ -1,43 +1,138 @@
  # Fingerprint
 
- Fingerprint is primarily a command line tool to compare directory structures on
- disk. It also provides a programmatic interface for this procedure.
+ > Matter and energy degrade to more probable, less informative states. The larger the amounts of information processed or diffused, the more likely it is that information will degrade toward meaningless variety, like noise or information overload, or sterile uniformity — Orrin Klapp
 
- Because Fingerprint produces output to `IO` like structures, it is easy to transmit
- this data across a network, or store it for later use. As an example, it could be
- used to check the integrity of a remote backup.
+ Fingerprint is a general-purpose data integrity tool that uses cryptographic hashes to detect changes in files. Fingerprint scans a directory tree and generates a small transcript file containing the names and hashes of the files. This snapshot file can then be used to generate a list of files that have been created, deleted, or modified. If so much as a single bit in a single file in the directory tree has changed, fingerprint will detect it.
 
- For examples and documentation please see the main [project page][1].
+ Traditionally, the only way to preserve data was to take regular backups and hope that any unwanted changes that occurred would be major, obvious ones (such as loss of the disk). This approach means trusting all the software to which the data is exposed: operating systems, backup software, communications software, compression software, encryption software, and archiving software. Unfortunately, each of these systems is highly complex and can inflict all kinds of damage on data, much of it undetectable to humans. Fingerprint allows data to be monitored, detecting even the change of a single bit.
 
- [1]: http://www.codeotaku.com/projects/fingerprint/index
+ Fingerprint can be used for:
+
+ - Preservation: Detect corruption of important data, e.g. web server integrity, write-once storage verification.
+ - Security: Detect changes made by intruders, e.g. firewall integrity, network configuration, software auditing.
+ - Transfers: Verify file copies and transfers between different systems, e.g. file transfer integrity.
+ - Sealing: Cryptographically seal critical files, e.g. document verification.
+ - Notarizing: Prove that documents existed at a particular time.
+ - Backups: Verify restored backups to ensure that backups are sound, e.g. backup verification and integrity.
+
+ A companion app is available in the [Mac App Store](https://itunes.apple.com/nz/app/fingerprint/id470866821). Purchasing this app helps fund the open source software development.
+
+ [![Build Status](https://travis-ci.org/ioquatix/fingerprint.svg?branch=master)](https://travis-ci.org/ioquatix/fingerprint)
+ [![Code Climate](https://codeclimate.com/github/ioquatix/fingerprint.svg)](https://codeclimate.com/github/ioquatix/fingerprint)
+ [![Coverage Status](https://coveralls.io/repos/ioquatix/fingerprint/badge.svg)](https://coveralls.io/r/ioquatix/fingerprint)
+
+ ## Motivation
+
+ As the world becomes further entrenched in digital data and storage, the accuracy and correctness of said data is going to become a bigger problem. As humans create information, we are ultimately decreasing the amount of entropy in the universe. By the second law of thermodynamics, when a closed system moves from "the least to the most probable, from differentiation to sameness, from ordered individuality to a kind of chaos," (Thomas Pynchon) the only logical conclusion is that what we consider to be important data is destined to become meaningless noise in the sands of time.
+
+ When I first suffered data loss, it wasn't catastrophic; it was the slow deterioration of a drive which silently corrupted many files. After this event, I wanted a tool which would allow me to minimize the chance of this happening in the future. When I take a backup now, I also make a fingerprint. If I ever need to restore from backup, I can be confident the data is as it was when it was backed up.
+
+ As fingerprint provides a fast way to compare files, I've also extended it to find duplicates within one or more fingerprints. This is useful for de-duplicating your home directory, and I've also used it when marking assignments to find blatant copying.
+
+ In cases where I've been concerned about the migration of data (e.g. copying my entire home directory from one system to another), I've used fingerprint to generate a transcript on the source machine and then verified it on the destination machine, to reassure me that the data was copied correctly and completely.
 
  ## Installation
 
  Add this line to your application's Gemfile:
 
- gem 'fingerprint'
+ gem 'fingerprint'
 
  And then execute:
 
- $ bundle
+ $ bundle
 
  Or install it yourself as:
 
- $ gem install fingerprint
+ $ gem install fingerprint
 
  ## Usage
 
- Please refer to the [online documentation][http://www.codeotaku.com/projects/fingerprint/documentation/introduction].
+ Please consult the [GUIDE](GUIDE.md) for an overview of how the `fingerprint` command can be used.
+
+ ### RSpec
+
+ The simplest usage of fingerprint is checking if two directories are equivalent:
+
+ Fingerprint.identical?(source_path, destination_path) do |record|
+   puts "#{record.path} is different"
+ end
+
+ This catches additions, removals, and changes. You can use this in RSpec:
+
+ expect(Fingerprint).to be_identical(source_path, destination_path)
+
+ ### Command Line
+
+ The `fingerprint` command has a high-level and a low-level interface.
+
+ #### High-level Interface
+
+ This usage is centered around analysing a given directory using `fingerprint analyze` and then, at a later date, checking that the directory is not missing any files and that all files are the same as they were originally, using `fingerprint verify`.
+
+ ```
+ $ fingerprint analyze
+ $ fingerprint verify
+ S 0 error(s) detected.
+ error.count 0
+ ```
+
+ If we modify a file (`file-1.txt` in this example), it will be reported:
+
+ ```
+ $ fingerprint verify
+ W file-1.txt
+ changes.file.size.new 8
+ changes.file.size.old 4
+ changes.key.SHA2.256.new 1f2ec52b774368781bed1d1fb140a92e0eb6348090619c9291f9a5a3c8e8d151
+ changes.key.SHA2.256.old b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c
+ error.code keys_different
+ error.message Key SHA2.256 does not match
+ S 1 error(s) detected.
+ error.count 1
+ ```
+
+ This command does not report files which have been added.
+
+ #### Low-level Interface
+
+ It is possible to generate a fingerprint using the `scan` command, which takes a list of paths and writes out the transcript.
+
+ % fingerprint scan spec
+ C /home/samuel/Documents/Programming/ioquatix/fingerprint/spec
+ fingerprint.version 2.0.0
+ options.checksums MD5, SHA2.256
+ options.extended false
+ summary.time.start 2016-06-25 11:46:12 +1200
+ D
+ D fingerprint
+ F fingerprint/check_paths_spec.rb
+ file.size 1487
+ key.MD5 ef77034977daa683bbaaed47c553f6f5
+ key.SHA2.256 970ec4663ffc257ec1d4f49f54711c38434108d580afc0c92ea7bf864e08a1e0
+ S 1 files processed.
+ summary.directories 2
+ summary.excluded 0
+ summary.files 1
+ summary.size 1487
+ summary.time.end 2016-06-25 11:46:12 +1200
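If you want to consume `scan` output from your own Ruby code, a small parser can group the transcript into records. This is a sketch based only on the output shown above: it assumes each record starts with a single uppercase mode letter (`C`, `D`, `F`, `S`, `W`) optionally followed by a path or message, and that every other non-blank line is a `key value` attribute (possibly indented) belonging to the preceding record. `Record` and `parse_transcript` are hypothetical names, not part of the gem.

```ruby
# A parsed transcript entry: mode letter, path (or message) and attributes.
Record = Struct.new(:mode, :path, :attributes)

# Group transcript lines into records. Attribute lines may be indented.
def parse_transcript(text)
  records = []
  text.each_line do |line|
    line = line.strip
    next if line.empty?

    if (match = line.match(/\A([A-Z])(?: (.*))?\z/))
      records << Record.new(match[1], match[2], {})
    elsif records.any?
      key, value = line.split(/\s+/, 2)
      records.last.attributes[key] = value
    end
  end
  records
end
```

For instance, the `F fingerprint/check_paths_spec.rb` record above would parse to a `Record` whose attributes include `'file.size' => '1487'`.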
+
+ #### Duplicates
+
+ Fingerprint can efficiently find duplicates in one or more fingerprints.
+
+ $ fingerprint duplicates index.fingerprint
+ F .git/refs/heads/master
+ file.size 41
+ fingerprint index.fingerprint
+ key.MD5 aaadaeee72126dedcd4044d687a74068
+ key.SHA2.256 6750f057b38c2ea93e3725545333b8167301b6d8daa0626b0a2a613a6a4f4f04
+ original.fingerprint index.fingerprint
+ original.path .git/refs/remotes/origin/master
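At its core, duplicate detection is grouping files by digest. The sketch below shows that idea directly over directory trees; fingerprint itself works from existing fingerprint files, as shown above, and `find_duplicates` is a hypothetical name.

```ruby
require 'digest'
require 'find'

# Group regular files under the given roots by SHA-256 digest and return
# only the groups containing more than one path (i.e. duplicate contents).
def find_duplicates(*roots)
  by_digest = Hash.new { |hash, key| hash[key] = [] }
  roots.each do |root|
    Find.find(root) do |path|
      next unless File.file?(path)

      by_digest[Digest::SHA256.file(path).hexdigest] << path
    end
  end
  by_digest.values.select { |paths| paths.size > 1 }.map(&:sort)
end
```

Grouping by file size first would avoid hashing files that cannot have a duplicate; the sketch hashes everything to keep it short.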
 
  ## Todo
 
- * Command line option to show files that have been created (e.g. don't exist in master fingerprint).
- * Command line option to show files that have changed but have the same modified time (hardware corrutpion).
- * Command line option to check fingerprint files based on checksums, e.g. duplicate files, unique files, over a set of directories.
- * Command line tool for extracting duplicate and unique files over a set of directories?
+ * Command line option to show files that have changed but have the same modified time (hardware corruption).
  * Supporting tools for signing fingerprints easily.
- * Support indexing specific files as well as whole directories (maybe?).
- * Support general filenames for `--archive`, e.g. along with `-n`, maybe support a file called `index.fingerprint` by default: improved visibility for end user.
  * Because fingerprint is currently IO bound in terms of performance, single-threaded checksumming is fine, but for SSD and other fast storage, it might be possible to improve speed somewhat by using a map-reduce style approach.
 
  ## Contributing
@@ -52,7 +147,7 @@ Please refer to the [online documentation][http://www.codeotaku.com/projects/fin
 
  Released under the MIT license.
 
- Copyright, 2012, by [Samuel G. D. Williams](http://www.codeotaku.com/samuel-williams).
+ Copyright, 2019, by [Samuel G. D. Williams](http://www.codeotaku.com/samuel-williams).
 
  Permission is hereby granted, free of charge, to any person obtaining a copy
  of this software and associated documentation files (the "Software"), to deal