newspaper_works_fixtures 0.2.0 → 0.3.0
- checksums.yaml +4 -4
- data/README.md +33 -0
- data/lib/newspaper_works_fixtures.rb +10 -0
- data/lib/newspaper_works_fixtures/version.rb +1 -1
- data/spec/fixtures/files/tiff_batch/README.md +28 -0
- data/spec/fixtures/files/tiff_batch/sn93059126/1853060401/page1.tiff +0 -0
- data/spec/fixtures/files/tiff_batch/sn93059126/1853060401/page2.tiff +0 -0
- data/spec/fixtures/files/tiff_batch/sn93059126/1853060401/page3.tiff +0 -0
- data/spec/fixtures/files/tiff_batch/sn93059126/1853060401/page4.tiff +0 -0
- data/spec/fixtures/files/tiff_batch/sn93059126/1856051001/page1.tiff +0 -0
- data/spec/fixtures/files/tiff_batch/sn93059126/1856051001/page2.tiff +0 -0
- data/spec/fixtures/files/tiff_batch/sn93059126/1856051001/page3.tiff +0 -0
- data/spec/fixtures/files/tiff_batch/sn93059126/1856051001/page4.tiff +0 -0
- metadata +21 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f9ba6498fb2a01749ad7c3d151d21ea1104840cdde8fb9027379f278522665e3
+  data.tar.gz: 6f223cf8bd3610ed00deb861989db8349b46c5e35736c4dfbdac0a1055e3ef8d
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 51366b43d15d130e5cd02107964412f6d71319413014229619f666764ff5031d6df4b4a348963bd3c7d443310509e6c6b13cd7c1d6af8c3789ac0bded632a89f
+  data.tar.gz: 395e113f45e7d914197d6958f17e2e088efc93099a85b148adae6e3e6ae04b152d80a8183044bd4d447586d759ac2388c3023a58f29e5747e1cc5cb8de93f66e
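The digests recorded in `checksums.yaml` can be compared with the packaged files once the `.gem` archive has been unpacked. The following is a minimal sketch and not part of the gem: `checksums_match?` is a hypothetical helper, and it assumes the parsed YAML has the two-level `SHA256`/`SHA512` layout shown in this hunk.

```ruby
require 'digest/sha2'

# Hypothetical helper (not part of newspaper_works_fixtures).
# `expected` is the parsed checksums.yaml hash; `name` is the entry to check
# under each algorithm, for example "data.tar.gz".
def checksums_match?(path, expected, name)
  algorithms = { 'SHA256' => Digest::SHA256, 'SHA512' => Digest::SHA512 }

  algorithms.all? do |label, digest_class|
    recorded = expected.dig(label, name)
    # A missing entry counts as a mismatch rather than a pass.
    !recorded.nil? && digest_class.file(path).hexdigest == recorded.to_s
  end
end

# Possible usage, assuming the archive was unpacked into the current directory:
#   require 'yaml'
#   expected = YAML.safe_load(File.read('checksums.yaml'))
#   checksums_match?('data.tar.gz', expected, 'data.tar.gz')
```

Checking both algorithms in one pass keeps the helper aligned with the file, which records the same two entries under each digest.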
data/README.md
CHANGED
@@ -61,6 +61,39 @@ files are included as well.
 
 11 image scans; 149 MB
 
+#### PDF and TIFF batch (Chicopee Weekly)
+```ruby
+# /path/to/gem/spec/fixtures/files/article_segmented/batch_deseret_news
+NewspaperWorksFixtures.pdf_batch
+NewspaperWorksFixtures.tiff_batch
+```
+These are two variants of four-page issues of Chicopee Weekly, via
+[Digital Commonwealth](https://www.digitalcommonwealth.org/collections/commonwealth-oai:xd07gx07n).
+
+The PDF source materials are 400 ppi monochrome (CCITT group 4 compressed),
+with each PDF representing a single four-page issue. The file naming convention
+is as follows:
+
+- Publication directory named with Library of Congress Control Number (LCCN).
+
+- Inside publication directory are PDF files using naming convention of
+  `YYYYMMDDEE.pdf`, where:
+    - `YYYY` is four digit year.
+    - `MM` is month (zero padded).
+    - `DD` is day of month (zero padded).
+    - `EE` is edition number (zero padded).
+
+The TIFF batch likewise consists of one-bit "Group 4" compressed monochrome images,
+and uses a similar `YYYYMMDDEE` naming convention:
+
+- Publication directory named with Library of Congress Control Number (LCCN).
+
+- Directly contained in publication directory are directories, one per issue,
+  using the `YYYYMMDDEE` naming convention.
+
+- Inside issue directories are TIFF files with lexically ordered filenames,
+  corresponding to page sequence order of that issue.
+
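The directory layout described for the TIFF batch can be walked with a few lines of Ruby. This is a sketch under stated assumptions, not code from the gem: `Issue`, `parse_issue_name`, and `issues_in` are hypothetical names, and every directory inside the publication directory is assumed to follow the `YYYYMMDDEE` convention.

```ruby
require 'date'

# Hypothetical value object for one issue directory (not part of the gem).
Issue = Struct.new(:date, :edition, :pages)

# Split a `YYYYMMDDEE` name into a Date and an integer edition number.
def parse_issue_name(name)
  match = /\A(\d{4})(\d{2})(\d{2})(\d{2})\z/.match(name)
  raise ArgumentError, "not a YYYYMMDDEE name: #{name.inspect}" unless match

  # String#to_i reads zero-padded fields such as "06" as decimal numbers.
  year, month, day, edition = match.captures.map(&:to_i)
  [Date.new(year, month, day), edition]
end

# Walk a publication (LCCN) directory and return one Issue per issue
# directory, with page TIFF paths sorted lexically (page sequence order).
def issues_in(publication_dir)
  issue_names = Dir.children(publication_dir).sort.select do |name|
    File.directory?(File.join(publication_dir, name))
  end

  issue_names.map do |name|
    date, edition = parse_issue_name(name)
    pages = Dir.glob(File.join(publication_dir, name, '*.tiff')).sort
    Issue.new(date, edition, pages)
  end
end
```

Passing `NewspaperWorksFixtures.tiff_batch` as `publication_dir` should yield the two issues added in this release, `1853060401` and `1856051001`. Plain string sorting places `page10.tiff` before `page2.tiff`, so an issue with ten or more pages would need zero-padded filenames or a numeric sort.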
 #### Deseret News article segmented batch
 ```ruby
 # /path/to/gem/spec/fixtures/files/article_segmented/batch_deseret_news
data/lib/newspaper_works_fixtures.rb
CHANGED
@@ -31,4 +31,14 @@ module NewspaperWorksFixtures
   def self.article_segmented_batch_topaz_times
     File.join(file_fixtures, 'article_segmented', 'batch_topaz_times')
   end
+
+  # returns the PDF batch (single publication) for Chicopee Weekly
+  def self.pdf_batch
+    File.join(file_fixtures, 'pdf_batch', 'sn93059126')
+  end
+
+  # returns the TIFF batch (extracted from materials in PDF batch)
+  def self.tiff_batch
+    File.join(file_fixtures, 'tiff_batch', 'sn93059126')
+  end
 end
data/spec/fixtures/files/tiff_batch/README.md
ADDED
@@ -0,0 +1,28 @@
+## Batch contents
+
+* A root directory named with the LCCN of the publication.
+
+* Contained in the publication directory, two four-page issues, each with
+  its own directory, named with the `YYYYMMDDEE` convention, where:
+  - `YYYY` represents the 4-digit year
+  - `MM` represents the 2-digit month
+  - `DD` represents the 2-digit day
+  - `EE` represents the 2-digit edition number (default is 01)
+
+* In each issue directory, page TIFF files are named in such a way that
+  lexical ordering is observed ("page1.tiff" is before "page2.tiff" in both
+  string evaluation and in human readability).
+
+### Batch details
+
+* _Contained herein are images created from like issue PDFs in `../pdf_batch`._
+
+* These page images were extracted via Ghostscript, with the following command
+  syntax:
+
+```
+gs -dNOPAUSE -dBATCH -sDEVICE=tiffg4 -dTextAlphaBits=4 -sOutputFile=page%d.tiff -r400 -f $PDFFILE
+```
+
+* Like the PDF equivalents, which contain one-bit monochrome CCITT "Group 4"
+  compressed images, these TIFF images use the same compression and bit depth.
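The Ghostscript command quoted in this README can be assembled as an argument list so that a PDF path containing spaces needs no shell quoting. This is a sketch only: `ghostscript_args` is a hypothetical helper, the flags are copied from the README, and the `output_dir` parameter is an assumption added here (the README command writes pages into the current directory).

```ruby
# Hypothetical helper (not part of the gem): build the argument list for the
# Ghostscript invocation quoted in the README, for a single issue PDF.
def ghostscript_args(pdf_path, output_dir)
  [
    'gs', '-dNOPAUSE', '-dBATCH',
    '-sDEVICE=tiffg4',        # one-bit CCITT Group 4 TIFF output
    '-dTextAlphaBits=4',
    # Ghostscript substitutes the page number for %d in the output name.
    "-sOutputFile=#{File.join(output_dir, 'page%d.tiff')}",
    '-r400',                  # 400 ppi, matching the PDF source materials
    '-f', pdf_path
  ]
end

# Possible usage, assuming Ghostscript is installed and on the PATH:
#   system(*ghostscript_args('1853060401.pdf', '1853060401'))
```

Passing the arguments to `system` as separate strings bypasses the shell, which is the main reason to build a list instead of one command string.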
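data/spec/fixtures/files/tiff_batch/sn93059126/1853060401/page1.tiff
Binary file
data/spec/fixtures/files/tiff_batch/sn93059126/1853060401/page2.tiff
Binary file
data/spec/fixtures/files/tiff_batch/sn93059126/1853060401/page3.tiff
Binary file
data/spec/fixtures/files/tiff_batch/sn93059126/1853060401/page4.tiff
Binary file
data/spec/fixtures/files/tiff_batch/sn93059126/1856051001/page1.tiff
Binary file
data/spec/fixtures/files/tiff_batch/sn93059126/1856051001/page2.tiff
Binary file
data/spec/fixtures/files/tiff_batch/sn93059126/1856051001/page3.tiff
Binary file
data/spec/fixtures/files/tiff_batch/sn93059126/1856051001/page4.tiff
Binary file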
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: newspaper_works_fixtures
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.3.0
 platform: ruby
 authors:
 - Sean Upton
@@ -11,7 +11,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2019-
+date: 2019-09-10 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rails
@@ -278,6 +278,15 @@ files:
 - spec/fixtures/files/pdf_batch/sn93059126/1856051001.pdf
 - spec/fixtures/files/pdf_batch/sn93059126/1856051701.pdf
 - spec/fixtures/files/pdf_batch/sn93059126/1856052401.pdf
+- spec/fixtures/files/tiff_batch/README.md
+- spec/fixtures/files/tiff_batch/sn93059126/1853060401/page1.tiff
+- spec/fixtures/files/tiff_batch/sn93059126/1853060401/page2.tiff
+- spec/fixtures/files/tiff_batch/sn93059126/1853060401/page3.tiff
+- spec/fixtures/files/tiff_batch/sn93059126/1853060401/page4.tiff
+- spec/fixtures/files/tiff_batch/sn93059126/1856051001/page1.tiff
+- spec/fixtures/files/tiff_batch/sn93059126/1856051001/page2.tiff
+- spec/fixtures/files/tiff_batch/sn93059126/1856051001/page3.tiff
+- spec/fixtures/files/tiff_batch/sn93059126/1856051001/page4.tiff
 homepage: https://github.com/marriott-library/newspaper_works_fixtures
 licenses:
 - Apache-2.0
@@ -298,7 +307,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.7.
+rubygems_version: 2.7.6
 signing_key:
 specification_version: 4
 summary: newspaper_works_fixtures is a Rails Engine gem providing file fixtures for
@@ -536,3 +545,12 @@ test_files:
 - spec/fixtures/files/pdf_batch/sn93059126/1856051001.pdf
 - spec/fixtures/files/pdf_batch/sn93059126/1856051701.pdf
 - spec/fixtures/files/pdf_batch/sn93059126/1856052401.pdf
+- spec/fixtures/files/tiff_batch/README.md
+- spec/fixtures/files/tiff_batch/sn93059126/1853060401/page1.tiff
+- spec/fixtures/files/tiff_batch/sn93059126/1853060401/page2.tiff
+- spec/fixtures/files/tiff_batch/sn93059126/1853060401/page3.tiff
+- spec/fixtures/files/tiff_batch/sn93059126/1853060401/page4.tiff
+- spec/fixtures/files/tiff_batch/sn93059126/1856051001/page1.tiff
+- spec/fixtures/files/tiff_batch/sn93059126/1856051001/page2.tiff
+- spec/fixtures/files/tiff_batch/sn93059126/1856051001/page3.tiff
+- spec/fixtures/files/tiff_batch/sn93059126/1856051001/page4.tiff