newspaper_works_fixtures 0.2.0 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 02a902019c642f1b7211948001975e8bd636df81fa3da61064b3fcbea18440c6
- data.tar.gz: fa8964c81494e85b92f8a8ba5267742d7235c52a6c4983f9a4116d5f81ad895e
+ metadata.gz: f9ba6498fb2a01749ad7c3d151d21ea1104840cdde8fb9027379f278522665e3
+ data.tar.gz: 6f223cf8bd3610ed00deb861989db8349b46c5e35736c4dfbdac0a1055e3ef8d
  SHA512:
- metadata.gz: c6cd10271a654703b913aba2405a9fd4c27827312bab3f26aca80f5432c2cc2461122a43dee175b42bb031c8124a3064ec509ded6387a623685f4c2de5ce30be
- data.tar.gz: 44e4895761568a68871df76c4f6da14b3ec4a6fb04fb4236296558f180eff10817377dc21280df5bbea1663934bdff00f019a0e33c1a90bb9a0e9ab1229270fe
+ metadata.gz: 51366b43d15d130e5cd02107964412f6d71319413014229619f666764ff5031d6df4b4a348963bd3c7d443310509e6c6b13cd7c1d6af8c3789ac0bded632a89f
+ data.tar.gz: 395e113f45e7d914197d6958f17e2e088efc93099a85b148adae6e3e6ae04b152d80a8183044bd4d447586d759ac2388c3023a58f29e5747e1cc5cb8de93f66e
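The digests above can be checked against an unpacked `metadata.gz` or `data.tar.gz` with Ruby's standard `digest` library. The following is a minimal sketch, not part of the gem; the helper name `checksum_matches?` and its `algorithm:` keyword are assumptions made for illustration.

```ruby
require 'digest'

# Hypothetical helper (not provided by the gem or by RubyGems): compare the
# digest of a file on disk with an expected hex string, such as one of the
# values listed in checksums.yaml.
def checksum_matches?(path, expected_hex, algorithm: Digest::SHA256)
  # Normalize the expected value so stray whitespace or case does not matter.
  algorithm.file(path).hexdigest == expected_hex.strip.downcase
end

# Possible usage, assuming data.tar.gz was extracted from the 0.3.0 gem file:
#   checksum_matches?('data.tar.gz',
#     '6f223cf8bd3610ed00deb861989db8349b46c5e35736c4dfbdac0a1055e3ef8d')
```

Passing `algorithm: Digest::SHA512` would check the longer digests in the same way.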
data/README.md CHANGED
@@ -61,6 +61,39 @@ files are included as well.
 
  11 image scans; 149 MB
 
+ #### PDF and TIFF batch (Chicopee Weekly)
+ ```ruby
+ # /path/to/gem/spec/fixtures/files/article_segmented/batch_deseret_news
+ NewspaperWorksFixtures.pdf_batch
+ NewspaperWorksFixtures.tiff_batch
+ ```
+ These are two variants of four-page issues of Chicopee Weekly, via
+ [Digital Commonwealth](https://www.digitalcommonwealth.org/collections/commonwealth-oai:xd07gx07n).
+
+ The PDF source materials are 400 ppi monochrome (CCITT group 4 compressed),
+ with each PDF representing a single four page issue. The file naming convention
+ is as follows:
+
+ - Publication directory named with Library of Congress Control Number (LCCN).
+
+ - Inside publication directory are PDF files using naming convention of
+ `YYYYMMDDEE.pdf`, where:
+ - `YYYY` is four digit year.
+ - `MM` is month (zero padded).
+ - `DD` is day of month (zero padded).
+ - `EE` is edition number (zero padded).
+
+ The TIFF batch likewise consists of one-bit "Group 4" compressed monochrome images,
+ and uses a similar `YYYYMMDDEE` naming convention:
+
+ - Publication directory named with Library of Congress Control Number (LCCN).
+
+ - Directly contained in publication directory are directories, one per issue,
+ using the `YYYYMMDDEE` naming convention.
+
+ - Inside issue directories are TIFF files with lexically ordered filenames,
+ corresponding to page sequence order of that issue.
+
  #### Deseret News article segmented batch
  ```ruby
  # /path/to/gem/spec/fixtures/files/article_segmented/batch_deseret_news
@@ -31,4 +31,14 @@ module NewspaperWorksFixtures
    def self.article_segmented_batch_topaz_times
      File.join(file_fixtures, 'article_segmented', 'batch_topaz_times')
    end
+
+   # returns the PDF batch (single publication) for Chicopee Weekly
+   def self.pdf_batch
+     File.join(file_fixtures, 'pdf_batch', 'sn93059126')
+   end
+
+   # returns the TIFF batch (extracted from materials in PDF batch)
+   def self.tiff_batch
+     File.join(file_fixtures, 'tiff_batch', 'sn93059126')
+   end
  end
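The new `pdf_batch` accessor returns a publication directory whose PDF files follow the `YYYYMMDDEE.pdf` convention described in the README hunk. A rough sketch of how a consumer might enumerate and parse those issues is shown here; `parse_issue_name`, `pdf_issues`, and `ISSUE_NAME` are hypothetical names, not part of the gem.

```ruby
require 'date'

# Hypothetical pattern for a YYYYMMDDEE issue name (year, month, day, edition).
ISSUE_NAME = /\A(\d{4})(\d{2})(\d{2})(\d{2})\z/

# Hypothetical helper: split an issue name into a Date and an edition number.
def parse_issue_name(name)
  match = ISSUE_NAME.match(name)
  raise ArgumentError, "not a YYYYMMDDEE name: #{name.inspect}" unless match

  year, month, day, edition = match.captures.map(&:to_i)
  # Date.new raises for impossible dates such as month 13.
  { date: Date.new(year, month, day), edition: edition }
end

# Hypothetical helper: list the issue PDFs found directly inside a
# publication directory, sorted by filename (and therefore by date).
def pdf_issues(publication_dir)
  Dir.glob(File.join(publication_dir, '*.pdf')).sort.map do |path|
    parse_issue_name(File.basename(path, '.pdf')).merge(path: path)
  end
end

# Possible usage, assuming the gem is loaded:
#   pdf_issues(NewspaperWorksFixtures.pdf_batch)
```

Because the name is zero padded and ordered from year down to edition, a plain filename sort also yields chronological order.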
@@ -1,3 +1,3 @@
  module NewspaperWorksFixtures
-   VERSION = '0.2.0'.freeze
+   VERSION = '0.3.0'.freeze
  end
@@ -0,0 +1,28 @@
+ ## Batch contents
+
+ * A root directory named with the LCCN of the publication.
+
+ * Contained in the publication directory, two four-page issues, each with
+ its own directory, named with the `YYYYMMDDEE` convention, where:
+ - `YYYY` represents the 4-digit year
+ - `MM` represents the 2-digit month
+ - `DD` represents the 2-digit day
+ - `EE` represents the 2-digit edition number (default is 01)
+
+ * In each issue directory, page TIFF files are named in such a way that
+ lexical ordering is observed ("page1.tiff" is before "page2.tiff" in both
+ string evaluation and in human readability).
+
+ ### Batch details
+
+ * _Contained herein are images created from like issue PDFs in `../pdf_batch`._
+
+ * These page images were extracted via Ghostscript, with the following command
+ syntax:
+
+ ```
+ gs -dNOPAUSE -dBATCH -sDEVICE=tiffg4 -dTextAlphaBits=4 -sOutputFile=page%d.tiff -r400 -f $PDFFILE
+ ```
+
+ * Like their PDF equivalents, which contain one-bit monochrome CCITT "Group 4"
+ compressed images, these TIFF images use the same compression and bit depth.
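The layout this README describes (publication directory, one `YYYYMMDDEE` directory per issue, lexically ordered page files) could be walked roughly as follows. This is a sketch under those stated assumptions; `tiff_pages_by_issue` is a hypothetical helper, not something the gem provides.

```ruby
# Hypothetical helper: map each issue directory name to its page TIFF paths,
# in lexical filename order, which the README says matches page order.
def tiff_pages_by_issue(publication_dir)
  # Keep only ten-digit YYYYMMDDEE entries that are directories.
  issue_dirs = Dir.children(publication_dir).sort.select do |name|
    name.match?(/\A\d{10}\z/) && File.directory?(File.join(publication_dir, name))
  end

  issue_dirs.map do |issue|
    pages = Dir.glob(File.join(publication_dir, issue, '*.tiff')).sort
    [issue, pages]
  end.to_h
end

# Possible usage, assuming the gem is loaded:
#   tiff_pages_by_issue(NewspaperWorksFixtures.tiff_batch)
```

A plain lexical sort is enough for these four-page issues. With the `page%d.tiff` output pattern, an issue of ten or more pages would sort `page10.tiff` before `page2.tiff`, so a longer batch would need zero-padded names or a numeric sort.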
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: newspaper_works_fixtures
  version: !ruby/object:Gem::Version
- version: 0.2.0
+ version: 0.3.0
  platform: ruby
  authors:
  - Sean Upton
@@ -11,7 +11,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2019-06-28 00:00:00.000000000 Z
+ date: 2019-09-10 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rails
@@ -278,6 +278,15 @@ files:
  - spec/fixtures/files/pdf_batch/sn93059126/1856051001.pdf
  - spec/fixtures/files/pdf_batch/sn93059126/1856051701.pdf
  - spec/fixtures/files/pdf_batch/sn93059126/1856052401.pdf
+ - spec/fixtures/files/tiff_batch/README.md
+ - spec/fixtures/files/tiff_batch/sn93059126/1853060401/page1.tiff
+ - spec/fixtures/files/tiff_batch/sn93059126/1853060401/page2.tiff
+ - spec/fixtures/files/tiff_batch/sn93059126/1853060401/page3.tiff
+ - spec/fixtures/files/tiff_batch/sn93059126/1853060401/page4.tiff
+ - spec/fixtures/files/tiff_batch/sn93059126/1856051001/page1.tiff
+ - spec/fixtures/files/tiff_batch/sn93059126/1856051001/page2.tiff
+ - spec/fixtures/files/tiff_batch/sn93059126/1856051001/page3.tiff
+ - spec/fixtures/files/tiff_batch/sn93059126/1856051001/page4.tiff
  homepage: https://github.com/marriott-library/newspaper_works_fixtures
  licenses:
  - Apache-2.0
@@ -298,7 +307,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 2.7.4
+ rubygems_version: 2.7.6
  signing_key:
  specification_version: 4
  summary: newspaper_works_fixtures is a Rails Engine gem providing file fixtures for
@@ -536,3 +545,12 @@ test_files:
  - spec/fixtures/files/pdf_batch/sn93059126/1856051001.pdf
  - spec/fixtures/files/pdf_batch/sn93059126/1856051701.pdf
  - spec/fixtures/files/pdf_batch/sn93059126/1856052401.pdf
+ - spec/fixtures/files/tiff_batch/README.md
+ - spec/fixtures/files/tiff_batch/sn93059126/1853060401/page1.tiff
+ - spec/fixtures/files/tiff_batch/sn93059126/1853060401/page2.tiff
+ - spec/fixtures/files/tiff_batch/sn93059126/1853060401/page3.tiff
+ - spec/fixtures/files/tiff_batch/sn93059126/1853060401/page4.tiff
+ - spec/fixtures/files/tiff_batch/sn93059126/1856051001/page1.tiff
+ - spec/fixtures/files/tiff_batch/sn93059126/1856051001/page2.tiff
+ - spec/fixtures/files/tiff_batch/sn93059126/1856051001/page3.tiff
+ - spec/fixtures/files/tiff_batch/sn93059126/1856051001/page4.tiff