digital_scriptorium 0.2.1 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 62b823f25e2940c6a68ee4ea8db949cebbdc95f9d31682d72a60dbbb62c0a0ff
-   data.tar.gz: 53c69f7e20af8b7efc327214c4bfbb8378f6d80133106458231277961f4ed613
+   metadata.gz: c72c2b9f90471cda9645a21bfdbec8e617d92372c7139eaab149ace86ae1d2ba
+   data.tar.gz: 2b594eec7517dc48419e94c33571835108d1abc16c5fb2a4348c524efb083dd7
  SHA512:
-   metadata.gz: 20ae390598e32c3276426c98dd437cd9d097e485d50bcb86757058defbbce42408a3a00b8c20fec9f86f2010078483bec4e1cd3053fc39ad42a462cf283e30e4
-   data.tar.gz: e200d3a053fba432f9819c68ceb3c48db5ad7d6daafc648f39ef5039b0fba345275c04f00215aff8c369fae2cf897e3eef31152bf93d375fdfa560b55b86564c
+   metadata.gz: 0c26cb8f29768b8c8f49a44774021f18aa4e0c0fa11ae98672ea3431eccac29d23eaae5a194695e238e47edd46387ffdd8ca8a32a9e49a3a8bb1aabb489de508
+   data.tar.gz: 25e1557a0020df6f64fd4b6fe45f48ee0bbf8521810e8d719e7151045913a3d77dcd3e37cac30a0180fd972a7932085dac3cb2b819412fa838ad75043776e7f3
data/doc/overview.md CHANGED
@@ -2,12 +2,12 @@

  For a general description of the Wikibase data model, see [Wikibase/DataModel](https://www.mediawiki.org/wiki/Wikibase/DataModel) on mediawiki.org.

- The Digital Scriptorium Wikibase data export is a JSON-formatted array of Wikibase entities. The bulk of the entities in the export consist of triplets that together form a meta-record consisting of one each of the DS Catalog core model types: manuscipts, holdings, and records. The export also contains entities representing property definitions and authoritative references to common topics.
+ The Digital Scriptorium Wikibase data export is a JSON-formatted array of Wikibase entities. Most of the entities in the export are instances of the DS Catalog core model types: manuscripts (Q1), holdings (Q2), and DS 2.0 records (Q3). The export also contains entities representing property definitions and authoritative references to common topics.

  The [ExportRepresenter](../lib/digital_scriptorium/export_representer.rb) class can be used to deserialize an export in its entirety. The resulting [Export](../lib/digital_scriptorium/export.rb) object is essentially an array of Item and Property objects. Entities in the export are modeled using domain-specific classes provided by the [wikibase_representable](https://rubygems.org/gems/wikibase_representable) gem, such as Items, Properties, Statements (also known as Claims), and Snaks, which represent the primary claim of any statement as well as any qualifiers. Convenience methods are also provided to facilitate extracting data values.

- The conversion script [wikibase_to_solr_new.rb](../wikibase_to_solr_new.rb) proceeds by deserializing the export and converting the resulting array of Wikibase objects to a hash keyed by entity ID. It then iterates over the elements of the hash. When it finds a record item based on the value of its instance-of (P16) claim, it retrieves the linked manuscript item, as well as the holding item linked in turn to the manuscript item, from the export hash by entity ID. It then iterates over the claims attached to manuscript, holding, and record in turn, extracting the Solr fields requested based on the property ID that is the subject of the claim and adding them to the Solr record to be produced for the meta-record. Claims for most properties are transformed to Solr fields using a generic algorithm implemented in [ClaimTransformer](../lib/digital_scriptorium/claim_transformer.rb). Name and date claims require some special handling, and are handled in dedicated claim transformer classes ([NameClaimTransformer](../lib/digital_scriptorium/name_claim_transformer.rb) and [DateClaimTransformer](../lib/digital_scriptorium/date_claim_transformer.rb) respectively). After all claims from the manuscript, holding, and record have been processed, the resulting Solr record is written to the output file.
+ The conversion script [wikibase_to_solr.rb](https://github.com/mdholloway/hxs-blacklight/blob/main/lib/wikibase_to_solr.rb) deserializes the export and converts the resulting array of Wikibase objects to a hash keyed by entity ID. It then iterates over the elements of the hash. When it finds a DS 2.0 record item (an item whose instance-of (P16) claim has the value Q3), it retrieves the linked manuscript item (P3, described manuscript) from the export hash by entity ID. From the manuscript, in turn, it reads the ID of the item containing current holding information (P2, manuscript holding) and retrieves that item from the export hash as well. With the manuscript, current holding, and record items in hand, it iterates over the claims of each, extracting the requested Solr fields according to each claim's property ID and adding them to the Solr record being produced. After all claims from the manuscript, holding, and record have been processed, the resulting Solr record is written to the output file. The script relies on the structure of the export file only to the extent that the file must be a JSON array containing all entities in the DS 2.0 Wikibase, with record items linked to manuscript items by P3 (described manuscript) claims and manuscript items linked to holding items by P2 (manuscript holding) claims.
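The record-to-manuscript-to-holding walk in the paragraph above can be illustrated with a minimal, dependency-free Ruby sketch. It assumes entities are plain hashes in the Wikibase JSON shape rather than the gem's model objects, and the helper names (`entity_id_claims`, `item_claim`, `linked_items`) and the item IDs Q100 through Q102 are hypothetical; this is not the conversion script's actual code.

```ruby
# Sketch of the record -> manuscript -> holding walk over plain hashes shaped like
# Wikibase JSON. Helper names and item IDs are hypothetical, not the gem's API.
INSTANCE_OF = 'P16'
MANUSCRIPT_HOLDING = 'P2'
DESCRIBED_MANUSCRIPT = 'P3'
DS_20_RECORD = 'Q3'

# Returns the item IDs that an entity's claims for property_id point to.
def entity_id_claims(entity, property_id)
  claims = entity['claims']
  return [] unless claims.is_a?(Hash)

  (claims[property_id] || []).filter_map { |claim| claim.dig('mainsnak', 'datavalue', 'value', 'id') }
end

# Builds a minimal item-valued claim (test data only).
def item_claim(property_id, target_id)
  { 'mainsnak' => { 'property' => property_id,
                    'datavalue' => { 'type' => 'wikibase-entityid',
                                     'value' => { 'entity-type' => 'item', 'id' => target_id } } } }
end

# For each DS 2.0 record, follows P3 to its manuscript and P2 to that manuscript's holdings.
def linked_items(export_hash)
  export_hash.each_value.filter_map do |entity|
    next unless entity_id_claims(entity, INSTANCE_OF).include?(DS_20_RECORD)

    manuscript = export_hash[entity_id_claims(entity, DESCRIBED_MANUSCRIPT).first]
    next unless manuscript

    holding_ids = entity_id_claims(manuscript, MANUSCRIPT_HOLDING).select { |id| export_hash.key?(id) }
    { record: entity['id'], manuscript: manuscript['id'], holdings: holding_ids }
  end
end

export = [
  { 'id' => 'Q100', 'claims' => { 'P16' => [item_claim('P16', 'Q2')] } },
  { 'id' => 'Q101', 'claims' => { 'P16' => [item_claim('P16', 'Q1')], 'P2' => [item_claim('P2', 'Q100')] } },
  { 'id' => 'Q102', 'claims' => { 'P16' => [item_claim('P16', 'Q3')], 'P3' => [item_claim('P3', 'Q101')] } }
]
export_hash = export.to_h { |entity| [entity['id'], entity] }
pp linked_items(export_hash)
```

Only the record item (Q102) starts a walk; the manuscript and holding items are reached through their links rather than by scanning, which matches the description of retrieving linked items from the export hash by entity ID.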

- The specific Solr fields produced for each claim are controlled by the configuration file [property_config.yml](../property_config.yml). This file also defines the prefix (representing the property name) to be attached to each field for a given property, and whether a claim based on the property might have a related authority qualifier.
+ Solr field extraction logic is encapsulated in the Transformer classes. The [BaseClaimTransformer](../lib/digital_scriptorium/transformers/base_claim_transformer.rb) class defines the basic contract, which consists of three methods: `display_values`, `search_values`, and `facet_values`. These methods return the collections of values to be included in the claim's `_display`, `_search`, and `_facet` fields in the Solr record. For a title (P10) claim, for example, they would return the values for the `title_display`, `title_search`, and `title_facet` fields. The remaining Transformer classes build on BaseClaimTransformer in various ways. For some claim types, the Transformer simply extracts the recorded value and returns it from one or more of the `*_values` methods. For other claim types, a claim is expected to be qualified with a representation of the recorded value in its original script, or with references to a standard title or to a value from an authority file. This logic is contained in the [QualifiedClaimTransformer](../lib/digital_scriptorium/transformers/qualified_claim_transformer.rb) class. For these claim types, the standard title or authority-file value is returned in the `facet_values` collection. For claim types where the recorded value should be used as the facet value in the absence of a qualifier, the [QualifiedClaimTransformerWithFacetFallback](../lib/digital_scriptorium/transformers/qualified_claim_transformer_with_facet_fallback.rb) class is provided. Finally, the [LinkClaimTransformer](../lib/digital_scriptorium/transformers/link_claim_transformer.rb) class handles a few claim types for which the value to be extracted is a URL. The [Transformers](../lib/digital_scriptorium/transformers.rb) class contains the mapping of claim property IDs to Transformer classes, as well as the prefixes (derived from property names) used in the Solr field names, and provides the factory methods the conversion script uses to obtain a Transformer for each claim as it iterates over claims.
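To make the three-method contract concrete, here is a minimal Ruby sketch. The class names `SketchTransformer` and `SketchQualifiedTransformer`, the `SketchClaim` struct, and the `solr_fields` helper are hypothetical stand-ins, not the gem's BaseClaimTransformer or QualifiedClaimTransformerWithFacetFallback; the sketch only shows how `display_values`, `search_values`, and `facet_values` map onto prefixed `_display`, `_search`, and `_facet` fields, with a facet fallback to the recorded value. For simplicity, display values here are plain strings rather than the serialized JSON objects described for `_display` fields.

```ruby
# Hypothetical, simplified claim: a recorded value plus labels from authority qualifiers.
SketchClaim = Struct.new(:recorded_value, :authority_labels, keyword_init: true)

# Stand-in for the three-method contract; plain claims are displayed and searched as recorded.
class SketchTransformer
  def initialize(claim)
    @claim = claim
  end

  def display_values
    [@claim.recorded_value]
  end

  def search_values
    [@claim.recorded_value]
  end

  def facet_values
    []
  end
end

# Qualified variant with fallback: facets on authority labels, else on the recorded value.
class SketchQualifiedTransformer < SketchTransformer
  def facet_values
    labels = @claim.authority_labels || []
    labels.empty? ? [@claim.recorded_value] : labels
  end
end

# Combines a property-derived prefix (e.g. "title") with each field suffix.
def solr_fields(prefix, transformer)
  {
    "#{prefix}_display" => transformer.display_values,
    "#{prefix}_search" => transformer.search_values,
    "#{prefix}_facet" => transformer.facet_values
  }
end

claim = SketchClaim.new(recorded_value: 'Psalterium', authority_labels: ['Psalter'])
pp solr_fields('title', SketchQualifiedTransformer.new(claim))
```

Keeping the field-name prefix outside the transformer mirrors the split described above, where the Transformers class owns the property-to-prefix mapping and each Transformer only supplies values.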

- The script was written so as not to rely on the structure of the export file beyond that it will be a JSON array consisting of all entities in the DS 2.0 Wikibase, with record items linked to manuscript items and manuscript items linked to holding items by P3 (described manuscript) and P2 (holding) claims respectively.
+ The `_search` and `_facet` fields contain values taken directly from the source Wikibase data. The `_display` field values contain serialized JSON objects that support the linked data bars shown beneath recorded values when viewing item details in the Catalog. Each object contains a `recorded_value` property with the recorded value and an optional `original_script` property with the value in its original script, where present. Additionally, where the recorded value has one or more qualifiers, the object contains a `linked_terms` array to support a set of one or more linked data bars beneath the recorded value. Each object in this array contains a `label` property that can be passed to a faceted search, as well as an optional `source_url` property that links to an external vocabulary item.
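A `_display` value of the shape described above could be built with Ruby's standard `json` library, as sketched below. The `display_value` helper and the example term and URL are hypothetical; only the property names (`recorded_value`, `original_script`, `linked_terms`, `label`, `source_url`) come from the description, and treating "optional" as "omitted when absent" is an assumption.

```ruby
require 'json'

# Hypothetical helper: serializes one _display value. Optional properties are
# omitted entirely when absent rather than emitted as null (an assumption).
def display_value(recorded_value, original_script: nil, linked_terms: [])
  value = { 'recorded_value' => recorded_value }
  value['original_script'] = original_script if original_script
  unless linked_terms.empty?
    value['linked_terms'] = linked_terms.map do |term|
      entry = { 'label' => term.fetch(:label) }
      entry['source_url'] = term[:source_url] if term[:source_url]
      entry
    end
  end
  JSON.generate(value)
end

puts display_value('Biblia', linked_terms: [{ label: 'Bibles', source_url: 'https://example.org/terms/bibles' }])
# prints {"recorded_value":"Biblia","linked_terms":[{"label":"Bibles","source_url":"https://example.org/terms/bibles"}]}
```

Each entry in `linked_terms` corresponds to one linked data bar, so a recorded value with two qualifiers would produce two entries.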
@@ -3,46 +3,12 @@
  require 'wikibase_representable'

  module DigitalScriptorium
-   # Represents a Digital Scriptorium item
+   # Represents a generic Digital Scriptorium item
    class DsItem < WikibaseRepresentable::Model::Item
-     def instance_of_claims
-       claims_by_property_id PropertyId::INSTANCE_OF # P16
-     end
-
-     def ds_id
-       claims_by_property_id(PropertyId::DS_ID)&.first&.data_value # P1
-     end
-
-     def holding_ids
-       claims_by_property_id(PropertyId::MANUSCRIPT_HOLDING)&.map(&:entity_id_value) # P2
-     end
-
-     def described_manuscript_id
-       claims_by_property_id(PropertyId::DESCRIBED_MANUSCRIPT)&.first&.entity_id_value # P3
-     end
-
-     def holding_status
-       claims_by_property_id(PropertyId::HOLDING_STATUS)&.first&.entity_id_value # P6
-     end
-
-     def iiif_manifest
-       claims_by_property_id(PropertyId::IIIF_MANIFEST)&.first&.entity_id_value # P41
-     end
-
-     def core_model_item?
-       instance_of_claims.any? { |claim| ItemId::CORE_MODEL_ITEMS.include? claim.entity_id_value }
-     end
-
-     def manuscript?
-       instance_of_claims.any? { |claim| claim.entity_id_value == ItemId::MANUSCRIPT }
-     end
-
-     def holding?
-       instance_of_claims.any? { |claim| claim.entity_id_value == ItemId::HOLDING }
-     end
+     include PropertyId

-     def record?
-       instance_of_claims.any? { |claim| claim.entity_id_value == ItemId::RECORD }
+     def instance_of
+       claims_by_property_id(INSTANCE_OF)&.first&.entity_id_value
      end
    end
  end
@@ -21,14 +21,8 @@ module DigitalScriptorium
        @record = record
      end

-     def current?(holding)
-       holding.holding_status == HOLDING_STATUS_CURRENT
-     end
-
      def current_holdings(manuscript, export_hash)
-       manuscript.holding_ids
-         .map { |id| export_hash[id] }
-         .filter { |holding| current?(holding) }
+       manuscript.holding_ids.filter_map { |id| export_hash[id] if export_hash[id]&.current? }
      end
    end
  end
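The rewritten `current_holdings` folds the lookup and the status check into a single `filter_map` pass. The sketch below uses a hypothetical `SketchHolding` struct in place of the gem's Holding class and passes holding IDs directly instead of a manuscript; the item IDs are illustrative. One difference is visible in the diff itself: in 0.2.1 an ID missing from the export hash produced `nil`, and `current?(nil)` would raise `NoMethodError`, whereas the safe-navigation call in 0.3.0 skips such IDs.

```ruby
HOLDING_STATUS_CURRENT = 'Q4'

# Hypothetical stand-in for Holding: only the status-based current? check matters here.
SketchHolding = Struct.new(:id, :status) do
  def current?
    status == HOLDING_STATUS_CURRENT
  end
end

# Same expression as the 0.3.0 diff, with holding IDs passed in directly.
def current_holdings(holding_ids, export_hash)
  holding_ids.filter_map { |id| export_hash[id] if export_hash[id]&.current? }
end

export_hash = {
  'Q200' => SketchHolding.new('Q200', 'Q4'), # current
  'Q201' => SketchHolding.new('Q201', 'Q5')  # non-current
}
# 'Q299' is absent from the export hash and is skipped rather than raising.
puts current_holdings(%w[Q200 Q201 Q299], export_hash).map(&:id).inspect
# prints ["Q200"]
```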
@@ -11,5 +11,12 @@ module DigitalScriptorium
        end
        hash
      end
+
+     def instance_of_id_from(item_hash)
+       claims = item_hash['claims']
+       return nil unless claims&.any?
+
+       claims.dig('P16', 0, 'mainsnak', 'datavalue', 'value', 'id')
+     end
    end
  end
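The new `instance_of_id_from` helper reads the instance-of (P16) value straight from raw entity JSON, before any model object exists. The sketch below copies the method body from the diff and runs it on a trimmed item in the Wikibase JSON format; the entity contents are illustrative. The `claims&.any?` guard matters because Wikibase JSON may serialize an empty claims map as an empty array, and calling `dig('P16', ...)` on an Array raises `TypeError`.

```ruby
# Method body copied from the 0.3.0 diff (module wrapper omitted).
def instance_of_id_from(item_hash)
  claims = item_hash['claims']
  return nil unless claims&.any?

  claims.dig('P16', 0, 'mainsnak', 'datavalue', 'value', 'id')
end

# A trimmed item in Wikibase JSON format; the ID Q123 is illustrative.
record_json = {
  'type' => 'item',
  'id' => 'Q123',
  'claims' => {
    'P16' => [
      {
        'mainsnak' => {
          'snaktype' => 'value',
          'property' => 'P16',
          'datavalue' => {
            'value' => { 'entity-type' => 'item', 'numeric-id' => 3, 'id' => 'Q3' },
            'type' => 'wikibase-entityid'
          }
        },
        'type' => 'statement',
        'rank' => 'normal'
      }
    ]
  }
}

p instance_of_id_from(record_json)                     # prints "Q3"
p instance_of_id_from({ 'claims' => [] })              # prints nil (empty claims serialized as an array)
p instance_of_id_from({ 'claims' => { 'P1' => [] } })  # prints nil (no P16 claim)
```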
@@ -6,6 +6,7 @@ require 'wikibase_representable'
  module DigitalScriptorium
    # Representer class for deserializing Wikibase data exports from JSON.
    class ExportRepresenter < Representable::Decorator
+     include ItemId
      include Representable::JSON::Collection
      include WikibaseRepresentable::Model
      include WikibaseRepresentable::Representers
@@ -13,7 +14,18 @@ module DigitalScriptorium
      items decorator: lambda { |input:, **|
        input.type == Item::ENTITY_TYPE ? ItemRepresenter : PropertyRepresenter
      }, class: lambda { |input:, **|
-       input['type'] == Item::ENTITY_TYPE ? DsItem : Property
+       return Property unless input['type'] == Item::ENTITY_TYPE
+
+       case instance_of_id_from input
+       when MANUSCRIPT
+         Manuscript
+       when HOLDING
+         Holding
+       when DS_20_RECORD
+         Record
+       else
+         DsItem
+       end
      }
    end
  end
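The new `class:` lambda picks a model class from the raw entity hash. The sketch below reproduces that dispatch outside Representable, with empty stub classes standing in for the gem's `Property`, `DsItem`, `Manuscript`, `Holding`, and `Record`. The names `model_class_for` and `entity_with_instance_of` are hypothetical, and the sketch assumes `Item::ENTITY_TYPE` is the Wikibase entity type string `'item'`.

```ruby
# Empty stubs standing in for the gem's model classes.
class Property; end
class DsItem; end
class Manuscript < DsItem; end
class Holding < DsItem; end
class Record < DsItem; end

ITEM_ENTITY_TYPE = 'item' # assumed value of Item::ENTITY_TYPE
MANUSCRIPT = 'Q1'
HOLDING = 'Q2'
DS_20_RECORD = 'Q3'

def instance_of_id_from(item_hash)
  claims = item_hash['claims']
  return nil unless claims&.any?

  claims.dig('P16', 0, 'mainsnak', 'datavalue', 'value', 'id')
end

# Mirrors the class lambda: non-items become Property, items dispatch on their P16 value.
def model_class_for(input)
  return Property unless input['type'] == ITEM_ENTITY_TYPE

  case instance_of_id_from(input)
  when MANUSCRIPT then Manuscript
  when HOLDING then Holding
  when DS_20_RECORD then Record
  else DsItem
  end
end

# Builds a minimal item hash whose P16 claim points at class_id (test data only).
def entity_with_instance_of(class_id)
  { 'type' => 'item',
    'claims' => { 'P16' => [{ 'mainsnak' => { 'datavalue' => { 'value' => { 'id' => class_id } } } }] } }
end

p model_class_for(entity_with_instance_of('Q3'))         # prints Record
p model_class_for({ 'type' => 'property' })              # prints Property
p model_class_for({ 'type' => 'item', 'claims' => [] })  # prints DsItem
```

Items whose P16 value names some other class, such as the authority items listed in `ItemId`, fall through to the generic `DsItem`, which matches the updated "generic Digital Scriptorium item" comment on that class.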
@@ -3,32 +3,21 @@
  module DigitalScriptorium
    # An item representing a Digital Scriptorium holding (instance of Q2)
    class Holding < DsItem
-     def institution_as_recorded_claims
-       claims_by_property_id HOLDING_INSTITUTION_AS_RECORDED # P5
-     end
+     include ItemId
+     include PropertyId

      def status_claims
        claims_by_property_id HOLDING_STATUS # P6
      end

-     def institutional_id_claims
-       claims_by_property_id INSTITUTIONAL_ID # P7
-     end
-
-     def shelfmark_claims
-       claims_by_property_id SHELFMARK # P8
-     end
-
-     def link_to_institutional_record_claims
-       claims_by_property_id LINK_TO_INSTITUTIONAL_RECORD # P9
-     end
+     def status
+       return unless status_claims&.any?

-     def start_time_claims
-       claims_by_property_id START_TIME # P38
+       status_claims&.first&.entity_id_value
      end

-     def end_time_claims
-       claims_by_property_id END_TIME # P39
+     def current?
+       status == HOLDING_STATUS_CURRENT
      end
    end
  end
@@ -5,11 +5,24 @@ require 'set'
  module DigitalScriptorium
    # Constants for core model item IDs.
    module ItemId
-     MANUSCRIPT = 'Q1'
-     HOLDING = 'Q2'
-     RECORD = 'Q3'
-     HOLDING_STATUS_CURRENT = 'Q4'
+     MANUSCRIPT = 'Q1'
+     HOLDING = 'Q2'
+     DS_20_RECORD = 'Q3'
+     HOLDING_STATUS_CURRENT = 'Q4'
+     HOLDING_STATUS_NON_CURRENT = 'Q5'
+     STANDARD_TITLE = 'Q6'
+     ACTOR = 'Q7'
+     PERSONAL_NAME = 'Q8'
+     CORPORATE_NAME = 'Q9'
+     ROLE = 'Q10'
+     TERM = 'Q11'
+     LANGUAGE = 'Q12'
+     CENTURY = 'Q13'
+     DATED = 'Q14'
+     UNDATED = 'Q15'
+     PLACE = 'Q16'
+     MATERIAL = 'Q17'

-     CORE_MODEL_ITEMS = Set[MANUSCRIPT, HOLDING, RECORD]
+     CORE_MODEL_ITEMS = Set[MANUSCRIPT, HOLDING, DS_20_RECORD]
    end
  end
@@ -8,65 +8,5 @@ module DigitalScriptorium
      def described_manuscript_id
        claims_by_property_id(DESCRIBED_MANUSCRIPT)&.first&.entity_id_value # P3
      end
-
-     def title_as_recorded_claims
-       claims_by_property_id TITLE_AS_RECORDED # P10
-     end
-
-     def uniform_title_as_recorded_claims
-       claims_by_property_id UNIFORM_TITLE_AS_RECORDED # P12
-     end
-
-     def associated_name_as_recorded_claims
-       claims_by_property_id ASSOCIATED_NAME_AS_RECORDED # P14
-     end
-
-     def genre_as_recorded_claims
-       claims_by_property_id GENRE_AS_RECORDED # P18
-     end
-
-     def language_as_recorded_claims
-       claims_by_property_id LANGUAGE_AS_RECORDED # P21
-     end
-
-     def production_date_as_recorded_claims
-       claims_by_property_id PRODUCTION_DATE_AS_RECORDED # P23
-     end
-
-     def dated_claims
-       claims_by_property_id DATED # P26
-     end
-
-     def production_place_as_recorded_claims
-       claims_by_property_id PRODUCTION_PLACE_AS_RECORDED # P27
-     end
-
-     def physical_description_claims
-       claims_by_property_id PHYSICAL_DESCRIPTION # P29
-     end
-
-     def material_as_recorded_claims
-       claims_by_property_id MATERIAL_AS_RECORDED # P30
-     end
-
-     def note_claims
-       claims_by_property_id NOTE # P32
-     end
-
-     def acknowledgements_claims
-       claims_by_property_id ACKNOWLEDGEMENTS # P33
-     end
-
-     def date_added_claims
-       claims_by_property_id DATE_ADDED # P34
-     end
-
-     def date_last_updated_claims
-       claims_by_property_id DATE_LAST_UPDATED # P35
-     end
-
-     def iiif_manifest_claims
-       claims_by_property_id IIIF_MANIFEST # P41
-     end
    end
  end
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module DigitalScriptorium
-   VERSION = '0.2.1'
+   VERSION = '0.3.0'
  end
metadata CHANGED
@@ -1,13 +1,13 @@
  --- !ruby/object:Gem::Specification
  name: digital_scriptorium
  version: !ruby/object:Gem::Version
-   version: 0.2.1
+   version: 0.3.0
  platform: ruby
  authors:
  - Michael Holloway
  bindir: exe
  cert_chain: []
- date: 2025-01-18 00:00:00.000000000 Z
+ date: 2025-02-10 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: wikibase_representable
@@ -37,6 +37,20 @@ dependencies:
      - - "~>"
        - !ruby/object:Gem::Version
          version: '2.5'
+ - !ruby/object:Gem::Dependency
+   name: pry
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '0.14'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '0.14'
  - !ruby/object:Gem::Dependency
    name: rake
    requirement: !ruby/object:Gem::Requirement