cocina-models 0.120.0 → 0.121.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.claude/skills/cocina-jq-query/SKILL.md +8 -0
- data/.gitignore +5 -0
- data/AGENTS.md +208 -0
- data/Gemfile +2 -0
- data/Gemfile.lock +23 -2
- data/README.md +4 -1
- data/bin/enhance-report-csv +90 -0
- data/bin/validate-data +5 -0
- data/lib/cocina/models/contributor.rb +0 -3
- data/lib/cocina/models/validators/description_types_visitor_validator.rb +1 -1
- data/lib/cocina/models/version.rb +1 -1
- data/schema.json +0 -58
- metadata +4 -2
- data/lib/cocina/models/descriptive_parallel_contributor.rb +0 -29
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: cfcb2c2a393845b1c9129ab48f5fc7340beb8067936694798e800a5935bf1ac9
|
|
4
|
+
data.tar.gz: f9dc80c520ddecb670a5df20273c635146f75d75025132402d4d1329090e29fd
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 2f7ccadc053ba81117275251401432ec92c65fefd86da94c25e7c862f8680c868d3f392a84b3966fb48609a5c383abd8dc90b8a51e457fd2fe8d3792371bd4a6
|
|
7
|
+
data.tar.gz: c1355245a0571f8933155c9de04b38226b55b96a3ffdb899af3d2af553e30f53ae7986a41e6293339a90cc2a017fcd6eec273fcd1ce64df57b26cbbb5f161193
|
|
@@ -0,0 +1,8 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: cocina-jq-query
|
|
3
|
+
description: Build and validate a jq query against a Cocina Model JSON serialization. Use when the user wants to query, filter, or transform a Cocina object (DRO, Collection, AdminPolicy) using jq, or asks for help writing a jq expression for Cocina data.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# cocina-jq-query
|
|
7
|
+
|
|
8
|
+
Follow the workflow defined in [AGENTS.md](../../../AGENTS.md) under **cocina-jq**.
|
data/.gitignore
CHANGED
data/AGENTS.md
ADDED
|
@@ -0,0 +1,208 @@
|
|
|
1
|
+
# Agent Instructions for cocina-models
|
|
2
|
+
|
|
3
|
+
## cocina-jq — Build jq queries for Cocina Model JSON
|
|
4
|
+
|
|
5
|
+
Use this workflow when the user wants to query, filter, or transform a Cocina object (DRO, Collection, AdminPolicy) using jq.
|
|
6
|
+
|
|
7
|
+
### Step 1 — Check prerequisites
|
|
8
|
+
|
|
9
|
+
#### Check jq
|
|
10
|
+
Run `jq --version`. If jq is not installed, tell the user:
|
|
11
|
+
> `jq` is not installed. Install it with `brew install jq`, then retry.
|
|
12
|
+
|
|
13
|
+
Stop here if jq is missing.
|
|
14
|
+
|
|
15
|
+
#### Check parallel
|
|
16
|
+
Run `parallel --version`. If parallel is not installed, tell the user:
|
|
17
|
+
> `parallel` is not installed. Install it with `brew install parallel`, then retry.
|
|
18
|
+
|
|
19
|
+
#### Check pv
|
|
20
|
+
Run `pv --version`. If pv is not installed, tell the user:
|
|
21
|
+
> `pv` is not installed. Install it with `brew install pv`, then retry.
|
|
22
|
+
|
|
23
|
+
### Output format (always apply)
|
|
24
|
+
|
|
25
|
+
Every jq query produced by this skill **must output a CSV line** using `@csv`. The **first field must always be the external identifier** (`externalIdentifier`). Additional fields follow based on the user's query. Example:
|
|
26
|
+
|
|
27
|
+
```
|
|
28
|
+
"druid:bc123df4567","some value","another value"
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
Use `[.externalIdentifier, ...] | @csv` as the output expression. Apply this constraint automatically — do not ask the user whether to include the external identifier.
|
|
32
|
+
|
|
33
|
+
### Step 2 — Resume or collect inputs
|
|
34
|
+
|
|
35
|
+
First, ask the user:
|
|
36
|
+
|
|
37
|
+
> Do you want to resume an existing query?
|
|
38
|
+
|
|
39
|
+
**If yes:** Ask for the filename of the `.jq.txt` file (e.g., `contributor-name-uri-non-loc.jq.txt`). Read that file from the project root. The file header contains the original inputs as comments (query description, expected output description, example JSON, example output). Parse those comments to reconstruct the inputs. Confirm with the user that the loaded values look correct, then proceed to Step 3 with those inputs (skip re-asking for them).
|
|
40
|
+
|
|
41
|
+
**If no:** Ask the user for each input, one at a time:
|
|
42
|
+
|
|
43
|
+
1. **Query description** — what should the query do? (e.g., "extract all file labels from structural")
|
|
44
|
+
2. **Expected output description** — what additional values (beyond the external identifier) should appear in the output?
|
|
45
|
+
3. **Example Cocina object** — paste JSON directly
|
|
46
|
+
4. **Example output** — paste the exact expected CSV output (must start with the external identifier as the first field)
|
|
47
|
+
|
|
48
|
+
Explicitly ask for each input; do not infer or guess.
|
|
49
|
+
|
|
50
|
+
### Step 3 — Clarify ambiguities
|
|
51
|
+
|
|
52
|
+
Review the inputs from Step 2. If anything is unclear or underspecified, ask the user targeted questions before proceeding. Examples of things to clarify:
|
|
53
|
+
|
|
54
|
+
- Is the query meant to return one value per object, or aggregate across many objects?
|
|
55
|
+
- Should missing or null fields be skipped, returned as null, or cause an error?
|
|
56
|
+
- Are there edge cases in the data structure the query must handle (e.g., empty arrays, nested arrays, optional fields)?
|
|
57
|
+
- Is the output format exactly as shown, or is there flexibility (e.g., flat vs. nested)?
|
|
58
|
+
|
|
59
|
+
Ask only questions that would change how the query is written. Do not ask about things already clear from the inputs. If everything is unambiguous, skip this step silently and proceed.
|
|
60
|
+
|
|
61
|
+
### Step 4 — Load relevant schema portion
|
|
62
|
+
|
|
63
|
+
Read `schema.json` from the project root. Extract only the `$defs` entries relevant to the Cocina object type found in the example's `type` field:
|
|
64
|
+
|
|
65
|
+
- `https://cocina.sul.stanford.edu/models/object` → DRO-related defs
|
|
66
|
+
- `https://cocina.sul.stanford.edu/models/collection` → Collection-related defs
|
|
67
|
+
- `https://cocina.sul.stanford.edu/models/admin_policy` → AdminPolicy-related defs
|
|
68
|
+
|
|
69
|
+
Include only the defs actually referenced (follow `$ref` chains up to 2 levels deep). Do not load the entire schema.
|
|
70
|
+
|
|
71
|
+
### Step 5 — Generate and validate the query (up to 3 attempts)
|
|
72
|
+
|
|
73
|
+
**Attempt 1:** Use the schema excerpt, example JSON, query description, and expected output to write a jq query.
|
|
74
|
+
|
|
75
|
+
Run it:
|
|
76
|
+
```bash
|
|
77
|
+
echo '<example_json>' | jq '<query>'
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
Compare actual output to the example output. If it matches → proceed to Step 5.
|
|
81
|
+
|
|
82
|
+
**Attempt 2 (if attempt 1 fails):** Run `man jq` to load the jq manual. Use it to refine the query. Re-run and validate.
|
|
83
|
+
|
|
84
|
+
**Attempt 3 (if attempt 2 fails):** Make a final attempt using all context. Re-run and validate.
|
|
85
|
+
|
|
86
|
+
**After 3 failures:** Present the best attempt, explain what is wrong, and ask the user to clarify.
|
|
87
|
+
|
|
88
|
+
### Step 6 — Generate local HTML playground
|
|
89
|
+
|
|
90
|
+
Write `<slug>-playground.html` in the project root using the template below.
|
|
91
|
+
|
|
92
|
+
Substitute:
|
|
93
|
+
- Every occurrence of `SLUG` → the actual slug string
|
|
94
|
+
- `JSON_PLACEHOLDER` → the example JSON (pretty-printed) passed through `JSON.stringify` a second time, producing a valid JS string literal (e.g. `"{\"foo\":\"bar\"}"`)
|
|
95
|
+
- `QUERY_PLACEHOLDER` → the validated jq query passed through `JSON.stringify`, producing a valid JS string literal (e.g. `".foo"`)
|
|
96
|
+
|
|
97
|
+
```html
|
|
98
|
+
<!DOCTYPE html>
|
|
99
|
+
<html lang="en">
|
|
100
|
+
<head>
|
|
101
|
+
<meta charset="UTF-8">
|
|
102
|
+
<title>jq playground — SLUG</title>
|
|
103
|
+
<style>
|
|
104
|
+
body { font-family: monospace; margin: 2rem; background: #1e1e1e; color: #d4d4d4; }
|
|
105
|
+
h2 { color: #9cdcfe; }
|
|
106
|
+
textarea, input { width: 100%; box-sizing: border-box; background: #252526; color: #d4d4d4; border: 1px solid #444; padding: 8px; font-family: monospace; font-size: 13px; border-radius: 3px; }
|
|
107
|
+
textarea { height: 260px; resize: vertical; }
|
|
108
|
+
input { height: 36px; }
|
|
109
|
+
button { margin-top: 8px; background: #0e639c; color: white; border: none; padding: 8px 20px; cursor: pointer; font-size: 14px; border-radius: 3px; }
|
|
110
|
+
button:hover { background: #1177bb; }
|
|
111
|
+
label { display: block; margin-top: 16px; margin-bottom: 4px; font-size: 12px; color: #9cdcfe; text-transform: uppercase; letter-spacing: 0.05em; }
|
|
112
|
+
#output { background: #252526; border: 1px solid #444; padding: 12px; min-height: 80px; white-space: pre-wrap; word-break: break-all; border-radius: 3px; }
|
|
113
|
+
.error { color: #f44747; }
|
|
114
|
+
</style>
|
|
115
|
+
</head>
|
|
116
|
+
<body>
|
|
117
|
+
<h2>jq playground — SLUG</h2>
|
|
118
|
+
<label>JSON Input</label>
|
|
119
|
+
<textarea id="json"></textarea>
|
|
120
|
+
<label>jq Filter</label>
|
|
121
|
+
<input id="query" type="text" />
|
|
122
|
+
<button onclick="run()">▶ Run</button>
|
|
123
|
+
<label>Output</label>
|
|
124
|
+
<pre id="output">(click Run)</pre>
|
|
125
|
+
|
|
126
|
+
<script src="https://cdn.jsdelivr.net/npm/jq-web@0.5.1/jq.wasm.js"></script>
|
|
127
|
+
<script>
|
|
128
|
+
const INITIAL_JSON = JSON_PLACEHOLDER;
|
|
129
|
+
const INITIAL_QUERY = QUERY_PLACEHOLDER;
|
|
130
|
+
document.getElementById('json').value = JSON.stringify(JSON.parse(INITIAL_JSON), null, 2);
|
|
131
|
+
document.getElementById('query').value = INITIAL_QUERY;
|
|
132
|
+
|
|
133
|
+
function run() {
|
|
134
|
+
const json = document.getElementById('json').value;
|
|
135
|
+
const query = document.getElementById('query').value;
|
|
136
|
+
const out = document.getElementById('output');
|
|
137
|
+
out.className = '';
|
|
138
|
+
out.textContent = 'Running…';
|
|
139
|
+
jq.promised.raw(json, query)
|
|
140
|
+
.then(r => { out.textContent = r || '(empty output)'; })
|
|
141
|
+
.catch(e => { out.className = 'error'; out.textContent = String(e); });
|
|
142
|
+
}
|
|
143
|
+
</script>
|
|
144
|
+
</body>
|
|
145
|
+
</html>
|
|
146
|
+
```
|
|
147
|
+
|
|
148
|
+
### Step 7 — Write the query to a .jq.txt file
|
|
149
|
+
|
|
150
|
+
Generate a short kebab-case slug summarizing the query (e.g., `invalid-encoding`, `file-label-extract`). Write the file `<slug>.jq.txt` in the project root with the following structure:
|
|
151
|
+
|
|
152
|
+
1. A comment header containing the user's inputs:
|
|
153
|
+
|
|
154
|
+
```
|
|
155
|
+
# Query description: <query description from Step 2>
|
|
156
|
+
#
|
|
157
|
+
# Expected output: <expected output description from Step 2>
|
|
158
|
+
#
|
|
159
|
+
# Example input:
|
|
160
|
+
# <example Cocina JSON, each line prefixed with "# ">
|
|
161
|
+
#
|
|
162
|
+
# Example output:
|
|
163
|
+
# <example output CSV, each line prefixed with "# ">
|
|
164
|
+
```
|
|
165
|
+
|
|
166
|
+
2. A blank line, then the validated jq query.
|
|
167
|
+
|
|
168
|
+
The comment lines must use `#` so the file remains valid jq syntax. When resuming (Step 2 resume path), parse these comment sections by their labels to reconstruct the inputs.
|
|
169
|
+
|
|
170
|
+
Find the most recent `.jsonl.xz` file in the project root by listing `*.jsonl.xz` files sorted by name descending and taking the first result.
|
|
171
|
+
|
|
172
|
+
### Step 8 — Output
|
|
173
|
+
|
|
174
|
+
Present:
|
|
175
|
+
1. The jq query in a code block
|
|
176
|
+
2. A 1–3 sentence explanation of how it works
|
|
177
|
+
3. A markdown link to the local playground file using a `file://` URL (e.g. `[Open playground](file:///Users/someuser/data/sdr/cocina-models/<slug>-playground.html)`) — substitute the actual absolute path — plus the equivalent shell command (`open <slug>-playground.html`) for reference
|
|
178
|
+
4. A ready-to-run shell snippet:
|
|
179
|
+
|
|
180
|
+
```
|
|
181
|
+
xzcat <most-recent .jsonl.xz filename> \
|
|
182
|
+
| pv -l -s 5500000 \
|
|
183
|
+
| parallel -j$(sysctl -n hw.logicalcpu) --pipe --block 50M --recend '\n' \
|
|
184
|
+
jq -rcf <slug>.jq.txt \
|
|
185
|
+
| bundle exec bin/enhance-report-csv \
|
|
186
|
+
| tee <slug>.csv
|
|
187
|
+
```
|
|
188
|
+
|
|
189
|
+
Substitute the actual filenames — do not leave placeholders.
|
|
190
|
+
|
|
191
|
+
Also, remind the user to tunnel to Solr in a separate terminal with:
|
|
192
|
+
```
|
|
193
|
+
ssh -L 8990:sul-solr-prod-a.stanford.edu:80 lyberadmin@argo-prod-02.stanford.edu
|
|
194
|
+
```
|
|
195
|
+
|
|
196
|
+
### Step 9 — Iterate
|
|
197
|
+
|
|
198
|
+
After presenting step 7 output, prompt the user:
|
|
199
|
+
|
|
200
|
+
> Want to refine the query? You can describe a change (e.g., "also filter by `type`") or paste a modified jq expression directly.
|
|
201
|
+
|
|
202
|
+
**If the user describes a change:** Update the query to satisfy the new requirement, re-run against the example JSON (same validation loop as Step 5, up to 3 attempts), then repeat Steps 6–8 with the updated query and slug.
|
|
203
|
+
|
|
204
|
+
**If the user pastes a modified query directly:** Validate it by running against the example JSON. If it produces valid output, skip straight to repeating Steps 6–8. If it errors, diagnose and fix (up to 3 attempts), then repeat Steps 5–7.
|
|
205
|
+
|
|
206
|
+
**Each iteration overwrites the `.jq.txt` file and `<slug>-playground.html`** (same slug unless the query purpose changed significantly, in which case generate a new slug) **and replaces all previous outputs** with updated versions.
|
|
207
|
+
|
|
208
|
+
Continue offering to iterate after each round until the user is satisfied.
|
data/Gemfile
CHANGED
data/Gemfile.lock
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
PATH
|
|
2
2
|
remote: .
|
|
3
3
|
specs:
|
|
4
|
-
cocina-models (0.
|
|
4
|
+
cocina-models (0.121.0)
|
|
5
5
|
activesupport
|
|
6
6
|
deprecation
|
|
7
7
|
dry-struct (~> 1.0)
|
|
@@ -35,8 +35,10 @@ GEM
|
|
|
35
35
|
attr_extras (7.1.0)
|
|
36
36
|
base64 (0.3.0)
|
|
37
37
|
bigdecimal (4.1.2)
|
|
38
|
+
builder (3.3.0)
|
|
38
39
|
concurrent-ruby (1.3.6)
|
|
39
40
|
connection_pool (3.0.2)
|
|
41
|
+
csv (3.3.5)
|
|
40
42
|
date (3.5.1)
|
|
41
43
|
debug (1.11.1)
|
|
42
44
|
irb (~> 1.10)
|
|
@@ -73,6 +75,12 @@ GEM
|
|
|
73
75
|
equivalent-xml (0.6.0)
|
|
74
76
|
nokogiri (>= 1.4.3)
|
|
75
77
|
erb (6.0.4)
|
|
78
|
+
faraday (2.14.2)
|
|
79
|
+
faraday-net_http (>= 2.0, < 3.5)
|
|
80
|
+
json
|
|
81
|
+
logger
|
|
82
|
+
faraday-net_http (3.4.4)
|
|
83
|
+
net-http (~> 0.5)
|
|
76
84
|
i18n (1.14.8)
|
|
77
85
|
concurrent-ruby (~> 1.0)
|
|
78
86
|
ice_nine (0.11.2)
|
|
@@ -93,6 +101,8 @@ GEM
|
|
|
93
101
|
minitest (6.0.6)
|
|
94
102
|
drb (~> 2.0)
|
|
95
103
|
prism (~> 1.5)
|
|
104
|
+
net-http (0.9.1)
|
|
105
|
+
uri (>= 0.11.1)
|
|
96
106
|
nokogiri (1.19.3-arm64-darwin)
|
|
97
107
|
racc (~> 1.4)
|
|
98
108
|
nokogiri (1.19.3-x86_64-linux-gnu)
|
|
@@ -121,6 +131,9 @@ GEM
|
|
|
121
131
|
regexp_parser (2.12.0)
|
|
122
132
|
reline (0.6.3)
|
|
123
133
|
io-console (~> 0.5)
|
|
134
|
+
rsolr (2.6.0)
|
|
135
|
+
builder (>= 2.1.2)
|
|
136
|
+
faraday (>= 0.9, < 3, != 2.0.0)
|
|
124
137
|
rspec (3.13.2)
|
|
125
138
|
rspec-core (~> 3.13.0)
|
|
126
139
|
rspec-expectations (~> 3.13.0)
|
|
@@ -187,8 +200,10 @@ PLATFORMS
|
|
|
187
200
|
DEPENDENCIES
|
|
188
201
|
bundler (>= 2.0, < 5)
|
|
189
202
|
cocina-models!
|
|
203
|
+
csv
|
|
190
204
|
debug
|
|
191
205
|
rake (~> 13.0)
|
|
206
|
+
rsolr
|
|
192
207
|
rspec (~> 3.0)
|
|
193
208
|
rspec_junit_formatter
|
|
194
209
|
rubocop (~> 1.24)
|
|
@@ -203,10 +218,12 @@ CHECKSUMS
|
|
|
203
218
|
attr_extras (7.1.0) sha256=d96fc9a9dd5d85ba2d37762440a816f840093959ae26bb90da994c2d9f1fc827
|
|
204
219
|
base64 (0.3.0) sha256=27337aeabad6ffae05c265c450490628ef3ebd4b67be58257393227588f5a97b
|
|
205
220
|
bigdecimal (4.1.2) sha256=53d217666027eab4280346fba98e7d5b66baaae1b9c3c1c0ffe89d48188a3fbd
|
|
221
|
+
builder (3.3.0) sha256=497918d2f9dca528fdca4b88d84e4ef4387256d984b8154e9d5d3fe5a9c8835f
|
|
206
222
|
bundler (4.0.13) sha256=19f08be7f27022cf0b89f27da0b044ae075e8270a9ef44ad248a932614e1ca3b
|
|
207
|
-
cocina-models (0.
|
|
223
|
+
cocina-models (0.121.0)
|
|
208
224
|
concurrent-ruby (1.3.6) sha256=6b56837e1e7e5292f9864f34b69c5a2cbc75c0cf5338f1ce9903d10fa762d5ab
|
|
209
225
|
connection_pool (3.0.2) sha256=33fff5ba71a12d2aa26cb72b1db8bba2a1a01823559fb01d29eb74c286e62e0a
|
|
226
|
+
csv (3.3.5) sha256=6e5134ac3383ef728b7f02725d9872934f523cb40b961479f69cf3afa6c8e73f
|
|
210
227
|
date (3.5.1) sha256=750d06384d7b9c15d562c76291407d89e368dda4d4fff957eb94962d325a0dc0
|
|
211
228
|
debug (1.11.1) sha256=2e0b0ac6119f2207a6f8ac7d4a73ca8eb4e440f64da0a3136c30343146e952b6
|
|
212
229
|
deprecation (1.1.0) sha256=01707cea9a6ed2d7270377457941f43394a345e6dd8048e1be6d18ff2f2a01e1
|
|
@@ -221,6 +238,8 @@ CHECKSUMS
|
|
|
221
238
|
edtf (3.2.0) sha256=a15a0ee274e49c8047a3ebb5d61d793ba44f7f8ffbf0595392c467e3ea8d2447
|
|
222
239
|
equivalent-xml (0.6.0) sha256=8919761efa848ad0846369ff8be1f646b17e5061698c4867b09829000cc3f487
|
|
223
240
|
erb (6.0.4) sha256=38e3803694be357fe2bfe312487c74beaf9fb4e5beb3e22498952fe1645b95d9
|
|
241
|
+
faraday (2.14.2) sha256=73ccb9994a9e8648f010e32eca2ae82e41c57860aa10932cda29418b9e0223ad
|
|
242
|
+
faraday-net_http (3.4.4) sha256=0e78af151747ed1b00f33e25973b4bc220d7f16c00c39676817c8b12331eb588
|
|
224
243
|
i18n (1.14.8) sha256=285778639134865c5e0f6269e0b818256017e8cde89993fdfcbfb64d088824a5
|
|
225
244
|
ice_nine (0.11.2) sha256=5d506a7d2723d5592dc121b9928e4931742730131f22a1a37649df1c1e2e63db
|
|
226
245
|
io-console (0.8.2) sha256=d6e3ae7a7cc7574f4b8893b4fca2162e57a825b223a177b7afa236c5ef9814cc
|
|
@@ -232,6 +251,7 @@ CHECKSUMS
|
|
|
232
251
|
lint_roller (1.1.0) sha256=2c0c845b632a7d172cb849cc90c1bce937a28c5c8ccccb50dfd46a485003cc87
|
|
233
252
|
logger (1.7.0) sha256=196edec7cc44b66cfb40f9755ce11b392f21f7967696af15d274dde7edff0203
|
|
234
253
|
minitest (6.0.6) sha256=153ea36d1d987a62942382b61075745042a2b3123b1cd48f4c3675af9cc7d6f1
|
|
254
|
+
net-http (0.9.1) sha256=25ba0b67c63e89df626ed8fac771d0ad24ad151a858af2cc8e6a716ca4336996
|
|
235
255
|
nokogiri (1.19.3-arm64-darwin) sha256=71b9bd424b1b7abc18b05052a1a3cfd3627abdca62be280854cc411791357e42
|
|
236
256
|
nokogiri (1.19.3-x86_64-linux-gnu) sha256=2f5078620fe12e83669b5b17311b32532a8153d02eee7ad06948b926d6080976
|
|
237
257
|
optimist (3.2.1) sha256=8cf8a0fd69f3aa24ab48885d3a666717c27bc3d9edd6e976e18b9d771e72e34e
|
|
@@ -248,6 +268,7 @@ CHECKSUMS
|
|
|
248
268
|
rdoc (7.2.0) sha256=8650f76cd4009c3b54955eb5d7e3a075c60a57276766ebf36f9085e8c9f23192
|
|
249
269
|
regexp_parser (2.12.0) sha256=35a916a1d63190ab5c9009457136ae5f3c0c7512d60291d0d1378ba18ce08ebb
|
|
250
270
|
reline (0.6.3) sha256=1198b04973565b36ec0f11542ab3f5cfeeec34823f4e54cebde90968092b1835
|
|
271
|
+
rsolr (2.6.0) sha256=4b3bcea772cac300562775c20eeddedf63a6b7516a070cb6fbde000b09cfe12b
|
|
251
272
|
rspec (3.13.2) sha256=206284a08ad798e61f86d7ca3e376718d52c0bc944626b2349266f239f820587
|
|
252
273
|
rspec-core (3.13.6) sha256=a8823c6411667b60a8bca135364351dda34cd55e44ff94c4be4633b37d828b2d
|
|
253
274
|
rspec-expectations (3.13.5) sha256=33a4d3a1d95060aea4c94e9f237030a8f9eae5615e9bd85718fe3a09e4b58836
|
data/README.md
CHANGED
|
@@ -51,6 +51,9 @@ exe/generator generate_vocab
|
|
|
51
51
|
exe/generator generate_descriptive_docs
|
|
52
52
|
```
|
|
53
53
|
|
|
54
|
+
## Reports / querying
|
|
55
|
+
jq-based queries can be authored against a [local data export](https://github.com/sul-dlss/dor-services-app#export-data) using the `/cocina-jq-query` skill. This will help with constructing and efficiently running the query.
|
|
56
|
+
|
|
54
57
|
## Testing
|
|
55
58
|
|
|
56
59
|
The generator is tested via its output when run against `schema.json`, viz., the Cocina model classes. Thus, `generate` should be run after any changes to `schema.json`.
|
|
@@ -158,7 +161,7 @@ This list of services is known to include:
|
|
|
158
161
|
* [sul-dlss/sdr-api](https://github.com/sul-dlss/sdr-api)
|
|
159
162
|
* [sul-dlss/dor-services-app](https://github.com/sul-dlss/dor-services-app/)
|
|
160
163
|
|
|
161
|
-
Perform `bundle update
|
|
164
|
+
Perform `bundle update cocina-models dor-services-client --conservative` in the services above and make PRs for those repos. You may first need to update how these gems are pinned in the `Gemfile` in order to bump them.
|
|
162
165
|
|
|
163
166
|
Get the directly coupled services PRs merged before the deploy in step 5.
|
|
164
167
|
|
|
@@ -0,0 +1,90 @@
|
|
|
1
|
+
#!/usr/bin/env ruby
|
|
2
|
+
# frozen_string_literal: true
|
|
3
|
+
|
|
4
|
+
require 'bundler/setup'
|
|
5
|
+
require 'csv'
|
|
6
|
+
require 'optparse'
|
|
7
|
+
require 'rsolr'
|
|
8
|
+
|
|
9
|
+
# This script reads a CSV from standard input, where the first column is expected to be a druid.
|
|
10
|
+
# It queries Solr for each druid to fetch additional fields, then outputs an enhanced CSV to standard output.
|
|
11
|
+
# Usage: enhance-report-csv [options] < input.csv > output.csv
|
|
12
|
+
# To use locally, tunnel to solr with: ssh -L 8990:sul-solr-prod-a.stanford.edu:80 lyberadmin@argo-prod-02.stanford.edu
|
|
13
|
+
|
|
14
|
+
FIELD_HEADERS = {
|
|
15
|
+
'display_title_ss' => 'title',
|
|
16
|
+
'member_of_collection_ssim' => 'collection_druids',
|
|
17
|
+
'collection_title_ssimdv' => 'collection_titles',
|
|
18
|
+
'governed_by_ssim' => 'apo_druid',
|
|
19
|
+
'apo_title_ssimdv' => 'apo_title',
|
|
20
|
+
'folio_instance_hrid_ssim' => 'folio_hrid'
|
|
21
|
+
}.freeze
|
|
22
|
+
|
|
23
|
+
def parse_options # rubocop:disable Metrics/MethodLength
|
|
24
|
+
options = {
|
|
25
|
+
solr_url: 'http://localhost:8990/solr/argo_prod',
|
|
26
|
+
batch_size: 100
|
|
27
|
+
}
|
|
28
|
+
|
|
29
|
+
OptionParser.new do |opts|
|
|
30
|
+
opts.banner = 'Usage: enhance-report-csv [options] < input.csv > output.csv'
|
|
31
|
+
|
|
32
|
+
opts.on('--solr-url URL', 'Solr URL (default: http://localhost:8990/solr/argo_prod)') do |url|
|
|
33
|
+
options[:solr_url] = url
|
|
34
|
+
end
|
|
35
|
+
|
|
36
|
+
opts.on('--batch-size NUM', Integer, 'Solr batch size (default: 100)') do |n|
|
|
37
|
+
options[:batch_size] = n
|
|
38
|
+
end
|
|
39
|
+
|
|
40
|
+
opts.on('-h', '--help', 'Display this help message') do
|
|
41
|
+
puts opts
|
|
42
|
+
exit
|
|
43
|
+
end
|
|
44
|
+
end.parse!
|
|
45
|
+
|
|
46
|
+
options
|
|
47
|
+
end
|
|
48
|
+
|
|
49
|
+
def fetch_solr_docs(solr, druids)
|
|
50
|
+
druids.map { |d| "id:(#{d})" }.join(' OR ')
|
|
51
|
+
response = solr.get('select', params: {
|
|
52
|
+
q: '*:*',
|
|
53
|
+
fq: "{!terms f=id}#{druids.join(',')}",
|
|
54
|
+
fl: "id,#{FIELD_HEADERS.keys.join(',')}",
|
|
55
|
+
rows: druids.size
|
|
56
|
+
})
|
|
57
|
+
response['response']['docs'].to_h do |doc|
|
|
58
|
+
[doc['id'], doc]
|
|
59
|
+
end
|
|
60
|
+
end
|
|
61
|
+
|
|
62
|
+
def extract_fields(doc)
|
|
63
|
+
FIELD_HEADERS.keys.map do |field|
|
|
64
|
+
value = doc&.fetch(field, nil)
|
|
65
|
+
value.is_a?(Array) ? value.join(';') : value.to_s
|
|
66
|
+
end
|
|
67
|
+
end
|
|
68
|
+
|
|
69
|
+
def build_output(solr, rows, batch_size)
|
|
70
|
+
extra_col_count = (rows.first&.size || 1) - 1
|
|
71
|
+
extra_headers = extra_col_count.times.map { |i| "col#{i + 2}" }
|
|
72
|
+
|
|
73
|
+
CSV.generate do |out|
|
|
74
|
+
out << (['druid'] + FIELD_HEADERS.values + extra_headers)
|
|
75
|
+
|
|
76
|
+
rows.each_slice(batch_size) do |batch|
|
|
77
|
+
docs = fetch_solr_docs(solr, batch.map { |row| row[0] })
|
|
78
|
+
|
|
79
|
+
batch.each do |row|
|
|
80
|
+
druid = row[0]
|
|
81
|
+
out << ([druid] + extract_fields(docs[druid]) + row[1..])
|
|
82
|
+
end
|
|
83
|
+
end
|
|
84
|
+
end
|
|
85
|
+
end
|
|
86
|
+
|
|
87
|
+
options = parse_options
|
|
88
|
+
solr = RSolr.connect(url: options[:solr_url])
|
|
89
|
+
rows = CSV.parse($stdin.read, headers: false)
|
|
90
|
+
print build_output(solr, rows, options[:batch_size])
|
data/bin/validate-data
CHANGED
|
@@ -72,13 +72,18 @@ end
|
|
|
72
72
|
|
|
73
73
|
# Get total line count (either from option or by counting)
|
|
74
74
|
def get_total_lines(filename, provided_count)
|
|
75
|
+
count_filename = filename.sub(/\.xz$/, '.count.txt')
|
|
75
76
|
if provided_count
|
|
76
77
|
puts "Using provided line count: #{provided_count}"
|
|
77
78
|
provided_count
|
|
79
|
+
elsif File.exist?(count_filename)
|
|
80
|
+
puts "Reading line count from #{count_filename}..."
|
|
81
|
+
File.read(count_filename).to_i
|
|
78
82
|
else
|
|
79
83
|
puts 'Counting lines...'
|
|
80
84
|
total = count_lines(filename)
|
|
81
85
|
puts "Total lines to validate: #{total}"
|
|
86
|
+
File.write(count_filename, total)
|
|
82
87
|
total
|
|
83
88
|
end
|
|
84
89
|
end
|
|
@@ -23,9 +23,6 @@ module Cocina
|
|
|
23
23
|
attribute :note, Types::Strict::Array.of(DescriptiveValue).default([].freeze)
|
|
24
24
|
# URL or other pointer to the location of the contributor information.
|
|
25
25
|
attribute? :valueAt, Types::Strict::String
|
|
26
|
-
# For multiple representations of information about the same contributor (e.g. in different
|
|
27
|
-
# languages).
|
|
28
|
-
attribute :parallelContributor, Types::Strict::Array.of(DescriptiveParallelContributor).default([].freeze)
|
|
29
26
|
end
|
|
30
27
|
end
|
|
31
28
|
end
|
|
@@ -41,7 +41,7 @@ module Cocina
|
|
|
41
41
|
# Some part of the path are ignored for the purpose of matching.
|
|
42
42
|
def clean_path(path)
|
|
43
43
|
new_path = path.reject do |part|
|
|
44
|
-
part.is_a?(Integer) || %i[parallelValue
|
|
44
|
+
part.is_a?(Integer) || %i[parallelValue parallelEvent].include?(part.to_sym)
|
|
45
45
|
end.map(&:to_sym)
|
|
46
46
|
# This needs to happen after parallelValue is removed
|
|
47
47
|
# to handle structuredValue > parallelValue > structuredValue
|
data/schema.json
CHANGED
|
@@ -505,13 +505,6 @@
|
|
|
505
505
|
"valueAt": {
|
|
506
506
|
"description": "URL or other pointer to the location of the contributor information.",
|
|
507
507
|
"type": "string"
|
|
508
|
-
},
|
|
509
|
-
"parallelContributor": {
|
|
510
|
-
"description": "For multiple representations of information about the same contributor (e.g. in different languages).",
|
|
511
|
-
"type": "array",
|
|
512
|
-
"items": {
|
|
513
|
-
"$ref": "#/$defs/DescriptiveParallelContributor"
|
|
514
|
-
}
|
|
515
508
|
}
|
|
516
509
|
},
|
|
517
510
|
"unevaluatedProperties": false
|
|
@@ -1129,57 +1122,6 @@
|
|
|
1129
1122
|
}
|
|
1130
1123
|
}
|
|
1131
1124
|
},
|
|
1132
|
-
"DescriptiveParallelContributor": {
|
|
1133
|
-
"description": "Value model for multiple representations of information about the same contributor (e.g. in different languages).",
|
|
1134
|
-
"deprecated": true,
|
|
1135
|
-
"type": "object",
|
|
1136
|
-
"properties": {
|
|
1137
|
-
"name": {
|
|
1138
|
-
"description": "Names associated with a contributor.",
|
|
1139
|
-
"type": "array",
|
|
1140
|
-
"items": {
|
|
1141
|
-
"$ref": "#/$defs/DescriptiveValue"
|
|
1142
|
-
}
|
|
1143
|
-
},
|
|
1144
|
-
"type": {
|
|
1145
|
-
"description": "Entity type of the contributor (person, organization, etc.). See https://github.com/sul-dlss/cocina-models/blob/main/docs/description_types.md for valid types.",
|
|
1146
|
-
"type": "string"
|
|
1147
|
-
},
|
|
1148
|
-
"status": {
|
|
1149
|
-
"description": "Status of the contributor relative to other parallel contributors (e.g. the primary author among a group of contributors).",
|
|
1150
|
-
"type": "string"
|
|
1151
|
-
},
|
|
1152
|
-
"role": {
|
|
1153
|
-
"description": "Relationships of the contributor to the resource or to an event in its history.",
|
|
1154
|
-
"type": "array",
|
|
1155
|
-
"items": {
|
|
1156
|
-
"$ref": "#/$defs/DescriptiveValue"
|
|
1157
|
-
}
|
|
1158
|
-
},
|
|
1159
|
-
"identifier": {
|
|
1160
|
-
"description": "Identifiers and URIs associated with the contributor entity.",
|
|
1161
|
-
"type": "array",
|
|
1162
|
-
"items": {
|
|
1163
|
-
"$ref": "#/$defs/DescriptiveValue"
|
|
1164
|
-
}
|
|
1165
|
-
},
|
|
1166
|
-
"note": {
|
|
1167
|
-
"description": "Other information associated with the contributor.",
|
|
1168
|
-
"type": "array",
|
|
1169
|
-
"items": {
|
|
1170
|
-
"$ref": "#/$defs/DescriptiveValue"
|
|
1171
|
-
}
|
|
1172
|
-
},
|
|
1173
|
-
"valueAt": {
|
|
1174
|
-
"description": "URL or other pointer to the location of the contributor information.",
|
|
1175
|
-
"type": "string"
|
|
1176
|
-
},
|
|
1177
|
-
"valueLanguage": {
|
|
1178
|
-
"$ref": "#/$defs/DescriptiveValueLanguage"
|
|
1179
|
-
}
|
|
1180
|
-
},
|
|
1181
|
-
"unevaluatedProperties": false
|
|
1182
|
-
},
|
|
1183
1125
|
"DescriptiveParallelEvent": {
|
|
1184
1126
|
"description": "Value model for multiple representations of information about the same event (e.g. in different languages).",
|
|
1185
1127
|
"type": "object",
|
metadata
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: cocina-models
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 0.
|
|
4
|
+
version: 0.121.0
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Justin Coyne
|
|
@@ -290,17 +290,20 @@ extensions: []
|
|
|
290
290
|
extra_rdoc_files: []
|
|
291
291
|
files:
|
|
292
292
|
- ".circleci/config.yml"
|
|
293
|
+
- ".claude/skills/cocina-jq-query/SKILL.md"
|
|
293
294
|
- ".github/pull_request_template.md"
|
|
294
295
|
- ".gitignore"
|
|
295
296
|
- ".rspec"
|
|
296
297
|
- ".rubocop.yml"
|
|
297
298
|
- ".rubocop_todo.yml"
|
|
299
|
+
- AGENTS.md
|
|
298
300
|
- Gemfile
|
|
299
301
|
- Gemfile.lock
|
|
300
302
|
- LICENSE
|
|
301
303
|
- README.md
|
|
302
304
|
- Rakefile
|
|
303
305
|
- bin/console
|
|
306
|
+
- bin/enhance-report-csv
|
|
304
307
|
- bin/setup
|
|
305
308
|
- bin/validate-data
|
|
306
309
|
- bin/validate-schema
|
|
@@ -359,7 +362,6 @@ files:
|
|
|
359
362
|
- lib/cocina/models/descriptive_basic_value.rb
|
|
360
363
|
- lib/cocina/models/descriptive_geographic_metadata.rb
|
|
361
364
|
- lib/cocina/models/descriptive_grouped_value.rb
|
|
362
|
-
- lib/cocina/models/descriptive_parallel_contributor.rb
|
|
363
365
|
- lib/cocina/models/descriptive_parallel_event.rb
|
|
364
366
|
- lib/cocina/models/descriptive_parallel_value.rb
|
|
365
367
|
- lib/cocina/models/descriptive_structured_value.rb
|
|
@@ -1,29 +0,0 @@
|
|
|
1
|
-
# frozen_string_literal: true
|
|
2
|
-
|
|
3
|
-
module Cocina
|
|
4
|
-
module Models
|
|
5
|
-
# DEPRECATED
|
|
6
|
-
# Value model for multiple representations of information about the same contributor
|
|
7
|
-
# (e.g. in different languages).
|
|
8
|
-
class DescriptiveParallelContributor < Struct
|
|
9
|
-
# Names associated with a contributor.
|
|
10
|
-
attribute :name, Types::Strict::Array.of(DescriptiveValue).default([].freeze)
|
|
11
|
-
# Entity type of the contributor (person, organization, etc.). See https://github.com/sul-dlss/cocina-models/blob/main/docs/description_types.md
|
|
12
|
-
# for valid types.
|
|
13
|
-
attribute? :type, Types::Strict::String
|
|
14
|
-
# Status of the contributor relative to other parallel contributors (e.g. the primary
|
|
15
|
-
# author among a group of contributors).
|
|
16
|
-
attribute? :status, Types::Strict::String
|
|
17
|
-
# Relationships of the contributor to the resource or to an event in its history.
|
|
18
|
-
attribute :role, Types::Strict::Array.of(DescriptiveValue).default([].freeze)
|
|
19
|
-
# Identifiers and URIs associated with the contributor entity.
|
|
20
|
-
attribute :identifier, Types::Strict::Array.of(DescriptiveValue).default([].freeze)
|
|
21
|
-
# Other information associated with the contributor.
|
|
22
|
-
attribute :note, Types::Strict::Array.of(DescriptiveValue).default([].freeze)
|
|
23
|
-
# URL or other pointer to the location of the contributor information.
|
|
24
|
-
attribute? :valueAt, Types::Strict::String
|
|
25
|
-
# Language of the descriptive element value
|
|
26
|
-
attribute? :valueLanguage, DescriptiveValueLanguage.optional
|
|
27
|
-
end
|
|
28
|
-
end
|
|
29
|
-
end
|