dicom-curate 0.34.0 → 0.36.0

package/README.md CHANGED
@@ -113,6 +113,31 @@ It is also possible to use S3-compatible buckets as input or output locations.
  Consult `OrganizeOptions` for further details. Please note that this feature is only
  available if you have the `@aws-sdk/client-s3` package installed.
 
+ ### Matching S3 ETags across uploaders
+
+ When uploading to S3, you can set `uploadPartSize` on the output S3
+ options to control the ETag S3 assigns to the written object:
+
+ ```ts
+ const options: OrganizeOptions = {
+   // other options are skipped
+   outputEndpoint: {
+     bucketName: 'my-bucket',
+     region: 'us-east-1',
+     // Bodies <= 5 MB: single PUT, S3 returns a plain-MD5 ETag.
+     // Bodies > 5 MB: multipart, S3 returns a composite "<md5>-<N>" ETag.
+     uploadPartSize: 5 * 1024 * 1024,
+   },
+ }
+ ```
+
+ This matches the ETag convention produced by any S3 client that uses
+ `@aws-sdk/lib-storage` at the same `partSize`, making cross-bucket
+ "equal bytes ⇒ equal ETag" comparisons well-defined.
+
+ When `uploadPartSize` is omitted, all uploads go through a single PUT
+ regardless of body size and S3 always returns a plain-MD5 ETag.
+
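To make the ETag convention described above concrete, here is a minimal sketch that predicts the ETag for a given body. It assumes the commonly observed S3 behaviour for objects without SSE-KMS or SSE-C encryption: a single PUT yields the hex MD5 of the body, and a multipart upload yields the MD5 of the concatenated binary part digests followed by `-<N>`. `expectedEtag` is a hypothetical helper for illustration and is not part of dicom-curate.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical helper (not part of dicom-curate): predicts the ETag S3 is
// expected to assign to `body`, assuming no SSE-KMS/SSE-C encryption.
export function expectedEtag(body: Buffer, partSize?: number): string {
  const md5 = (data: Buffer) => createHash('md5').update(data).digest()

  // No part size, or the body fits in one part: single PUT, plain MD5.
  if (partSize === undefined || body.length <= partSize) {
    return md5(body).toString('hex')
  }

  // Multipart: MD5 each part, then MD5 the concatenated binary digests.
  const partDigests: Buffer[] = []
  for (let offset = 0; offset < body.length; offset += partSize) {
    partDigests.push(md5(body.subarray(offset, offset + partSize)))
  }
  const composite = md5(Buffer.concat(partDigests)).toString('hex')
  return `${composite}-${partDigests.length}`
}
```

S3 returns the `ETag` header wrapped in double quotes, so strip them before comparing against the value computed here.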
  This library can now automatically skip writing (or uploading) mapped files if the provided
  "previous" input file attributes match the record you pass in the `fileInfoIndex` property:
 
@@ -287,6 +312,64 @@ export function sampleBatchCurationSpecification(): TCurationSpecification {
  }
  ```
 
+ ## Excluding files with preExclude and postExclude
+
+ The curation specification supports two optional exclusion functions that let you skip files at different stages of processing. Both return **`true` to exclude** the file; returning `false` (or omitting the function entirely) lets the file through.
+
+ ### preExclude — skip before mapping
+
+ `preExclude` receives a `parser` with access to the **original, unmapped DICOM tags**. Return `true` to skip the file entirely — it will not be mapped, written, or uploaded.
+
+ ```ts
+ export function myCurationSpec(): TCurationSpecification {
+   return {
+     // ... other fields ...
+
+     // Exclude files whose PatientID doesn't match the expected study format.
+     preExclude(parser) {
+       return !/^AB\d{2}-\d{3}$/.test(parser.getDicom('PatientID'))
+     },
+   }
+ }
+ ```
+
+ ### postExclude — skip after mapping
+
+ `postExclude` receives a `parser` whose `getDicom()` returns **de-identified tag values** (PS315E de-identification has already run at this point), and exposes the computed output path as `parser.outputFilePath`. Return `true` to skip writing or uploading the mapped file.
+
+ Note: `parser.getFilePathComp()` still returns **input** path components inside `postExclude`, the same as in `preExclude`. Only `parser.outputFilePath` reflects the post-mapping location, as a full string.
+
+ ```ts
+ export function myCurationSpec(): TCurationSpecification {
+   return {
+     // ... other fields ...
+
+     // Exclude structured reports and files routed to an 'exclude' output folder.
+     postExclude(parser) {
+       if (parser.getDicom('Modality') === 'SR') return true
+       if (parser.outputFilePath.includes('/exclude/')) return true
+       return false
+     },
+   }
+ }
+ ```
+
+ ### Behaviour notes
+
+ - **Exclusions are re-evaluated on every run.** When a `preExclude` or `postExclude` is configured, the "unchanged source bytes" short-circuit is disabled so an exclusion added in a later run takes effect even if the file itself didn't change.
+ - **Composition across multiple specs is OR.** When `composeSpecs` merges specs that each define `preExclude` / `postExclude`, the composed function excludes a file if **any** spec's function returns `true`. Evaluation short-circuits on the first `true`.
+ - **Exceptions are fail-safe.** If an exclusion function throws, the file is treated as **included** and the error message is appended to `mapResults.errors`.
+
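The OR composition and fail-safe behaviour noted above can be pictured with the following sketch. It only illustrates the described semantics and is not the library's implementation: `Parser`, `ExcludeFn`, `composeExcludes`, and `evaluateExclude` are hypothetical names, and the real `composeSpecs` and parser types may look different.

```typescript
// Hypothetical stand-ins for illustration; the real dicom-curate types differ.
type Parser = { getDicom(tag: string): string | undefined }
type ExcludeFn = (parser: Parser) => boolean

// OR composition: exclude if any function returns true, stopping at the first.
export function composeExcludes(fns: ExcludeFn[]): ExcludeFn {
  return (parser) => fns.some((fn) => fn(parser))
}

// Fail-safe evaluation: a throwing function leaves the file included
// and records the error message.
export function evaluateExclude(
  fn: ExcludeFn | undefined,
  parser: Parser,
  errors: string[],
): boolean {
  if (!fn) return false
  try {
    return fn(parser)
  } catch (err) {
    errors.push(err instanceof Error ? err.message : String(err))
    return false
  }
}
```

In this sketch a throw anywhere inside a composed function marks the file as included. Whether the library catches errors per spec or around the composed function is not stated in the README.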
+ ### Result shape
+
+ When a file is excluded, `curateOne` / `curateMany` still returns a result object for it. The `excluded` field indicates which function rejected it:
+
+ ```ts
+ // 'pre' — excluded by preExclude (file was never mapped)
+ // 'post' — excluded by postExclude (file was mapped but not written)
+ result.excluded // => 'pre' | 'post' | undefined
+ ```
+
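As a small usage sketch of the `excluded` field, the snippet below tallies a batch of results by exclusion stage. `ResultLike` is an assumed minimal shape containing only the `excluded` field, and `countByExclusion` is a hypothetical helper. The actual result objects carry more fields.

```typescript
// Minimal assumed shape; real result objects carry more fields.
type ResultLike = { excluded?: 'pre' | 'post' }

// Counts how many files were excluded at each stage and how many passed.
export function countByExclusion(results: ResultLike[]) {
  const counts = { pre: 0, post: 0, included: 0 }
  for (const result of results) {
    counts[result.excluded ?? 'included'] += 1
  }
  return counts
}
```

Passing the collected per-file results through such a helper gives a quick summary of how many files each exclusion function rejected.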
  ## DICOM Conformance Notes
 
  dicom-curate