@lindas/trifid-plugin-ckan 7.0.1 → 7.0.2
- package/CHANGELOG.md +30 -1
- package/README.md +114 -114
- package/dist/tsconfig.tsbuildinfo +1 -1
- package/package.json +8 -8
- package/src/ckan.js +51 -47
- package/src/index.js +1 -1
- package/src/query.js +1 -1
- package/src/xml.js +344 -344
package/CHANGELOG.md
CHANGED

@@ -1,4 +1,33 @@
-# @
+# @lindas/trifid-plugin-ckan
+
+## 5.0.3
+
+### Patch Changes
+
+- Fix brotli decompression errors when sparql-proxy returns compressed responses.
+
+  The sparql-http-client library uses node-fetch, which advertises brotli support
+  via the Accept-Encoding header but cannot actually decompress brotli responses.
+  When sparql-proxy returns brotli-compressed data, the RDF parser receives
+  compressed bytes and crashes with a "Decompression failed" error, resulting in
+  a 502 Bad Gateway.
+
+  This fix uses the new `nodeCompatibleFetch` utility from @lindas/trifid-core,
+  which uses native Node.js fetch (with proper brotli support) while maintaining
+  stream compatibility with sparql-http-client.
+
+- Updated dependency: @lindas/trifid-core ^6.1.0 (moved from devDependencies to dependencies)
+
+## 5.0.2
+
+### Patch Changes
+
+- 48d1042: Republish all lindas-trifid packages to npm with LINDAS namespace
+
+## 5.0.1
+
+### Patch Changes
+
+- c34d5af: Fix js-yaml vulnerability by upgrading xmlbuilder2 to v4
 
 ## 5.0.0
 
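The 5.0.3 entry above says the fix swaps node-fetch for native Node.js fetch while keeping a Node-stream body for sparql-http-client. A minimal sketch of such a wrapper (the real `nodeCompatibleFetch` lives in @lindas/trifid-core and may be implemented differently):

```javascript
import { Readable } from 'node:stream'

// Sketch only: native Node.js fetch (undici) negotiates and decompresses
// brotli correctly, unlike node-fetch, which sends `Accept-Encoding: br`
// without being able to decode brotli bodies. sparql-http-client consumes
// the response body as a Node.js readable stream, so the WHATWG web stream
// returned by native fetch is converted with Readable.fromWeb().
async function nodeCompatibleFetch(url, options) {
  const res = await fetch(url, options) // native fetch, proper brotli support
  return {
    ok: res.ok,
    status: res.status,
    headers: res.headers,
    // Node.js readable stream, as sparql-http-client expects
    body: res.body ? Readable.fromWeb(res.body) : null,
    text: () => res.text(),
    json: () => res.json(),
  }
}
```

How this wrapper is handed to sparql-http-client depends on the client version and on trifid-core's wiring; the shape above only illustrates the stream-compatibility idea from the changelog.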
package/README.md
CHANGED

@@ -1,114 +1,114 @@

All 114 lines were removed and re-added with identical content, so the file is shown once below:

# CKAN harvester endpoint

This is a small HTTP endpoint that gathers datasets that are publishable to
opendata.swiss, transforms them to be compatible with the CKAN harvester, and
outputs a format (a rigid form of RDF/XML) that the harvester can read.

The format expected by the harvester is described at
https://handbook.opendata.swiss/fr/content/glossar/bibliothek/dcat-ap-ch.html (fr/de).

To be considered a "publishable" dataset by this endpoint, a dataset must meet
the following conditions:

- it **has** the `dcat:Dataset` type
- it **has one and only one** `dcterms:identifier`
- it **has** a `dcterms:creator`
- it **has** `schema:workExample` with value `<https://ld.admin.ch/application/opendataswiss>`
- it **has** `schema:creativeWorkStatus` with value `<https://ld.admin.ch/vocabulary/CreativeWorkStatus/Published>`
- it **does not have** `schema:validThrough`
- it **does not have** `schema:expires`

The endpoint copies the following properties with their original value.
Make sure they follow [the CKAN spec](https://handbook.opendata.swiss/fr/content/glossar/bibliothek/dcat-ap-ch.html).

- `dcterms:title`
- `dcterms:description`
- `dcterms:issued`
- `dcterms:modified`
- `dcat:contactPoint`
- `dcat:theme`
- `dcterms:language`
- `dcat:landingPage`
- `dcterms:spatial`
- `dcterms:coverage`
- `dcterms:temporal`
- `dcterms:keyword` (literals without a language are ignored)

The following properties are populated by the endpoint:

- `dcterms:identifier`

  If the original `dcterms:identifier` already contains an "@", it is copied
  as-is. Otherwise, an identifier is created with the shape
  `<dcterms:identifier value>@<creator slug>`, where "creator slug" is
  the last segment of the URI of the value of `dcterms:creator`.

- `dcterms:publisher`

  The original `dcterms:publisher` value is used as `rdfs:label` of the
  final `dcterms:publisher`.

- `dcterms:relation`

  Populated from `dcterms:license`.

  TODO: define valid values for license

- `dcterms:accrualPeriodicity`

  Supports both DC (http://purl.org/cld/freq/) and EU
  (http://publications.europa.eu/ontology/euvoc#Frequency) frequencies.
  DC frequencies are transformed into EU ones.

- `dcat:distribution`

  Populated from `schema:workExample`.
  Only work examples with a `schema:encodingFormat` are taken into account.
  Each distribution is built in the following way, from the properties of the work example:

  - `dcterms:issued` is copied as-is
  - `dcat:mediaType` populated from `schema:encodingFormat`
  - `dcat:accessURL` populated from `schema:url`
  - `dcterms:title` populated from `schema:name`
  - `dcterms:rights` populated from `schema:identifier` of **the dataset's** `dcterms:rights`
  - `dcterms:format` populated from `schema:encodingFormat`, with the following mapping:
    - `text/html` -> `HTML`
    - `application/sparql-query` -> `SERVICE`
    - other -> `UNKNOWN`

## Usage

This should be used as a Trifid plugin.

The following options are supported:

- `endpointUrl`: URL of the SPARQL endpoint
- `user`: user to connect to the SPARQL endpoint
- `password`: password to connect to the SPARQL endpoint

Configuring Trifid to use `@zazuko/trifid-plugin-ckan` is easy: add the following to your configuration file:

```yaml
plugins:
  # …other plugins

  ckan:
    module: "@zazuko/trifid-plugin-ckan"
    paths: /ckan
    config:
      endpointUrl: https://some-custom-endpoint/
      # user: root
      # password: super-secret
```

Then update the `config` fields with the correct information.

Do not forget to add it to your Node dependencies:

```sh
npm install @zazuko/trifid-plugin-ckan
```

With this configuration, the service will be exposed at `/ckan` and will require the `organization` query parameter, like this: `/ckan?organization=…`.

This will trigger the download of an XML file.
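Two of the README's transformation rules, the identifier shape and the `dcterms:format` mapping, can be illustrated with a small sketch. The function names are hypothetical; the plugin's actual implementation lives in `src/ckan.js` and may differ:

```javascript
// Rule: if the identifier already contains "@", it is copied as-is;
// otherwise "@<creator slug>" is appended, where the slug is the last
// segment of the creator URI.
function buildIdentifier(identifier, creatorUri) {
  if (identifier.includes('@')) return identifier
  // Trailing-slash stripping is an assumption of this sketch.
  const slug = creatorUri.replace(/\/+$/, '').split('/').pop()
  return `${identifier}@${slug}`
}

// Rule: map `schema:encodingFormat` to the CKAN `dcterms:format` value.
function toFormat(encodingFormat) {
  switch (encodingFormat) {
    case 'text/html':
      return 'HTML'
    case 'application/sparql-query':
      return 'SERVICE'
    default:
      return 'UNKNOWN'
  }
}
```

For example, an identifier `my-dataset` with creator `<https://ld.admin.ch/office/foo>` would become `my-dataset@foo`, while `my-dataset@bar` would pass through unchanged.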