mustrd 0.1.8__tar.gz → 0.2.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (30)
  1. {mustrd-0.1.8 → mustrd-0.2.1}/LICENSE +21 -21
  2. {mustrd-0.1.8 → mustrd-0.2.1}/PKG-INFO +4 -2
  3. {mustrd-0.1.8 → mustrd-0.2.1}/README.adoc +58 -58
  4. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/README.adoc +210 -201
  5. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/TestResult.py +136 -136
  6. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/logger_setup.py +48 -48
  7. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/model/catalog-v001.xml +5 -5
  8. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/model/mustrdShapes.ttl +253 -253
  9. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/model/mustrdTestShapes.ttl +24 -24
  10. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/model/ontology.ttl +494 -494
  11. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/model/test-resources/resources.ttl +60 -60
  12. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/model/triplestoreOntology.ttl +174 -174
  13. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/model/triplestoreshapes.ttl +41 -41
  14. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/mustrd.py +787 -788
  15. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/mustrdAnzo.py +236 -236
  16. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/mustrdGraphDb.py +125 -125
  17. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/mustrdRdfLib.py +56 -56
  18. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/mustrdTestPlugin.py +327 -328
  19. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/namespace.py +125 -125
  20. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/run.py +106 -106
  21. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/spec_component.py +690 -682
  22. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/steprunner.py +166 -166
  23. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/templates/md_ResultList_leaf_template.jinja +18 -18
  24. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/templates/md_ResultList_template.jinja +8 -8
  25. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/templates/md_stats_template.jinja +2 -2
  26. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/test/test_mustrd.py +4 -4
  27. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/utils.py +38 -38
  28. {mustrd-0.1.8 → mustrd-0.2.1}/pyproject.toml +55 -54
  29. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/__init__.py +0 -0
  30. {mustrd-0.1.8 → mustrd-0.2.1}/mustrd/model/mustrdTestOntology.ttl +0 -0
@@ -1,21 +1,21 @@
- MIT License
-
- Copyright (c) 2023 Semantic Partners Ltd
-
- Permission is hereby granted, free of charge, to any person obtaining a copy
- of this software and associated documentation files (the "Software"), to deal
- in the Software without restriction, including without limitation the rights
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- copies of the Software, and to permit persons to whom the Software is
- furnished to do so, subject to the following conditions:
-
- The above copyright notice and this permission notice shall be included in all
- copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- SOFTWARE.
+ MIT License
+
+ Copyright (c) 2023 Semantic Partners Ltd
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -1,17 +1,18 @@
  Metadata-Version: 2.1
  Name: mustrd
- Version: 0.1.8
+ Version: 0.2.1
  Summary: A Spec By Example framework for RDF and SPARQL, Inspired by Cucumber.
  Home-page: https://github.com/Semantic-partners/mustrd
  License: MIT
  Author: John Placek
  Author-email: john.placek@semanticpartners.com
- Requires-Python: ==3.11.7
+ Requires-Python: >=3.11.7,<4.0.0
  Classifier: Framework :: Pytest
  Classifier: License :: OSI Approved :: MIT License
  Classifier: Natural Language :: English
  Classifier: Programming Language :: Python
  Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.12
  Classifier: Topic :: Software Development :: Quality Assurance
  Classifier: Topic :: Software Development :: Testing
  Classifier: Topic :: Utilities
@@ -22,6 +23,7 @@ Requires-Dist: colorlog (>=6.7.0,<7.0.0)
  Requires-Dist: coverage (==7.4.3)
  Requires-Dist: flake8 (==7.0.0)
  Requires-Dist: multimethods-py (>=0.5.3,<0.6.0)
+ Requires-Dist: numpy (>=1.26.0,<2.0.0)
  Requires-Dist: openpyxl (>=3.1.2,<4.0.0)
  Requires-Dist: pandas (>=1.5.2,<2.0.0)
  Requires-Dist: pyanzo (>=3.3.10,<4.0.0)
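
The PKG-INFO hunk above loosens the interpreter pin from `==3.11.7` to the range `>=3.11.7,<4.0.0` (and advertises 3.12 support). A stdlib-only sketch of what that range change means in practice — the function name is illustrative, not part of mustrd; real tooling would use `packaging.specifiers.SpecifierSet`:

```python
import sys

def satisfies_new_range(version_info=sys.version_info):
    """Check a Python version against 0.2.1's range >=3.11.7,<4.0.0.

    Illustrative helper only; tuple comparison is lexicographic, which
    matches how version components are ordered here.
    """
    major, minor, micro = version_info[:3]
    return (3, 11, 7) <= (major, minor, micro) < (4, 0, 0)

# Under 0.1.8's pin (==3.11.7) only 3.11.7 was accepted; now 3.12.x also passes.
print(satisfies_new_range((3, 12, 1)))   # True
print(satisfies_new_range((3, 11, 6)))   # False
```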
@@ -1,58 +1,58 @@
- == Mustrd
-
- // tag::body[]
-
- image::https://github.com/Semantic-partners/mustrd/raw/python-coverage-comment-action-data/badge.svg[Coverage badge,link="https://github.com/Semantic-partners/mustrd/tree/python-coverage-comment-action-data"]
-
- === Why?
-
- How do you know your SPARQL, whether it's in a pipeline, or a query, is doing what you intend?
-
- As much as we love RDF and SPARQL and Semantic Tech in general, we found a small gap in tooling which would give us that certainty.
-
- We missed the powerful testing frameworks that have evolved in imperative languages that help ensure you've written code that does what you think it should.
-
- We wanted to be able to:
-
- * setup data scenarios and ensure queries worked as expected
- * setup edge cases for queries and ensure they still work
- * isolate small sparql enrichment / transformation steps and to know we're only INSERTing what we intend
-
- Enter MustRD.
-
- === What?
-
- MustRD is a Spec-By-Example ontology, with a reference python implementation, inspired by the likes of Cucumber.
-
- It's designed to be triplestore/SPARQL engine agnostic (aren't open standards *wonderful*!).
-
- === What it is NOT
- MustRD is nothing to do with SHACL, or an alternative to it. In fact, we use SHACL for some of our features.
-
- SHACL provides validation around data.
-
- MustRD provides validation around data transformations.
-
- === How?
- You define your specs in ttl, or trig files.
- We use the SBE approach of *Given*, *When*, *Then* to define starting dataset, an action, and a set of expectations. We build up a set of data.
- Then, depending on whether your SPARQL is a CONSTRUCT, SELECT or a INSERT/DELETE, we run it, and compare results against a set of expectations (*Then*) that are defined in the same way as a *Given* .
- Alternatively, you could define your *Then*
-
- * as an explicit ASK, or
- * select; or
- * in a higher-order expectation language like you will be used to in various platforms, a set of expectations.
-
-
- === When?
-
- Soon. It's a work in progress, and we're building the things *we* need for the projects we work on at multiple clients, with multiple vendor stacks.
- We already think it's useful, but it might not meet *your* needs, out of the box.
-
- We invite you to try it, see where it doesn't fit, and raise an issue, or even better, a PR! If you need something custom, please check out our consultancy rates, and we might be able to prioritise a new feature for you.
-
- == Support
- We're a specialist consultancy in Semantic Tech, we're putting this out in case it's useful, but if you need more support, kindly contact our business team on info@semanticpartners.com
-
- // tag::body[]
- include::src/README.adoc[tags=body]
+ == Mustrd
+
+ // tag::body[]
+
+ image::https://github.com/Semantic-partners/mustrd/raw/python-coverage-comment-action-data/badge.svg[Coverage badge,link="https://github.com/Semantic-partners/mustrd/tree/python-coverage-comment-action-data"]
+
+ === Why?
+
+ How do you know your SPARQL, whether it's in a pipeline, or a query, is doing what you intend?
+
+ As much as we love RDF and SPARQL and Semantic Tech in general, we found a small gap in tooling which would give us that certainty.
+
+ We missed the powerful testing frameworks that have evolved in imperative languages that help ensure you've written code that does what you think it should.
+
+ We wanted to be able to:
+
+ * setup data scenarios and ensure queries worked as expected
+ * setup edge cases for queries and ensure they still work
+ * isolate small sparql enrichment / transformation steps and to know we're only INSERTing what we intend
+
+ Enter MustRD.
+
+ === What?
+
+ MustRD is a Spec-By-Example ontology, with a reference python implementation, inspired by the likes of Cucumber.
+
+ It's designed to be triplestore/SPARQL engine agnostic (aren't open standards *wonderful*!).
+
+ === What it is NOT
+ MustRD is nothing to do with SHACL, or an alternative to it. In fact, we use SHACL for some of our features.
+
+ SHACL provides validation around data.
+
+ MustRD provides validation around data transformations.
+
+ === How?
+ You define your specs in ttl, or trig files.
+ We use the SBE approach of *Given*, *When*, *Then* to define starting dataset, an action, and a set of expectations. We build up a set of data.
+ Then, depending on whether your SPARQL is a CONSTRUCT, SELECT or a INSERT/DELETE, we run it, and compare results against a set of expectations (*Then*) that are defined in the same way as a *Given* .
+ Alternatively, you could define your *Then*
+
+ * as an explicit ASK, or
+ * select; or
+ * in a higher-order expectation language like you will be used to in various platforms, a set of expectations.
+
+
+ === When?
+
+ Soon. It's a work in progress, and we're building the things *we* need for the projects we work on at multiple clients, with multiple vendor stacks.
+ We already think it's useful, but it might not meet *your* needs, out of the box.
+
+ We invite you to try it, see where it doesn't fit, and raise an issue, or even better, a PR! If you need something custom, please check out our consultancy rates, and we might be able to prioritise a new feature for you.
+
+ == Support
+ We're a specialist consultancy in Semantic Tech, we're putting this out in case it's useful, but if you need more support, kindly contact our business team on info@semanticpartners.com
+
+ // tag::body[]
+ include::src/README.adoc[tags=body]
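
The README's Given/When/Then flow boils down to: run the query against the *Given* dataset, then compare the result to the *Then* expectation. A minimal stdlib-only sketch of the SELECT-result comparison step — the helper name and the list-of-binding-dicts shape are hypothetical, not mustrd's actual API, which parses Turtle specs and drives real SPARQL engines:

```python
def compare_select_result(actual_rows, expected_rows, ordered=False):
    """Compare SELECT-style results given as lists of {variable: value} dicts.

    Unordered comparison mirrors a plain TableDataset expectation;
    ordered=True mirrors an OrderedTableDataset (sh:order) for ORDER BY queries.
    """
    if ordered:
        return actual_rows == expected_rows
    # Order-insensitive: sort rows by a canonical key before comparing.
    key = lambda row: tuple(sorted(row.items()))
    return sorted(actual_rows, key=key) == sorted(expected_rows, key=key)

actual = [{"s": "test-data:sub", "p": "test-data:pred", "o": "test-data:obj"}]
print(compare_select_result(actual, actual))  # True
```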
@@ -1,201 +1,210 @@
- = Developer helper
- // tag::body[]
-
- == Try it out
-
- Ensure you have python3 installed, before you begin.
- To install the necessary dependencies, run the following command from the project root.
-
- `pip3 install -r requirements.txt`
-
- Run the following command to execute the accompanying tests specifications.
-
- `python3 src/run.py -v -p "test/test-specs" -g "test/data" -w "test/data" -t "test/data"`
-
- You will see some warnings. Do not worry, some tests specifications are invalid and intentionally skipped.
-
- For a brief explanation of the meaning of these options use the help option.
-
- `python3 src/run.py --help`
-
- == Run the tests
-
- Run `pytest` from the project root.
-
- == Creating your own Test Specifications
-
- If you have got this far then you are probably ready to create your own specifications to test your application SPARQL queries. These will be executed against the default RDFLib triplestore unless you configure one or more alternatives. The instructions for this are included in <<Configuring external triplestores>> below.
-
- === Givens
- These are used to specify the dataset against which the SPARQL statement will be run.
- They can be generated from external sources such as an existing graph, or a file or folder containing serialised RDF. It is also possible to specify the dataset as reified RDF directly in the test step. Currently tabular data sources such as csv files or TableDatasets are not supported.
- Multiple given statements can be supplied and data is combined into a single dataset for the test.
-
- * *InheritedDataset* - This is where no data is specified but the existing data in the target graph is retained rather than being replaced with a defined set. This can be used to chain tests together or to perform checks on application data.
- ----
- must:given [ a must:InheritedDataset ] ;
- ----
- * *FileDataset* - The dataset is a local file containing serialised RDF. The formats supported are the same as those for the RDFLib Graph().parse function i.e. Turtle (.ttl), NTriples (.nt), N3 (.n3), RDF/XML (.xml) and TriX. The data is used to replace any existing content in the target graph for the test.
- ----
- must:given [ a must:FileDataset ;
- must:file "test/data/given.ttl" . ] ;
- ----
- * *FolderDataset* - Very similar to the file dataset except that the location of the file is passed to the test specification as an argument from the caller. i.e. the -g option on the command line.
- ----
- must:given [ a must:FolderDataset ;
- must:fileName "given.ttl" ] ;
- ----
- * *StatementsDataset* - The dataset is defined within the test in the form of reified RDF statements. e.g.
- ----
- must:given [ a must:StatementsDataset ;
- must:hasStatement [ a rdf:Statement ;
- rdf:subject test-data:sub ;
- rdf:predicate test-data:pred ;
- rdf:object test-data:obj ; ] ; ] ;
- ----
- * *AnzoGraphmartDataset* - The dataset is contained in an Anzo graphmart and needs to be retrieved from there. The Anzo instance containing the dataset needs to be indicated in the configuration file as documented in <<Configuring external triplestores>>.
- ----
- must:given [ a must:AnzoGraphmartDataset ;
- must:graphmart "http://cambridgesemantics.com/Graphmart/43445aeadf674e09818c81cf7049e46a";
- must:layer "http://cambridgesemantics.com/Layer/33b97531d7e148748b75e4e3c6bbf164";
- ] .
- ----
- === Whens
- These are the actual SPARQL queries that you wish to test. Queries can be supplied as a string directly in the test or as a file containing the query. Only single When statements are currently supported.
- Mustrd does not derive the query type from the actual query, so it is necessary to provide this in the specification. Supported query types are SelectSparql, ConstructSparql and UpdateSparql.
-
- * *TextSparqlSource* - The SPARQL query is included in the test as a (multiline) string value for the property queryText.
- e.g.
- ----
- must:when [ a must:TextSparqlSource ;
- must:queryText "SELECT ?s ?p ?o WHERE { ?s ?p ?o }" ;
- must:queryType must:SelectSparql ] ;
- ----
-
- * *FileSparqlSource* - The SPARQL query is contained in a local file.
- e.g.
- ----
- must:when [ a must:FileSparqlSource ;
- must:file "test/data/construct.rq" ;
- must:queryType must:ConstructSparql ; ] ;
- ----
- * *FolderSparqlSource* - Similar to the file SPARQL source except that the location of the file is passed to the test specification as an argument from the caller. i.e. the -w option on the command line.
- ----
- must:when [ a must:FolderSparqlSource ;
- must:fileName "construct.rq" ;
- must:queryType must:ConstructSparql ; ] ;
- ----
- * *AnzoQueryBuilderDataset* - The query is saved in the Query Builder of an Anzo instance and needs to be retrieved from there. The Anzo instance containing the dataset needs to be indicated in the configuration file as documented in <<Configuring external triplestores>>.
- ----
- must:when [ a must:AnzoQueryBuilderDataset ;
- must:queryFolder "Mustrd";
- must:queryName "mustrd-construct" ;
- must:queryType must:ConstructSparql
- ];
- ----
- === Thens
- Then clauses are used to specify the expected result dataset for the test. These datasets can be specified in the same way as <<Givens>> except that an extended set of dataset types is supported. For the tabular results of SELECT queries TabularDatasets are required and again can be in file format such as CSV, or an inline table within the specification.
- * *FileDataset* - The dataset is a local file containing serialised RDF or tabular data. The formats supported are the same as those for the RDFLib Graph().parse function i.e. Turtle (.ttl), NTriples (.nt), N3 (.n3), RDF/XML (.xml) and TriX, as well as tabular formats (.csv, .xls, .xlsx).
- ----
- must:then [ a must:FileDataset ;
- must:file "test/data/thenSuccess.xlsx" ] .
- ----
- ----
- must:then [ a must:FileDataset ;
- must:file "test/data/thenSuccess.nt" ] .
- ----
- * *FolderDataset* - Very similar to the file dataset except that the location of the file is passed to the test specification as an argument from the caller. i.e. the -t option on the command line.
- ----
- must:then [ a must:FolderDataset ;
- must:fileName "then.ttl" ] ;
- ----
- * *StatementsDataset* - The dataset is defined within the test in the form of reified RDF statements e.g.
- ----
- must:then [ a must:StatementsDataset ;
- must:hasStatement [ a rdf:Statement ;
- rdf:subject test-data:sub ;
- rdf:predicate test-data:pred ;
- rdf:object test-data:obj ; ] ; ] ;
- ----
- * *TableDataset* - The contents of the table defined in RDF syntax within the specification.
- E.g. a table dataset consisting of a single row and three columns.
- ----
- must:then [ a must:TableDataset ;
- must:hasRow [ must:hasBinding[
- must:variable "s" ;
- must:boundValue test-data:sub ; ],
- [ must:variable "p" ;
- must:boundValue test-data:pred ; ],
- [ must:variable "o" ;
- must:boundValue test-data:obj ; ] ;
- ] ; ] .
- ----
- * *OrderedTableDataset* - This is an extension of the TableDataset which allows the row order of the dataset to be specified using the SHACL order property to support the ORDER BY clause in SPARQL SELECT queries
- E.g. A table dataset consisting of two ordered rows and three columns.
- ----
- must:then [ a must:OrderedTableDataset ;
- must:hasRow [ sh:order 1 ;
- must:hasBinding[ must:variable "s" ;
- must:boundValue test-data:sub1 ; ],
- [ must:variable "p" ;
- must:boundValue test-data:pred1 ; ],
- [ must:variable "o" ;
- must:boundValue test-data:obj1 ; ] ; ] ,
- [ sh:order 2 ;
- must:hasBinding[ must:variable "s" ;
- must:boundValue test-data:sub2 ; ],
- [ must:variable "p" ;
- must:boundValue test-data:pred2 ; ],
- [ must:variable "o" ;
- must:boundValue test-data:obj2 ; ] ; ] ;
- ] .
- ----
- * *EmptyTable* - This is used to indicate that we are expecting an empty result from a SPARQL SELECT query.
- ----
- must:then [ a must:EmptyTable ] .
- ----
- * *EmptyGraph* - Similar to EmptyTable but used to indicate that we are expecting an empty graph as a result from a SPARQL query.
- ----
- must:then [ a must:EmptyGraph ] .
- ----
- * *AnzoGraphmartDataset* - The dataset is contained in an Anzo graphmart and needs to be retrieved from there. The Anzo instance containing the dataset needs to be indicated in the configuration file as documented in <<Configuring external triplestores>>.
- ----
- must:then [ a must:AnzoGraphmartDataset ;
- must:graphmart "http://cambridgesemantics.com/Graphmart/43445aeadf674e09818c81cf7049e46a";
- must:layer "http://cambridgesemantics.com/Layer/33b97531d7e148748b75e4e3c6bbf164";
- ] .
- ----
- == Configuring external triplestores
- The configuration file for external triplestores can be located outside of the project root as it is specified as an argument to the mustard module or as the -c option on the commandline when running run.py.
-
- It is anticipated that the external triplestore is running as mustrd is not configured to start them.
-
- Currently, the supported external triplestores are GraphDB and Anzo.
-
- The configuration file should be serialised RDF. An example in Turtle format is included below for GraphDB. For Anzo the *must:repository* value is replaced with a *must:gqeURI*.
- ----
- @prefix must: <https://mustrd.com/model/> .
- must:GraphDbConfig1 a must:GraphDbConfig ;
- must:url "http://localhost";
- must:port "7200";
- must:inputGraph "http://localhost:7200/test-graph" ;
- must:repository "mustrd" .
- ----
- To avoid versioning secrets when you want to version triplestore configuration (for example in case you want to run mustrd in CI), you have to configure user/password in a different file.
- This file must be named as the triple store configuration file, but with "_secrets" just before the extension. For example triplestores.ttl -> triplestores_secrets.ttl
- Subjects in the two files must match, no need to redefine the type, for example:
- ----
- @prefix must: <https://mustrd.com/model/> .
- must:GraphDbConfig1 must:username 'test' ;
- must:password 'test' .
- ----
-
- == Additional Notes for Developers
- Mustrd remains very much under development. It is anticipated that additional functionality and triplestore support will be added over time. The project uses https://python-poetry.org/docs/[Poetry] to manage dependencies so it will be necessary to have this installed to contribute towards the project. The link contains instructions on how to install and use this.
- As the project is actually built from the requirements.txt file at the project root, it is necessary to export dependencies from poetry to this file before committing and pushing changes to the repository, using the following command.
-
- `poetry export -f requirements.txt --without-hashes > requirements.txt`
-
-
-
- // end::body[]
+ = Developer helper
+ // tag::body[]
+
+ == Try it out
+
+ Ensure you have python3 installed, before you begin.
+ To install the necessary dependencies, run the following command from the project root.
+
+ `pip3 install -r requirements.txt`
+
+ Run the following command to execute the accompanying tests specifications.
+
+ `python3 src/run.py -v -p "test/test-specs" -g "test/data" -w "test/data" -t "test/data"`
+
+ You will see some warnings. Do not worry, some tests specifications are invalid and intentionally skipped.
+
+ For a brief explanation of the meaning of these options use the help option.
+
+ `python3 src/run.py --help`
+
+ == Run the tests
+
+ Run `pytest` from the project root.
+
+ == Creating your own Test Specifications
+
+ If you have got this far then you are probably ready to create your own specifications to test your application SPARQL queries. These will be executed against the default RDFLib triplestore unless you configure one or more alternatives. The instructions for this are included in <<Configuring external triplestores>> below.
+
+ === Paths
+ All paths are considered relative. That way mustrd tests can be versioned and shared easily.
+ To get an absolute path from a relative path in a spec file, we prefix it with the first existing result in:
+ 1) Path where the spec is located
+ 2) spec_path defined in mustrd test configuration files or cmd line argument
+ 3) data_path defined in mustrd test configuration files or cmd line argument
+ 4) Mustrd folder: In case of default resources packaged with mustrd source (will be in venv when mustrd is called as library)
+ We intentionally use the same method to build paths in all spec components to avoid confusion.
+
+ === Givens
+ These are used to specify the dataset against which the SPARQL statement will be run.
+ They can be generated from external sources such as an existing graph, or a file or folder containing serialised RDF. It is also possible to specify the dataset as reified RDF directly in the test step. Currently tabular data sources such as csv files or TableDatasets are not supported.
+ Multiple given statements can be supplied and data is combined into a single dataset for the test.
+
+ * *InheritedDataset* - This is where no data is specified but the existing data in the target graph is retained rather than being replaced with a defined set. This can be used to chain tests together or to perform checks on application data.
+ ----
+ must:given [ a must:InheritedDataset ] ;
+ ----
+ * *FileDataset* - The dataset is a local file containing serialised RDF. The formats supported are the same as those for the RDFLib Graph().parse function i.e. Turtle (.ttl), NTriples (.nt), N3 (.n3), RDF/XML (.xml) and TriX. The data is used to replace any existing content in the target graph for the test.
+ ----
+ must:given [ a must:FileDataset ;
+ must:file "test/data/given.ttl" . ] ;
+ ----
+ * *FolderDataset* - Very similar to the file dataset except that the location of the file is passed to the test specification as an argument from the caller. i.e. the -g option on the command line.
+ ----
+ must:given [ a must:FolderDataset ;
+ must:fileName "given.ttl" ] ;
+ ----
+ * *StatementsDataset* - The dataset is defined within the test in the form of reified RDF statements. e.g.
+ ----
+ must:given [ a must:StatementsDataset ;
+ must:hasStatement [ a rdf:Statement ;
+ rdf:subject test-data:sub ;
+ rdf:predicate test-data:pred ;
+ rdf:object test-data:obj ; ] ; ] ;
+ ----
+ * *AnzoGraphmartDataset* - The dataset is contained in an Anzo graphmart and needs to be retrieved from there. The Anzo instance containing the dataset needs to be indicated in the configuration file as documented in <<Configuring external triplestores>>.
+ ----
+ must:given [ a must:AnzoGraphmartDataset ;
+ must:graphmart "http://cambridgesemantics.com/Graphmart/43445aeadf674e09818c81cf7049e46a";
+ must:layer "http://cambridgesemantics.com/Layer/33b97531d7e148748b75e4e3c6bbf164";
+ ] .
+ ----
+ === Whens
+ These are the actual SPARQL queries that you wish to test. Queries can be supplied as a string directly in the test or as a file containing the query. Only single When statements are currently supported.
+ Mustrd does not derive the query type from the actual query, so it is necessary to provide this in the specification. Supported query types are SelectSparql, ConstructSparql and UpdateSparql.
+
+ * *TextSparqlSource* - The SPARQL query is included in the test as a (multiline) string value for the property queryText.
+ e.g.
+ ----
+ must:when [ a must:TextSparqlSource ;
+ must:queryText "SELECT ?s ?p ?o WHERE { ?s ?p ?o }" ;
+ must:queryType must:SelectSparql ] ;
+ ----
+
+ * *FileSparqlSource* - The SPARQL query is contained in a local file.
+ e.g.
+ ----
+ must:when [ a must:FileSparqlSource ;
+ must:file "test/data/construct.rq" ;
+ must:queryType must:ConstructSparql ; ] ;
+ ----
+ * *FolderSparqlSource* - Similar to the file SPARQL source except that the location of the file is passed to the test specification as an argument from the caller. i.e. the -w option on the command line.
+ ----
+ must:when [ a must:FolderSparqlSource ;
+ must:fileName "construct.rq" ;
+ must:queryType must:ConstructSparql ; ] ;
+ ----
+ * *AnzoQueryBuilderDataset* - The query is saved in the Query Builder of an Anzo instance and needs to be retrieved from there. The Anzo instance containing the dataset needs to be indicated in the configuration file as documented in <<Configuring external triplestores>>.
+ ----
+ must:when [ a must:AnzoQueryBuilderDataset ;
+ must:queryFolder "Mustrd";
+ must:queryName "mustrd-construct" ;
+ must:queryType must:ConstructSparql
+ ];
+ ----
+ === Thens
+ Then clauses are used to specify the expected result dataset for the test. These datasets can be specified in the same way as <<Givens>> except that an extended set of dataset types is supported. For the tabular results of SELECT queries TabularDatasets are required and again can be in file format such as CSV, or an inline table within the specification.
+ * *FileDataset* - The dataset is a local file containing serialised RDF or tabular data. The formats supported are the same as those for the RDFLib Graph().parse function i.e. Turtle (.ttl), NTriples (.nt), N3 (.n3), RDF/XML (.xml) and TriX, as well as tabular formats (.csv, .xls, .xlsx).
+ ----
+ must:then [ a must:FileDataset ;
+ must:file "test/data/thenSuccess.xlsx" ] .
+ ----
+ ----
+ must:then [ a must:FileDataset ;
+ must:file "test/data/thenSuccess.nt" ] .
+ ----
+ * *FolderDataset* - Very similar to the file dataset except that the location of the file is passed to the test specification as an argument from the caller. i.e. the -t option on the command line.
+ ----
+ must:then [ a must:FolderDataset ;
+ must:fileName "then.ttl" ] ;
+ ----
+ * *StatementsDataset* - The dataset is defined within the test in the form of reified RDF statements e.g.
+ ----
+ must:then [ a must:StatementsDataset ;
+ must:hasStatement [ a rdf:Statement ;
+ rdf:subject test-data:sub ;
+ rdf:predicate test-data:pred ;
+ rdf:object test-data:obj ; ] ; ] ;
+ ----
+ * *TableDataset* - The contents of the table defined in RDF syntax within the specification.
+ E.g. a table dataset consisting of a single row and three columns.
+ ----
+ must:then [ a must:TableDataset ;
+ must:hasRow [ must:hasBinding[
+ must:variable "s" ;
+ must:boundValue test-data:sub ; ],
+ [ must:variable "p" ;
+ must:boundValue test-data:pred ; ],
+ [ must:variable "o" ;
+ must:boundValue test-data:obj ; ] ;
+ ] ; ] .
+ ----
+ * *OrderedTableDataset* - This is an extension of the TableDataset which allows the row order of the dataset to be specified using the SHACL order property to support the ORDER BY clause in SPARQL SELECT queries
+ E.g. A table dataset consisting of two ordered rows and three columns.
+ ----
+ must:then [ a must:OrderedTableDataset ;
+ must:hasRow [ sh:order 1 ;
+ must:hasBinding[ must:variable "s" ;
+ must:boundValue test-data:sub1 ; ],
+ [ must:variable "p" ;
+ must:boundValue test-data:pred1 ; ],
+ [ must:variable "o" ;
+ must:boundValue test-data:obj1 ; ] ; ] ,
+ [ sh:order 2 ;
+ must:hasBinding[ must:variable "s" ;
+ must:boundValue test-data:sub2 ; ],
+ [ must:variable "p" ;
+ must:boundValue test-data:pred2 ; ],
+ [ must:variable "o" ;
+ must:boundValue test-data:obj2 ; ] ; ] ;
+ ] .
+ ----
+ * *EmptyTable* - This is used to indicate that we are expecting an empty result from a SPARQL SELECT query.
+ ----
+ must:then [ a must:EmptyTable ] .
+ ----
+ * *EmptyGraph* - Similar to EmptyTable but used to indicate that we are expecting an empty graph as a result from a SPARQL query.
+ ----
+ must:then [ a must:EmptyGraph ] .
+ ----
+ * *AnzoGraphmartDataset* - The dataset is contained in an Anzo graphmart and needs to be retrieved from there. The Anzo instance containing the dataset needs to be indicated in the configuration file as documented in <<Configuring external triplestores>>.
+ ----
+ must:then [ a must:AnzoGraphmartDataset ;
+ must:graphmart "http://cambridgesemantics.com/Graphmart/43445aeadf674e09818c81cf7049e46a";
+ must:layer "http://cambridgesemantics.com/Layer/33b97531d7e148748b75e4e3c6bbf164";
+ ] .
+ ----
+ == Configuring external triplestores
+ The configuration file for external triplestores can be located outside of the project root as it is specified as an argument to the mustard module or as the -c option on the commandline when running run.py.
+
+ It is anticipated that the external triplestore is running as mustrd is not configured to start them.
+
+ Currently, the supported external triplestores are GraphDB and Anzo.
+
+ The configuration file should be serialised RDF. An example in Turtle format is included below for GraphDB. For Anzo the *must:repository* value is replaced with a *must:gqeURI*.
+ ----
+ @prefix must: <https://mustrd.com/model/> .
+ must:GraphDbConfig1 a must:GraphDbConfig ;
+ must:url "http://localhost";
+ must:port "7200";
+ must:inputGraph "http://localhost:7200/test-graph" ;
+ must:repository "mustrd" .
+ ----
+ To avoid versioning secrets when you want to version triplestore configuration (for example in case you want to run mustrd in CI), you have to configure user/password in a different file.
+ This file must be named as the triple store configuration file, but with "_secrets" just before the extension. For example triplestores.ttl -> triplestores_secrets.ttl
+ Subjects in the two files must match, no need to redefine the type, for example:
+ ----
+ @prefix must: <https://mustrd.com/model/> .
+ must:GraphDbConfig1 must:username 'test' ;
+ must:password 'test' .
+ ----
+
+ == Additional Notes for Developers
+ Mustrd remains very much under development. It is anticipated that additional functionality and triplestore support will be added over time. The project uses https://python-poetry.org/docs/[Poetry] to manage dependencies so it will be necessary to have this installed to contribute towards the project. The link contains instructions on how to install and use this.
+ As the project is actually built from the requirements.txt file at the project root, it is necessary to export dependencies from poetry to this file before committing and pushing changes to the repository, using the following command.
+
+ `poetry export -f requirements.txt --without-hashes > requirements.txt`
+
+
+
+ // end::body[]
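
The new Paths section added in 0.2.1 describes resolving a relative path against the first existing candidate among: the spec's own directory, spec_path, data_path, and the mustrd install folder. A stdlib sketch of that lookup order, assuming the stated semantics; the function name and signature are illustrative, not mustrd's actual internals:

```python
from pathlib import Path

def resolve_spec_path(relative, spec_dir, spec_path=None, data_path=None, mustrd_dir=None):
    """Resolve a relative path from a spec file using the documented order:
    1) the spec's own directory, 2) spec_path, 3) data_path, 4) the mustrd folder.
    Returns the first candidate that exists on disk, else None.
    """
    for base in (spec_dir, spec_path, data_path, mustrd_dir):
        if base is not None:
            candidate = Path(base) / relative
            if candidate.exists():
                return candidate
    return None
```

Because every base directory is tried in a fixed order, a `must:file "test/data/given.ttl"` reference resolves the same way regardless of which spec component it appears in.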