datacontract-cli 0.10.23__py3-none-any.whl → 0.10.37__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- datacontract/__init__.py +13 -0
- datacontract/api.py +12 -5
- datacontract/catalog/catalog.py +5 -3
- datacontract/cli.py +116 -10
- datacontract/data_contract.py +143 -65
- datacontract/engines/data_contract_checks.py +366 -60
- datacontract/engines/data_contract_test.py +50 -4
- datacontract/engines/fastjsonschema/check_jsonschema.py +37 -19
- datacontract/engines/fastjsonschema/s3/s3_read_files.py +3 -2
- datacontract/engines/soda/check_soda_execute.py +22 -3
- datacontract/engines/soda/connections/athena.py +79 -0
- datacontract/engines/soda/connections/duckdb_connection.py +65 -6
- datacontract/engines/soda/connections/kafka.py +4 -2
- datacontract/export/avro_converter.py +20 -3
- datacontract/export/bigquery_converter.py +1 -1
- datacontract/export/dbt_converter.py +36 -7
- datacontract/export/dqx_converter.py +126 -0
- datacontract/export/duckdb_type_converter.py +57 -0
- datacontract/export/excel_exporter.py +923 -0
- datacontract/export/exporter.py +3 -0
- datacontract/export/exporter_factory.py +17 -1
- datacontract/export/great_expectations_converter.py +55 -5
- datacontract/export/{html_export.py → html_exporter.py} +31 -20
- datacontract/export/markdown_converter.py +134 -5
- datacontract/export/mermaid_exporter.py +110 -0
- datacontract/export/odcs_v3_exporter.py +187 -145
- datacontract/export/protobuf_converter.py +163 -69
- datacontract/export/rdf_converter.py +2 -2
- datacontract/export/sodacl_converter.py +9 -1
- datacontract/export/spark_converter.py +31 -4
- datacontract/export/sql_converter.py +6 -2
- datacontract/export/sql_type_converter.py +20 -8
- datacontract/imports/avro_importer.py +63 -12
- datacontract/imports/csv_importer.py +111 -57
- datacontract/imports/excel_importer.py +1111 -0
- datacontract/imports/importer.py +16 -3
- datacontract/imports/importer_factory.py +17 -0
- datacontract/imports/json_importer.py +325 -0
- datacontract/imports/odcs_importer.py +2 -2
- datacontract/imports/odcs_v3_importer.py +351 -151
- datacontract/imports/protobuf_importer.py +264 -0
- datacontract/imports/spark_importer.py +117 -13
- datacontract/imports/sql_importer.py +32 -16
- datacontract/imports/unity_importer.py +84 -38
- datacontract/init/init_template.py +1 -1
- datacontract/integration/datamesh_manager.py +16 -2
- datacontract/lint/resolve.py +112 -23
- datacontract/lint/schema.py +24 -15
- datacontract/model/data_contract_specification/__init__.py +1 -0
- datacontract/model/odcs.py +13 -0
- datacontract/model/run.py +3 -0
- datacontract/output/junit_test_results.py +3 -3
- datacontract/schemas/datacontract-1.1.0.init.yaml +1 -1
- datacontract/schemas/datacontract-1.2.0.init.yaml +91 -0
- datacontract/schemas/datacontract-1.2.0.schema.json +2029 -0
- datacontract/schemas/datacontract-1.2.1.init.yaml +91 -0
- datacontract/schemas/datacontract-1.2.1.schema.json +2058 -0
- datacontract/schemas/odcs-3.0.2.schema.json +2382 -0
- datacontract/templates/datacontract.html +54 -3
- datacontract/templates/datacontract_odcs.html +685 -0
- datacontract/templates/index.html +5 -2
- datacontract/templates/partials/server.html +2 -0
- datacontract/templates/style/output.css +319 -145
- {datacontract_cli-0.10.23.dist-info → datacontract_cli-0.10.37.dist-info}/METADATA +656 -431
- datacontract_cli-0.10.37.dist-info/RECORD +119 -0
- {datacontract_cli-0.10.23.dist-info → datacontract_cli-0.10.37.dist-info}/WHEEL +1 -1
- {datacontract_cli-0.10.23.dist-info → datacontract_cli-0.10.37.dist-info/licenses}/LICENSE +1 -1
- datacontract/export/csv_type_converter.py +0 -36
- datacontract/lint/lint.py +0 -142
- datacontract/lint/linters/description_linter.py +0 -35
- datacontract/lint/linters/field_pattern_linter.py +0 -34
- datacontract/lint/linters/field_reference_linter.py +0 -48
- datacontract/lint/linters/notice_period_linter.py +0 -55
- datacontract/lint/linters/quality_schema_linter.py +0 -52
- datacontract/lint/linters/valid_constraints_linter.py +0 -100
- datacontract/model/data_contract_specification.py +0 -327
- datacontract_cli-0.10.23.dist-info/RECORD +0 -113
- /datacontract/{lint/linters → output}/__init__.py +0 -0
- {datacontract_cli-0.10.23.dist-info → datacontract_cli-0.10.37.dist-info}/entry_points.txt +0 -0
- {datacontract_cli-0.10.23.dist-info → datacontract_cli-0.10.37.dist-info}/top_level.txt +0 -0
{datacontract_cli-0.10.23.dist-info → datacontract_cli-0.10.37.dist-info}/METADATA

@@ -1,62 +1,69 @@
-Metadata-Version: 2.
+Metadata-Version: 2.4
 Name: datacontract-cli
-Version: 0.10.
+Version: 0.10.37
 Summary: The datacontract CLI is an open source command-line tool for working with Data Contracts. It uses data contract YAML files to lint the data contract, connect to data sources and execute schema and quality tests, detect breaking changes, and export to different formats. The tool is written in Python. It can be used as a standalone CLI tool, in a CI/CD pipeline, or directly as a Python library.
 Author-email: Jochen Christ <jochen.christ@innoq.com>, Stefan Negele <stefan.negele@innoq.com>, Simon Harrer <simon.harrer@innoq.com>
+License-Expression: MIT
 Project-URL: Homepage, https://cli.datacontract.com
 Project-URL: Issues, https://github.com/datacontract/datacontract-cli/issues
 Classifier: Programming Language :: Python :: 3
-Classifier: License :: OSI Approved :: MIT License
 Classifier: Operating System :: OS Independent
 Requires-Python: >=3.10
 Description-Content-Type: text/markdown
 License-File: LICENSE
-Requires-Dist: typer<0.
-Requires-Dist: pydantic<2.
+Requires-Dist: typer<0.20,>=0.15.1
+Requires-Dist: pydantic<2.13.0,>=2.8.2
 Requires-Dist: pyyaml~=6.0.1
 Requires-Dist: requests<2.33,>=2.31
 Requires-Dist: fastjsonschema<2.22.0,>=2.19.1
 Requires-Dist: fastparquet<2025.0.0,>=2024.5.0
 Requires-Dist: numpy<2.0.0,>=1.26.4
-Requires-Dist: python-multipart
-Requires-Dist: rich<
-Requires-Dist: sqlglot<
+Requires-Dist: python-multipart<1.0.0,>=0.0.20
+Requires-Dist: rich<15.0,>=13.7
+Requires-Dist: sqlglot<28.0.0,>=26.6.0
 Requires-Dist: duckdb<2.0.0,>=1.0.0
-Requires-Dist: soda-core-duckdb<3.
+Requires-Dist: soda-core-duckdb<3.6.0,>=3.3.20
 Requires-Dist: setuptools>=60
-Requires-Dist: python-dotenv
-Requires-Dist: boto3<
-Requires-Dist: Jinja2
-Requires-Dist: jinja_partials
+Requires-Dist: python-dotenv<2.0.0,>=1.0.0
+Requires-Dist: boto3<2.0.0,>=1.34.41
+Requires-Dist: Jinja2<4.0.0,>=3.1.5
+Requires-Dist: jinja_partials<1.0.0,>=0.2.1
+Requires-Dist: datacontract-specification<2.0.0,>=1.2.3
+Requires-Dist: open-data-contract-standard<4.0.0,>=3.0.5
 Provides-Extra: avro
 Requires-Dist: avro==1.12.0; extra == "avro"
 Provides-Extra: bigquery
-Requires-Dist: soda-core-bigquery<3.
+Requires-Dist: soda-core-bigquery<3.6.0,>=3.3.20; extra == "bigquery"
 Provides-Extra: csv
-Requires-Dist: clevercsv>=0.8.2; extra == "csv"
 Requires-Dist: pandas>=2.0.0; extra == "csv"
+Provides-Extra: excel
+Requires-Dist: openpyxl<4.0.0,>=3.1.5; extra == "excel"
 Provides-Extra: databricks
-Requires-Dist: soda-core-spark-df<3.
-Requires-Dist: soda-core-spark[databricks]<3.
-Requires-Dist: databricks-sql-connector<
-Requires-Dist: databricks-sdk<0.
+Requires-Dist: soda-core-spark-df<3.6.0,>=3.3.20; extra == "databricks"
+Requires-Dist: soda-core-spark[databricks]<3.6.0,>=3.3.20; extra == "databricks"
+Requires-Dist: databricks-sql-connector<4.2.0,>=3.7.0; extra == "databricks"
+Requires-Dist: databricks-sdk<0.68.0; extra == "databricks"
+Requires-Dist: pyspark<4.0.0,>=3.5.5; extra == "databricks"
 Provides-Extra: iceberg
-Requires-Dist: pyiceberg==0.
+Requires-Dist: pyiceberg==0.9.1; extra == "iceberg"
 Provides-Extra: kafka
 Requires-Dist: datacontract-cli[avro]; extra == "kafka"
-Requires-Dist: soda-core-spark-df<3.
+Requires-Dist: soda-core-spark-df<3.6.0,>=3.3.20; extra == "kafka"
+Requires-Dist: pyspark<4.0.0,>=3.5.5; extra == "kafka"
 Provides-Extra: postgres
-Requires-Dist: soda-core-postgres<3.
+Requires-Dist: soda-core-postgres<3.6.0,>=3.3.20; extra == "postgres"
 Provides-Extra: s3
-Requires-Dist: s3fs
-Requires-Dist: aiobotocore<2.
+Requires-Dist: s3fs<2026.0.0,>=2025.2.0; extra == "s3"
+Requires-Dist: aiobotocore<2.26.0,>=2.17.0; extra == "s3"
 Provides-Extra: snowflake
-Requires-Dist: snowflake-connector-python[pandas]<
-Requires-Dist: soda-core-snowflake<3.
+Requires-Dist: snowflake-connector-python[pandas]<4.1,>=3.6; extra == "snowflake"
+Requires-Dist: soda-core-snowflake<3.6.0,>=3.3.20; extra == "snowflake"
 Provides-Extra: sqlserver
-Requires-Dist: soda-core-sqlserver<3.
+Requires-Dist: soda-core-sqlserver<3.6.0,>=3.3.20; extra == "sqlserver"
+Provides-Extra: athena
+Requires-Dist: soda-core-athena<3.6.0,>=3.3.20; extra == "athena"
 Provides-Extra: trino
-Requires-Dist: soda-core-trino<3.
+Requires-Dist: soda-core-trino<3.6.0,>=3.3.20; extra == "trino"
 Provides-Extra: dbt
 Requires-Dist: dbt-core>=1.8.0; extra == "dbt"
 Provides-Extra: dbml
@@ -66,23 +73,26 @@ Requires-Dist: pyarrow>=18.1.0; extra == "parquet"
 Provides-Extra: rdf
 Requires-Dist: rdflib==7.0.0; extra == "rdf"
 Provides-Extra: api
-Requires-Dist: fastapi==0.
-Requires-Dist: uvicorn==0.
+Requires-Dist: fastapi==0.116.1; extra == "api"
+Requires-Dist: uvicorn==0.38.0; extra == "api"
+Provides-Extra: protobuf
+Requires-Dist: grpcio-tools>=1.53; extra == "protobuf"
 Provides-Extra: all
-Requires-Dist: datacontract-cli[api,bigquery,csv,databricks,dbml,dbt,iceberg,kafka,parquet,postgres,rdf,s3,snowflake,sqlserver,trino]; extra == "all"
+Requires-Dist: datacontract-cli[api,athena,bigquery,csv,databricks,dbml,dbt,excel,iceberg,kafka,parquet,postgres,protobuf,rdf,s3,snowflake,sqlserver,trino]; extra == "all"
 Provides-Extra: dev
 Requires-Dist: datacontract-cli[all]; extra == "dev"
 Requires-Dist: httpx==0.28.1; extra == "dev"
 Requires-Dist: kafka-python; extra == "dev"
-Requires-Dist: moto==5.
+Requires-Dist: moto==5.1.13; extra == "dev"
 Requires-Dist: pandas>=2.1.0; extra == "dev"
-Requires-Dist: pre-commit<4.
+Requires-Dist: pre-commit<4.4.0,>=3.7.1; extra == "dev"
 Requires-Dist: pytest; extra == "dev"
 Requires-Dist: pytest-xdist; extra == "dev"
-Requires-Dist: pymssql==2.3.
+Requires-Dist: pymssql==2.3.8; extra == "dev"
 Requires-Dist: ruff; extra == "dev"
-Requires-Dist: testcontainers[kafka,minio,mssql,postgres]==4.
-Requires-Dist: trino==0.
+Requires-Dist: testcontainers[kafka,minio,mssql,postgres]==4.12.0; extra == "dev"
+Requires-Dist: trino==0.336.0; extra == "dev"
+Dynamic: license-file
 
 # Data Contract CLI
 
@@ -109,9 +119,9 @@ We have a _servers_ section with endpoint details to the S3 bucket, _models_ for
 
 This data contract contains all information to connect to S3 and check that the actual data meets the defined schema and quality requirements. We can use this information to test if the actual data product in S3 is compliant to the data contract.
 
-Let's use [
+Let's use [uv](https://docs.astral.sh/uv/) to install the CLI (or use the [Docker image](#docker)),
 ```bash
-$
+$ uv tool install --python python3.11 'datacontract-cli[all]'
 ```
 
 
@@ -206,9 +216,15 @@ $ datacontract export --format odcs datacontract.yaml --output odcs.yaml
 # import ODCS to data contract
 $ datacontract import --format odcs odcs.yaml --output datacontract.yaml
 
-# import sql (other formats: avro, glue, bigquery, jsonschema ...)
+# import sql (other formats: avro, glue, bigquery, jsonschema, excel ...)
 $ datacontract import --format sql --source my-ddl.sql --dialect postgres --output datacontract.yaml
 
+# import from Excel template
+$ datacontract import --format excel --source odcs.xlsx --output datacontract.yaml
+
+# export to Excel template
+$ datacontract export --format excel --output odcs.xlsx datacontract.yaml
+
 # find differences between two data contracts
 $ datacontract diff datacontract-v1.yaml datacontract-v2.yaml
 
@@ -241,6 +257,14 @@ if not run.has_passed():
 
 Choose the most appropriate installation method for your needs:
 
+### uv
+
+If you have [uv](https://docs.astral.sh/uv/) installed, you can run datacontract-cli directly without installing:
+
+```
+uv run --with 'datacontract-cli[all]' datacontract --version
+```
+
 ### pip
 Python 3.10, 3.11, and 3.12 are supported. We recommend to use Python 3.11.
 
@@ -302,6 +326,7 @@ A list of available extras:
 
 | Dependency | Installation Command |
 |-------------------------|--------------------------------------------|
+| Amazon Athena | `pip install datacontract-cli[athena]` |
 | Avro Support | `pip install datacontract-cli[avro]` |
 | Google BigQuery | `pip install datacontract-cli[bigquery]` |
 | Databricks Integration | `pip install datacontract-cli[databricks]` |
@@ -317,7 +342,7 @@ A list of available extras:
 | Parquet | `pip install datacontract-cli[parquet]` |
 | RDF | `pip install datacontract-cli[rdf]` |
 | API (run as web server) | `pip install datacontract-cli[api]` |
-
+| protobuf | `pip install datacontract-cli[protobuf]` |
 
 
 ## Documentation
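
Editor's note: the extras in the table above can also be combined in one installation. A minimal sketch (the chosen set of extras is illustrative, the comma-separated extras syntax is standard pip behavior):

```bash
# install the CLI together with several optional integrations at once
pip install 'datacontract-cli[athena,excel,protobuf]'
```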
@@ -338,47 +363,46 @@ Commands
 
 ### init
 ```
-
-Usage: datacontract init [OPTIONS] [LOCATION]
-
-
-
-╭─ Arguments
-│ location [LOCATION] The location
-│
-
-
-
-│ --
-│ [default:
-│
-
-│ datacontract.yaml │
-│ [default: no-overwrite] │
-│ --help Show this message and exit. │
-╰──────────────────────────────────────────────────────────────────────────────╯
+
+ Usage: datacontract init [OPTIONS] [LOCATION]
+
+ Create an empty data contract.
+
+╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
+│ location [LOCATION] The location of the data contract file to create. │
+│ [default: datacontract.yaml] │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
+╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+│ --template TEXT URL of a template or data contract [default: None] │
+│ --overwrite --no-overwrite Replace the existing datacontract.yaml │
+│ [default: no-overwrite] │
+│ --help Show this message and exit. │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
 
 ```
 
 ### lint
 ```
-
-Usage: datacontract lint [OPTIONS] [LOCATION]
-
-Validate that the datacontract.yaml is correctly formatted.
-
-╭─ Arguments
-│ location [LOCATION] The location (url or path) of the data contract
-│ yaml
-
-
-│ …
-
+
+ Usage: datacontract lint [OPTIONS] [LOCATION]
+
+ Validate that the datacontract.yaml is correctly formatted.
+
+╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
+│ location [LOCATION] The location (url or path) of the data contract yaml. │
+│ [default: datacontract.yaml] │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
+╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+│ --schema TEXT The location (url or path) of the Data Contract Specification │
+│ JSON Schema │
+│ [default: None] │
+│ --output PATH Specify the file path where the test results should be written │
+│ to (e.g., './test-results/TEST-datacontract.xml'). If no path is │
+│ provided, the output will be printed to stdout. │
+│ [default: None] │
+│ --output-format [junit] The target format for the test results. [default: None] │
+│ --help Show this message and exit. │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
 
 ```
 
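
Editor's note: the `--output` and `--output-format` options shown in the lint help above combine into a machine-readable report, e.g. for CI. A usage sketch (file paths are illustrative):

```bash
# lint the contract and write the results as a JUnit XML report
datacontract lint datacontract.yaml --output ./test-results/TEST-datacontract.xml --output-format junit
```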
@@ -394,31 +418,40 @@ Commands
 │ [default: datacontract.yaml] │
 ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
 ╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
-│ --schema
-│ …
-│ --
-│ …
-
+│ --schema TEXT The location (url or path) of │
+│ the Data Contract Specification │
+│ JSON Schema │
+│ [default: None] │
+│ --server TEXT The server configuration to run │
+│ the schema and quality tests. │
+│ Use the key of the server object │
+│ in the data contract yaml file │
+│ to refer to a server, e.g., │
+│ `production`, or `all` for all │
+│ servers (default). │
+│ [default: all] │
+│ --publish-test-results --no-publish-test-results Publish the results after the │
+│ test │
+│ [default: │
+│ no-publish-test-results] │
+│ --publish TEXT DEPRECATED. The url to publish │
+│ the results after the test. │
+│ [default: None] │
+│ --output PATH Specify the file path where the │
+│ test results should be written │
+│ to (e.g., │
+│ './test-results/TEST-datacontra… │
+│ [default: None] │
+│ --output-format [junit] The target format for the test │
+│ results. │
+│ [default: None] │
+│ --logs --no-logs Print logs [default: no-logs] │
+│ --ssl-verification --no-ssl-verification SSL verification when publishing │
+│ the data contract. │
+│ [default: ssl-verification] │
+│ --help Show this message and exit. │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
+
 ```
 
 Data Contract CLI connects to a data source and runs schema and quality tests to verify that the data contract is valid.
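
Editor's note: based on the options above, a typical invocation tests a single server definition and publishes the results. A sketch (the server key `production` is an assumption about the contract file):

```bash
# run schema and quality tests against the "production" server only
datacontract test --server production --publish-test-results datacontract.yaml
```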
@@ -438,6 +471,7 @@ Credentials are provided with environment variables.
 Supported server types:
 
 - [s3](#S3)
+- [athena](#athena)
 - [bigquery](#bigquery)
 - [azure](#azure)
 - [sqlserver](#sqlserver)
@@ -448,6 +482,7 @@ Supported server types:
 - [kafka](#kafka)
 - [postgres](#postgres)
 - [trino](#trino)
+- [api](#api)
 - [local](#local)
 
 Supported formats:
@@ -507,6 +542,41 @@ servers:
 | `DATACONTRACT_S3_SESSION_TOKEN` | `AQoDYXdzEJr...` | AWS temporary session token (optional) |
 
 
+#### Athena
+
+Data Contract CLI can test data in AWS Athena stored in S3.
+Supports different file formats, such as Iceberg, Parquet, JSON, CSV...
+
+##### Example
+
+datacontract.yaml
+```yaml
+servers:
+  athena:
+    type: athena
+    catalog: awsdatacatalog # awsdatacatalog is the default setting
+    schema: icebergdemodb # in Athena, this is called "database"
+    regionName: eu-central-1
+    stagingDir: s3://my-bucket/athena-results/
+models:
+  my_table: # corresponds to a table or view name
+    type: table
+    fields:
+      my_column_1: # corresponds to a column
+        type: string
+        config:
+          physicalType: varchar
+```
+
+##### Environment Variables
+
+| Environment Variable | Example | Description |
+|-------------------------------------|---------------------------------|----------------------------------------|
+| `DATACONTRACT_S3_REGION` | `eu-central-1` | Region of Athena service |
+| `DATACONTRACT_S3_ACCESS_KEY_ID` | `AKIAXV5Q5QABCDEFGH` | AWS Access Key ID |
+| `DATACONTRACT_S3_SECRET_ACCESS_KEY` | `93S7LRrJcqLaaaa/XXXXXXXXXXXXX` | AWS Secret Access Key |
+| `DATACONTRACT_S3_SESSION_TOKEN` | `AQoDYXdzEJr...` | AWS temporary session token (optional) |
+
 
 #### Google Cloud Storage (GCS)
 
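
Editor's note: putting the new Athena support together with the documented environment variables, a test run could look like this (credential values are placeholders):

```bash
# credentials for Athena/S3 access, as documented in the table above
export DATACONTRACT_S3_REGION=eu-central-1
export DATACONTRACT_S3_ACCESS_KEY_ID=AKIA...           # placeholder
export DATACONTRACT_S3_SECRET_ACCESS_KEY=secret...     # placeholder
datacontract test --server athena datacontract.yaml
```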
@@ -682,19 +752,37 @@ models:
     fields: ...
 ```
 
-
-```python
-%pip install datacontract-cli[databricks]
-dbutils.library.restartPython()
+##### Installing on Databricks Compute
 
-
+**Important:** When using Databricks LTS ML runtimes (15.4, 16.4), installing via `%pip install` in notebooks can cause issues.
 
-
-
-
-
-
+**Recommended approach:** Use Databricks' native library management instead:
+
+1. **Create or configure your compute cluster:**
+   - Navigate to **Compute** in the Databricks workspace
+   - Create a new cluster or select an existing one
+   - Go to the **Libraries** tab
+
+2. **Add the datacontract-cli library:**
+   - Click **Install new**
+   - Select **PyPI** as the library source
+   - Enter package name: `datacontract-cli[databricks]`
+   - Click **Install**
+
+3. **Restart the cluster** to apply the library installation
+
+4. **Use in your notebook** without additional installation:
+   ```python
+   from datacontract.data_contract import DataContract
+
+   data_contract = DataContract(
+       data_contract_file="/Volumes/acme_catalog_prod/orders_latest/datacontract/datacontract.yaml",
+       spark=spark)
+   run = data_contract.test()
+   run.result
+   ```
+
+Databricks' library management properly resolves dependencies during cluster initialization, rather than at runtime in the notebook.
 
 #### Dataframe (programmatic)
 
@@ -874,68 +962,117 @@ models:
 | `DATACONTRACT_TRINO_PASSWORD` | `mysecretpassword` | Password |
 
 
+#### API
+
+Data Contract CLI can test APIs that return data in JSON format.
+Currently, only GET requests are supported.
+
+##### Example
+
+datacontract.yaml
+```yaml
+servers:
+  api:
+    type: "api"
+    location: "https://api.example.com/path"
+    delimiter: none # new_line, array, or none (default)
+
+models:
+  my_object: # corresponds to the root element of the JSON response
+    type: object
+    fields:
+      field1:
+        type: string
+      fields2:
+        type: number
+```
+
+##### Environment Variables
+
+| Environment Variable | Example | Description |
+|-----------------------------------------|------------------|------------------------------------------------------|
+| `DATACONTRACT_API_HEADER_AUTHORIZATION` | `Bearer <token>` | The value for the `authorization` header. Optional. |
+
+
+#### Local
+
+Data Contract CLI can test local files in parquet, json, csv, or delta format.
+
+##### Example
+
+datacontract.yaml
+```yaml
+servers:
+  local:
+    type: local
+    path: ./*.parquet
+    format: parquet
+models:
+  my_table_1: # corresponds to a table
+    type: table
+    fields:
+      my_column_1: # corresponds to a column
+        type: varchar
+      my_column_2: # corresponds to a column
+        type: string
+```
+
 
 ### export
 ```
-
-Usage: datacontract export [OPTIONS] [LOCATION]
-
-Convert data contract to a specific format. Saves to file specified by
-
-╭─ Arguments
-│ location [LOCATION] The location (url or path) of the data contract
-│ yaml
-
-
-│ …
-
-│ --sql-server-type TEXT [sql] The server type to determine the sql │
-│ dialect. By default, it uses 'auto' to │
-│ automatically detect the sql dialect via the │
-│ specified servers in the data contract. │
-│ [default: auto] │
-╰──────────────────────────────────────────────────────────────────────────────╯
+
+ Usage: datacontract export [OPTIONS] [LOCATION]
+
+ Convert data contract to a specific format. Saves to file specified by `output` option if present,
+ otherwise prints to stdout.
+
+╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
+│ location [LOCATION] The location (url or path) of the data contract yaml. │
+│ [default: datacontract.yaml] │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
+╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+│ * --format [jsonschema|pydantic-model|sodacl|db The export format. [default: None] │
+│ t|dbt-sources|dbt-staging-sql|odcs|r [required] │
+│ df|avro|protobuf|great-expectations| │
+│ terraform|avro-idl|sql|sql-query|mer │
+│ maid|html|go|bigquery|dbml|spark|sql │
+│ alchemy|data-caterer|dcs|markdown|ic │
+│ eberg|custom|excel|dqx] │
+│ --output PATH Specify the file path where the │
+│ exported data will be saved. If no │
+│ path is provided, the output will be │
+│ printed to stdout. │
+│ [default: None] │
+│ --server TEXT The server name to export. │
+│ [default: None] │
+│ --model TEXT Use the key of the model in the data │
+│ contract yaml file to refer to a │
+│ model, e.g., `orders`, or `all` for │
+│ all models (default). │
+│ [default: all] │
+│ --schema TEXT The location (url or path) of the │
+│ Data Contract Specification JSON │
+│ Schema │
+│ [default: None] │
+│ --engine TEXT [engine] The engine used for great │
+│ expectation run. │
+│ [default: None] │
+│ --template PATH The file path or URL of a template. │
+│ For Excel format: path/URL to custom │
+│ Excel template. For custom format: │
+│ path to Jinja template. │
+│ [default: None] │
+│ --help Show this message and exit. │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
+╭─ RDF Options ────────────────────────────────────────────────────────────────────────────────────╮
+│ --rdf-base TEXT [rdf] The base URI used to generate the RDF graph. [default: None] │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
+╭─ SQL Options ────────────────────────────────────────────────────────────────────────────────────╮
+│ --sql-server-type TEXT [sql] The server type to determine the sql dialect. By default, │
+│ it uses 'auto' to automatically detect the sql dialect via the │
+│ specified servers in the data contract. │
+│ [default: auto] │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
 
 ```
 
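
Editor's note: for the new `api` server type documented above, a test run against a protected endpoint could look like this (the token value is a placeholder):

```bash
# optional bearer token, passed as the "authorization" header
export DATACONTRACT_API_HEADER_AUTHORIZATION="Bearer <token>"
datacontract test --server api datacontract.yaml
```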
@@ -946,35 +1083,50 @@ datacontract export --format html --output datacontract.html
 
 Available export options:
 
-| Type | Description | Status
-
-| `html` | Export to HTML | ✅
-| `jsonschema` | Export to JSON Schema | ✅
-| `odcs` | Export to Open Data Contract Standard (ODCS) V3 | ✅
-| `sodacl` | Export to SodaCL quality checks in YAML format | ✅
-| `dbt` | Export to dbt models in YAML format | ✅
-| `dbt-sources` | Export to dbt sources in YAML format | ✅
-| `dbt-staging-sql` | Export to dbt staging SQL models | ✅
-| `rdf` | Export data contract to RDF representation in N3 format | ✅
-| `avro` | Export to AVRO models | ✅
-| `protobuf` | Export to Protobuf | ✅
-| `terraform` | Export to terraform resources | ✅
-| `sql` | Export to SQL DDL | ✅
-| `sql-query` | Export to SQL Query | ✅
-| `great-expectations` | Export to Great Expectations Suites in JSON Format | ✅
-| `bigquery` | Export to BigQuery Schemas | ✅
-| `go` | Export to Go types | ✅
-| `pydantic-model` | Export to pydantic models | ✅
-| `DBML` | Export to a DBML Diagram description | ✅
-| `spark` | Export to a Spark StructType | ✅
-| `sqlalchemy` | Export to SQLAlchemy Models | ✅
-| `data-caterer` | Export to Data Caterer in YAML format | ✅
-| `dcs` | Export to Data Contract Specification in YAML format | ✅
-| `markdown` | Export to Markdown | ✅
+| Type | Description | Status |
+|----------------------|---------------------------------------------------------|---------|
+| `html` | Export to HTML | ✅ |
+| `jsonschema` | Export to JSON Schema | ✅ |
+| `odcs` | Export to Open Data Contract Standard (ODCS) V3 | ✅ |
+| `sodacl` | Export to SodaCL quality checks in YAML format | ✅ |
+| `dbt` | Export to dbt models in YAML format | ✅ |
+| `dbt-sources` | Export to dbt sources in YAML format | ✅ |
+| `dbt-staging-sql` | Export to dbt staging SQL models | ✅ |
+| `rdf` | Export data contract to RDF representation in N3 format | ✅ |
+| `avro` | Export to AVRO models | ✅ |
+| `protobuf` | Export to Protobuf | ✅ |
+| `terraform` | Export to terraform resources | ✅ |
+| `sql` | Export to SQL DDL | ✅ |
+| `sql-query` | Export to SQL Query | ✅ |
+| `great-expectations` | Export to Great Expectations Suites in JSON Format | ✅ |
+| `bigquery` | Export to BigQuery Schemas | ✅ |
+| `go` | Export to Go types | ✅ |
+| `pydantic-model` | Export to pydantic models | ✅ |
+| `DBML` | Export to a DBML Diagram description | ✅ |
+| `spark` | Export to a Spark StructType | ✅ |
+| `sqlalchemy` | Export to SQLAlchemy Models | ✅ |
+| `data-caterer` | Export to Data Caterer in YAML format | ✅ |
+| `dcs` | Export to Data Contract Specification in YAML format | ✅ |
+| `markdown` | Export to Markdown | ✅ |
 | `iceberg` | Export to an Iceberg JSON Schema Definition | partial |
-| `
-
+| `excel` | Export to ODCS Excel Template | ✅ |
+| `custom` | Export to Custom format with Jinja | ✅ |
+| `dqx` | Export to DQX in YAML format | ✅ |
+| Missing something? | Please create an issue on GitHub | TBD |
+
+#### SQL
+
+The `export` function converts a given data contract into a SQL data definition language (DDL).
 
+```shell
+datacontract export datacontract.yaml --format sql --output output.sql
+```
+
+If you are using Databricks and an error is thrown when deploying the SQL DDLs with `variant` columns, set the following property:
+
+```shell
+spark.conf.set("spark.databricks.delta.schema.typeCheck.enabled", "false")
+```
 
 #### Great Expectations
 
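
Editor's note: the newly listed `excel`, `custom`, and `dqx` targets follow the same invocation pattern as the other export formats. A sketch for the DQX target (the output file name is illustrative):

```bash
# export the data contract's quality checks as DQX YAML
datacontract export --format dqx --output dqx_checks.yaml datacontract.yaml
```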
@@ -982,7 +1134,7 @@ The `export` function transforms a specified data contract into a comprehensive
 If the contract includes multiple models, you need to specify the name of the model you wish to export.
 
 ```shell
-datacontract
+datacontract export datacontract.yaml --format great-expectations --model orders
 ```
 
 The export creates a list of expectations by utilizing:
@@ -1007,7 +1159,7 @@ To further customize the export, the following optional arguments are available:
 
 #### RDF
 
-The export function converts a given data contract into a RDF representation. You have the option to
+The `export` function converts a given data contract into a RDF representation. You have the option to
 add a base_url which will be used as the default prefix to resolve relative IRIs inside the document.
 
 ```shell
@@ -1230,73 +1382,110 @@ FROM
 {{ ref('orders') }}
 ```
 
+#### ODCS Excel Template
+
+The `export` function converts a data contract into an ODCS (Open Data Contract Standard) Excel template. This creates a user-friendly Excel spreadsheet that can be used for authoring, sharing, and managing data contracts using the familiar Excel interface.
+
+```shell
+datacontract export --format excel --output datacontract.xlsx datacontract.yaml
+```
+
+The Excel format enables:
+- **User-friendly authoring**: Create and edit data contracts in Excel's familiar interface
+- **Easy sharing**: Distribute data contracts as standard Excel files
+- **Collaboration**: Enable non-technical stakeholders to contribute to data contract definitions
+- **Round-trip conversion**: Import Excel templates back to YAML data contracts
+
+For more information about the Excel template structure, visit the [ODCS Excel Template repository](https://github.com/datacontract/open-data-contract-standard-excel-template).
+
 ### import
 ```
-
-
-│ …
-│ --
-│ …
-│ --
-│ …
-│ --
-│ …
-│ --
-│ …
-
+
+ Usage: datacontract import [OPTIONS]
+
+ Create a data contract from the given source location. Saves to file specified by `output` option
+ if present, otherwise prints to stdout.
+
+╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+│ * --format [sql|avro|dbt|dbml|glue|jsonsc The format of the source file. │
+│ hema|json|bigquery|odcs|unity| [default: None] │
+│ spark|iceberg|parquet|csv|prot [required] │
+│ obuf|excel] │
+│ --output PATH Specify the file path where │
+│ the Data Contract will be │
+│ saved. If no path is provided, │
+│ the output will be printed to │
+│ stdout. │
+│ [default: None] │
+│ --source TEXT The path to the file that │
+│ should be imported. │
+│ [default: None] │
+│ --spec [datacontract_specification|od The format of the data │
+│ cs] contract to import. │
+│ [default: │
+│ datacontract_specification] │
+│ --dialect TEXT The SQL dialect to use when │
+│ importing SQL files, e.g., │
+│ postgres, tsql, bigquery. │
+│ [default: None] │
+│ --glue-table TEXT List of table ids to import │
+│ from the Glue Database (repeat │
+│ for multiple table ids, leave │
+│ empty for all tables in the │
+│ dataset). │
+│ [default: None] │
+│ --bigquery-project TEXT The bigquery project id. │
+│ [default: None] │
+│ --bigquery-dataset TEXT The bigquery dataset id. │
+│ [default: None] │
+│ --bigquery-table TEXT List of table ids to import │
+│ from the bigquery API (repeat │
+│ for multiple table ids, leave │
+│ empty for all tables in the │
+│ dataset). │
+│ [default: None] │
+│ --unity-table-full-name TEXT Full name of a table in the │
+│ unity catalog │
+│ [default: None] │
+│ --dbt-model TEXT List of model names to import │
+│ from the dbt manifest file │
+│ (repeat for multiple model │
+│ names, leave empty for all │
+│ models in the dataset). │
+│ [default: None] │
+│ --dbml-schema TEXT List of schema names to import │
+│ from the DBML file (repeat for │
+│ multiple schema names, leave │
+│ empty for all tables in the │
+│ file). │
+│ [default: None] │
+│ --dbml-table TEXT List of table names to import │
+│ from the DBML file (repeat for │
+│ multiple table names, leave │
+│ empty for all tables in the │
+│ file). │
+│ [default: None] │
+│ --iceberg-table TEXT Table name to assign to the │
+│ model created from the Iceberg │
+│ schema. │
+│ [default: None] │
+│ --template TEXT The location (url or path) of │
+│ the Data Contract │
+│ Specification Template │
+│ [default: None] │
+│ --schema TEXT The location (url or path) of │
+│ the Data Contract │
+│ Specification JSON Schema │
+│ [default: None] │
+│ --owner TEXT The owner or team responsible │
+│ for managing the data │
+│ contract. │
+│ [default: None] │
+│ --id TEXT The identifier for the data │
+│ contract. │
+│ [default: None] │
+│ --help Show this message and exit. │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
 
 ```
 
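
Editor's note: the Excel template export above pairs with the new Excel importer, which makes the round-trip conversion described earlier concrete (the file name is illustrative):

```bash
# export to the ODCS Excel template, hand it to stakeholders, then import it back
datacontract export --format excel --output odcs.xlsx datacontract.yaml
datacontract import --format excel --source odcs.xlsx --output datacontract.yaml
```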
@@ -1312,21 +1501,23 @@ Available import options:
 
 | Type | Description | Status |
 |--------------------|------------------------------------------------|--------|
-| `sql` | Import from SQL DDL | ✅ |
 | `avro` | Import from AVRO schemas | ✅ |
-| `glue` | Import from AWS Glue DataCatalog | ✅ |
-| `jsonschema` | Import from JSON Schemas | ✅ |
 | `bigquery` | Import from BigQuery Schemas | ✅ |
-| `unity` | Import from Databricks Unity Catalog | partial |
-| `dbt` | Import from dbt models | ✅ |
-| `odcs` | Import from Open Data Contract Standard (ODCS) | ✅ |
-| `spark` | Import from Spark StructTypes | ✅ |
-| `dbml` | Import from DBML models | ✅ |
 | `csv` | Import from CSV File | ✅ |
-| `
+| `dbml` | Import from DBML models | ✅ |
+| `dbt` | Import from dbt models | ✅ |
+| `excel` | Import from ODCS Excel Template | ✅ |
+| `glue` | Import from AWS Glue DataCatalog | ✅ |
 | `iceberg` | Import from an Iceberg JSON Schema Definition | partial |
-| `
-
+| `jsonschema` | Import from JSON Schemas | ✅ |
+| `odcs` | Import from Open Data Contract Standard (ODCS) | ✅ |
+| `parquet` | Import from Parquet File Metadata | ✅ |
+| `protobuf` | Import from Protobuf schemas | ✅ |
+| `spark` | Import from Spark StructTypes, Variant | ✅ |
+| `sql` | Import from SQL DDL | ✅ |
+| `unity` | Import from Databricks Unity Catalog | partial |
+| Missing something? | Please create an issue on GitHub | TBD |
 
 
 #### ODCS
@@ -1367,16 +1558,21 @@ datacontract import --format bigquery --bigquery-project <project_id> --bigquery
 ```
 
 #### Unity Catalog
-
 ```bash
 # Example import from a Unity Catalog JSON file
 datacontract import --format unity --source my_unity_table.json
 ```
 
 ```bash
-# Example import single table from Unity Catalog via HTTP endpoint
-export
-export
+# Example import single table from Unity Catalog via HTTP endpoint using PAT
+export DATACONTRACT_DATABRICKS_SERVER_HOSTNAME="https://xyz.cloud.databricks.com"
+export DATACONTRACT_DATABRICKS_TOKEN=<token>
+datacontract import --format unity --unity-table-full-name <table_full_name>
+```
+Please refer to the [Databricks documentation](https://docs.databricks.com/aws/en/dev-tools/auth/unified-auth) on how to set up a profile.
+```bash
+# Example import single table from Unity Catalog via HTTP endpoint using Profile
+export DATACONTRACT_DATABRICKS_PROFILE="my-profile"
 datacontract import --format unity --unity-table-full-name <table_full_name>
 ```
 
@@ -1397,6 +1593,17 @@ datacontract import --format dbt --source <manifest_path> --dbt-model <model_nam
 datacontract import --format dbt --source <manifest_path>
 ```
 
+#### Excel
+
+Importing from [ODCS Excel Template](https://github.com/datacontract/open-data-contract-standard-excel-template).
+
+Examples:
+
+```bash
+# Example import from ODCS Excel Template
+datacontract import --format excel --source odcs.xlsx
+```
+
 #### Glue
 
 Importing from Glue reads the necessary Data directly off of the AWS API.
@@ -1416,14 +1623,31 @@ datacontract import --format glue --source <database_name>
 
 #### Spark
 
-Importing from Spark table or view these must be created or accessible in the Spark context. Specify tables list in `source` parameter.
-
-Example:
+To import from Spark tables or views, they must be created or accessible in the Spark context. Specify the list of tables in the `source` parameter. If the `source` tables are registered as tables in Databricks and have a table-level description, it will also be added to the Data Contract Specification.
 
 ```bash
+# Example: Import Spark table(s) from Spark context
 datacontract import --format spark --source "users,orders"
 ```
 
+```bash
+# Example: Import Spark table
+DataContract.import_from_source("spark", "users")
+DataContract.import_from_source(format = "spark", source = "users")
+
+# Example: Import Spark dataframe
+DataContract.import_from_source("spark", "users", dataframe = df_user)
+DataContract.import_from_source(format = "spark", source = "users", dataframe = df_user)
+
+# Example: Import Spark table + table description
+DataContract.import_from_source("spark", "users", description = "description")
+DataContract.import_from_source(format = "spark", source = "users", description = "description")
+
+# Example: Import Spark dataframe + table description
+DataContract.import_from_source("spark", "users", dataframe = df_user, description = "description")
+DataContract.import_from_source(format = "spark", source = "users", dataframe = df_user, description = "description")
+```
+
 #### DBML
 
 Importing from DBML Documents.
@@ -1475,95 +1699,96 @@ Example:
 datacontract import --format csv --source "test.csv"
 ```
 
+#### protobuf
+
+Importing from a protobuf file. Specify the file in the `source` parameter.
+
+Example:
+
+```bash
+datacontract import --format protobuf --source "test.proto"
+```
+
 
 ### breaking
 ```
-
-Usage: datacontract breaking [OPTIONS] LOCATION_OLD LOCATION_NEW
-
-Identifies breaking changes between data contracts. Prints to stdout.
-
-╭─ Arguments
-│ * location_old TEXT The location (url or path) of the old data
-│
-│ [
-│ …
-
-│ --help Show this message and exit. │
-╰──────────────────────────────────────────────────────────────────────────────╯
+
+ Usage: datacontract breaking [OPTIONS] LOCATION_OLD LOCATION_NEW
+
+ Identifies breaking changes between data contracts. Prints to stdout.
+
+╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
+│ * location_old TEXT The location (url or path) of the old data contract yaml. │
+│ [default: None] │
+│ [required] │
+│ * location_new TEXT The location (url or path) of the new data contract yaml. │
+│ [default: None] │
+│ [required] │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
+╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+│ --help Show this message and exit. │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
 
 ```
 
 ### changelog
 ```
-
-Usage: datacontract changelog [OPTIONS] LOCATION_OLD LOCATION_NEW
-
-Generate a changelog between data contracts. Prints to stdout.
-
-╭─ Arguments
-│ * location_old TEXT The location (url or path) of the old data
-│
-│ [
-│ …
-
-│ --help Show this message and exit. │
-╰──────────────────────────────────────────────────────────────────────────────╯
+
+ Usage: datacontract changelog [OPTIONS] LOCATION_OLD LOCATION_NEW
+
+ Generate a changelog between data contracts. Prints to stdout.
+
+╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
+│ * location_old TEXT The location (url or path) of the old data contract yaml. │
+│ [default: None] │
+│ [required] │
+│ * location_new TEXT The location (url or path) of the new data contract yaml. │
+│ [default: None] │
+│ [required] │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
+╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+│ --help Show this message and exit. │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
 
 ```
 
 ### diff
 ```
-
-Usage: datacontract diff [OPTIONS] LOCATION_OLD LOCATION_NEW
-
-PLACEHOLDER. Currently works as 'changelog' does.
-
-╭─ Arguments
-│ * location_old TEXT The location (url or path) of the old data
-│
-│ [
-│ …
-
-│ --help Show this message and exit. │
-╰──────────────────────────────────────────────────────────────────────────────╯
+
+ Usage: datacontract diff [OPTIONS] LOCATION_OLD LOCATION_NEW
+
+ PLACEHOLDER. Currently works as 'changelog' does.
+
+╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
+│ * location_old TEXT The location (url or path) of the old data contract yaml. │
+│ [default: None] │
+│ [required] │
+│ * location_new TEXT The location (url or path) of the new data contract yaml. │
+│ [default: None] │
+│ [required] │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
+╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+│ --help Show this message and exit. │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
 
 ```
 
 ### catalog
 ```
-
-Usage: datacontract catalog [OPTIONS]
-
-Create
-
-╭─ Options
-│ --files TEXT Glob pattern for the data contract files to include in │
-│ [default: *.yaml]
-│ --output TEXT Output directory for the catalog html files. │
-│ …
-│ https://datacontract.com/datacontract.schema.json] │
-│ --help Show this message and exit. │
-╰──────────────────────────────────────────────────────────────────────────────╯
+
+ Usage: datacontract catalog [OPTIONS]
+
+ Create a html catalog of data contracts.
+
+╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+│ --files TEXT Glob pattern for the data contract files to include in the catalog. │
+│ Applies recursively to any subfolders. │
+│ [default: *.yaml] │
+│ --output TEXT Output directory for the catalog html files. [default: catalog/] │
+│ --schema TEXT The location (url or path) of the Data Contract Specification JSON Schema │
+│ [default: None] │
+│ --help Show this message and exit. │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
 
 ```
 
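
Editor's note: the `breaking`, `changelog`, and `diff` commands above all take the same two positional arguments. A sketch using the file names from the quick-start section:

```bash
# compare two versions of a contract
datacontract breaking datacontract-v1.yaml datacontract-v2.yaml
datacontract changelog datacontract-v1.yaml datacontract-v2.yaml
```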
@@ -1579,51 +1804,50 @@ datacontract catalog --files "*.odcs.yaml"
 
 ### publish
 ```
-
- Usage: datacontract publish [OPTIONS] [LOCATION]
-
- Publish the data contract to the Data Mesh Manager.
-
-╭─ Arguments
-│ location [LOCATION] The location (url or path) of the data contract
-│ yaml
-
-
-
-│
-│
-│
-│
-│ [default:
-│
-
-│ publishing the data │
-│ contract. │
-│ [default: │
-│ ssl-verification] │
-│ --help Show this message and │
-│ exit. │
-╰──────────────────────────────────────────────────────────────────────────────╯
+
+ Usage: datacontract publish [OPTIONS] [LOCATION]
+
+ Publish the data contract to the Data Mesh Manager.
+
+╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
+│ location [LOCATION] The location (url or path) of the data contract yaml. │
+│ [default: datacontract.yaml] │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
+╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+│ --schema TEXT The location (url or path) of the Data │
+│ Contract Specification JSON Schema │
+│ [default: None] │
+│ --ssl-verification --no-ssl-verification SSL verification when publishing the data │
+│ contract. │
+│ [default: ssl-verification] │
+│ --help Show this message and exit. │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
 
 ```
 
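A publishing sketch derived from the help text above; it assumes Data Mesh Manager credentials are already configured in the environment (the credential setup itself is not part of this help text):

```bash
# Publish the default datacontract.yaml to the Data Mesh Manager.
datacontract publish

# Only for servers with self-signed certificates: skip SSL verification.
datacontract publish datacontract.yaml --no-ssl-verification
```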
 ### api
 ```
-
- Usage: datacontract api [OPTIONS]
-
- Start the datacontract CLI as server application with REST API.
- The OpenAPI documentation as Swagger UI is available on http://localhost:4242. You can execute the
-
-
- To
-
-
-
-
-
-
-
+
+ Usage: datacontract api [OPTIONS]
+
+ Start the datacontract CLI as server application with REST API.
+ The OpenAPI documentation as Swagger UI is available on http://localhost:4242. You can execute the
+ commands directly from the Swagger UI.
+ To protect the API, you can set the environment variable DATACONTRACT_CLI_API_KEY to a secret API
+ key. To authenticate, requests must include the header 'x-api-key' with the correct API key. This
+ is highly recommended, as data contract tests may be subject to SQL injections or leak sensitive
+ information.
+ To connect to servers (such as a Snowflake data source), set the credentials as environment
+ variables as documented in https://cli.datacontract.com/#test
+ It is possible to run the API with extra arguments for `uvicorn.run()` as keyword arguments, e.g.:
+ `datacontract api --port 1234 --root_path /datacontract`.
+
+╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+│ --port INTEGER Bind socket to this port. [default: 4242] │
+│ --host TEXT Bind socket to this host. Hint: For running in docker, set it to 0.0.0.0 │
+│ [default: 127.0.0.1] │
+│ --help Show this message and exit. │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
 
 ```
 
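The new help text implies the following workflow; the API key value below is a placeholder, and the concrete endpoint paths are listed in the Swagger UI rather than here:

```bash
# Start the REST API, protected by an API key.
export DATACONTRACT_CLI_API_KEY=changeme
datacontract api --port 4242 --host 127.0.0.1

# From another shell: requests authenticate via the 'x-api-key' header.
curl -H "x-api-key: changeme" http://localhost:4242/
```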
@@ -1666,8 +1890,7 @@ Create a data contract based on the actual data. This is the fastest way to get
 $ datacontract test
 ```
 
-3.
-probably forgot to document some fields and add the terms and conditions.
+3. Validate that the `datacontract.yaml` is correctly formatted and adheres to the Data Contract Specification.
 ```bash
 $ datacontract lint
 ```
@@ -1688,8 +1911,7 @@ Create a data contract based on the requirements from use cases.
 ```
 
 2. Create the model and quality guarantees based on your business requirements. Fill in the terms,
-descriptions, etc.
-linter.
+descriptions, etc. Validate that your `datacontract.yaml` is correctly formatted.
 ```bash
 $ datacontract lint
 ```
@@ -1883,7 +2105,7 @@ if __name__ == "__main__":
 Output
 
 ```yaml
-dataContractSpecification: 1.1
+dataContractSpecification: 1.2.1
 id: uuid-custom
 info:
   title: my_custom_imported_data
@@ -1902,19 +2124,41 @@ models:
 ```
 ## Development Setup
 
-
+- Install [uv](https://docs.astral.sh/uv/)
+- Python base interpreter should be 3.11.x .
+- Docker engine must be running to execute the tests.
 
 ```bash
-#
-
-
+# make sure uv is installed
+uv python pin 3.11
+uv venv
+uv pip install -e '.[dev]'
+uv run ruff check
+uv run pytest
+```
+
+### Troubleshooting
+
+#### Windows: Some tests fail
+
+Run in wsl. (We need to fix the pathes in the tests so that normal Windows will work, contributions are appreciated)
+
+#### PyCharm does not pick up the `.venv`
+
+This [uv issue](https://github.com/astral-sh/uv/issues/12545) might be relevant.
+
+Try to sync all groups:
+
+```
+uv sync --all-groups --all-extras
+```
+
+#### Errors in tests that use PySpark (e.g. test_test_kafka.py)
+
+Ensure you have a JDK 17 or 21 installed. Java 25 causes issues.
 
-
-
-pip install -e '.[dev]'
-pre-commit install
-pre-commit run --all-files
-pytest
+```
+java --version
 ```
 
 
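Before running the suite, it can help to verify the three prerequisites listed above; a quick sanity check (plain standard commands, nothing project-specific) could look like:

```bash
# All three should succeed before 'uv run pytest':
uv --version    # uv is installed
docker info     # Docker engine is running and reachable
java --version  # JDK 17 or 21 (per the troubleshooting note, Java 25 causes issues)
```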
@@ -1949,27 +2193,6 @@ docker compose run --rm datacontract --version
 
 This command runs the container momentarily to check the version of the `datacontract` CLI. The `--rm` flag ensures that the container is automatically removed after the command executes, keeping your environment clean.
 
-## Use with pre-commit
-
-To run `datacontract-cli` as part of a [pre-commit](https://pre-commit.com/) workflow, add something like the below to the `repos` list in the project's `.pre-commit-config.yaml`:
-
-```yaml
-repos:
-  - repo: https://github.com/datacontract/datacontract-cli
-    rev: "v0.10.9"
-    hooks:
-      - id: datacontract-lint
-      - id: datacontract-test
-        args: ["--server", "production"]
-```
-
-### Available Hook IDs
-
-| Hook ID | Description | Dependency |
-| ----------------- | -------------------------------------------------- | ---------- |
-| datacontract-lint | Runs the lint subcommand. | Python3 |
-| datacontract-test | Runs the test subcommand. Please look at | Python3 |
-| | [test](#test) section for all available arguments. | |
 
 ## Release Steps
 
@@ -1986,8 +2209,10 @@ We are happy to receive your contributions. Propose your change in an issue or d
 
 ## Companies using this tool
 
+- [Entropy Data](https://www.entropy-data.com)
 - [INNOQ](https://innoq.com)
 - [Data Catering](https://data.catering/)
+- [Oliver Wyman](https://www.oliverwyman.com/)
 - And many more. To add your company, please create a pull request.
 
 ## Related Tools
@@ -2003,7 +2228,7 @@ We are happy to receive your contributions. Propose your change in an issue or d
 
 ## Credits
 
-Created by [Stefan Negele](https://www.linkedin.com/in/stefan-negele-573153112/)
+Created by [Stefan Negele](https://www.linkedin.com/in/stefan-negele-573153112/), [Jochen Christ](https://www.linkedin.com/in/jochenchrist/), and [Simon Harrer]().
 
 
 