datacontract-cli 0.10.23__py3-none-any.whl → 0.10.40__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- datacontract/__init__.py +13 -0
- datacontract/api.py +12 -5
- datacontract/catalog/catalog.py +5 -3
- datacontract/cli.py +119 -13
- datacontract/data_contract.py +145 -67
- datacontract/engines/data_contract_checks.py +366 -60
- datacontract/engines/data_contract_test.py +50 -4
- datacontract/engines/fastjsonschema/check_jsonschema.py +37 -19
- datacontract/engines/fastjsonschema/s3/s3_read_files.py +3 -2
- datacontract/engines/soda/check_soda_execute.py +27 -3
- datacontract/engines/soda/connections/athena.py +79 -0
- datacontract/engines/soda/connections/duckdb_connection.py +65 -6
- datacontract/engines/soda/connections/kafka.py +4 -2
- datacontract/engines/soda/connections/oracle.py +50 -0
- datacontract/export/avro_converter.py +20 -3
- datacontract/export/bigquery_converter.py +1 -1
- datacontract/export/dbt_converter.py +36 -7
- datacontract/export/dqx_converter.py +126 -0
- datacontract/export/duckdb_type_converter.py +57 -0
- datacontract/export/excel_exporter.py +923 -0
- datacontract/export/exporter.py +3 -0
- datacontract/export/exporter_factory.py +17 -1
- datacontract/export/great_expectations_converter.py +55 -5
- datacontract/export/{html_export.py → html_exporter.py} +31 -20
- datacontract/export/markdown_converter.py +134 -5
- datacontract/export/mermaid_exporter.py +110 -0
- datacontract/export/odcs_v3_exporter.py +193 -149
- datacontract/export/protobuf_converter.py +163 -69
- datacontract/export/rdf_converter.py +2 -2
- datacontract/export/sodacl_converter.py +9 -1
- datacontract/export/spark_converter.py +31 -4
- datacontract/export/sql_converter.py +6 -2
- datacontract/export/sql_type_converter.py +124 -8
- datacontract/imports/avro_importer.py +63 -12
- datacontract/imports/csv_importer.py +111 -57
- datacontract/imports/excel_importer.py +1112 -0
- datacontract/imports/importer.py +16 -3
- datacontract/imports/importer_factory.py +17 -0
- datacontract/imports/json_importer.py +325 -0
- datacontract/imports/odcs_importer.py +2 -2
- datacontract/imports/odcs_v3_importer.py +367 -151
- datacontract/imports/protobuf_importer.py +264 -0
- datacontract/imports/spark_importer.py +117 -13
- datacontract/imports/sql_importer.py +32 -16
- datacontract/imports/unity_importer.py +84 -38
- datacontract/init/init_template.py +1 -1
- datacontract/integration/entropy_data.py +126 -0
- datacontract/lint/resolve.py +112 -23
- datacontract/lint/schema.py +24 -15
- datacontract/lint/urls.py +17 -3
- datacontract/model/data_contract_specification/__init__.py +1 -0
- datacontract/model/odcs.py +13 -0
- datacontract/model/run.py +3 -0
- datacontract/output/junit_test_results.py +3 -3
- datacontract/schemas/datacontract-1.1.0.init.yaml +1 -1
- datacontract/schemas/datacontract-1.2.0.init.yaml +91 -0
- datacontract/schemas/datacontract-1.2.0.schema.json +2029 -0
- datacontract/schemas/datacontract-1.2.1.init.yaml +91 -0
- datacontract/schemas/datacontract-1.2.1.schema.json +2058 -0
- datacontract/schemas/odcs-3.0.2.schema.json +2382 -0
- datacontract/schemas/odcs-3.1.0.schema.json +2809 -0
- datacontract/templates/datacontract.html +54 -3
- datacontract/templates/datacontract_odcs.html +685 -0
- datacontract/templates/index.html +5 -2
- datacontract/templates/partials/server.html +2 -0
- datacontract/templates/style/output.css +319 -145
- {datacontract_cli-0.10.23.dist-info → datacontract_cli-0.10.40.dist-info}/METADATA +711 -433
- datacontract_cli-0.10.40.dist-info/RECORD +121 -0
- {datacontract_cli-0.10.23.dist-info → datacontract_cli-0.10.40.dist-info}/WHEEL +1 -1
- {datacontract_cli-0.10.23.dist-info → datacontract_cli-0.10.40.dist-info/licenses}/LICENSE +1 -1
- datacontract/export/csv_type_converter.py +0 -36
- datacontract/integration/datamesh_manager.py +0 -72
- datacontract/lint/lint.py +0 -142
- datacontract/lint/linters/description_linter.py +0 -35
- datacontract/lint/linters/field_pattern_linter.py +0 -34
- datacontract/lint/linters/field_reference_linter.py +0 -48
- datacontract/lint/linters/notice_period_linter.py +0 -55
- datacontract/lint/linters/quality_schema_linter.py +0 -52
- datacontract/lint/linters/valid_constraints_linter.py +0 -100
- datacontract/model/data_contract_specification.py +0 -327
- datacontract_cli-0.10.23.dist-info/RECORD +0 -113
- /datacontract/{lint/linters → output}/__init__.py +0 -0
- {datacontract_cli-0.10.23.dist-info → datacontract_cli-0.10.40.dist-info}/entry_points.txt +0 -0
- {datacontract_cli-0.10.23.dist-info → datacontract_cli-0.10.40.dist-info}/top_level.txt +0 -0
@@ -1,62 +1,71 @@
-Metadata-Version: 2.
+Metadata-Version: 2.4
 Name: datacontract-cli
-Version: 0.10.23
+Version: 0.10.40
 Summary: The datacontract CLI is an open source command-line tool for working with Data Contracts. It uses data contract YAML files to lint the data contract, connect to data sources and execute schema and quality tests, detect breaking changes, and export to different formats. The tool is written in Python. It can be used as a standalone CLI tool, in a CI/CD pipeline, or directly as a Python library.
 Author-email: Jochen Christ <jochen.christ@innoq.com>, Stefan Negele <stefan.negele@innoq.com>, Simon Harrer <simon.harrer@innoq.com>
+License-Expression: MIT
 Project-URL: Homepage, https://cli.datacontract.com
 Project-URL: Issues, https://github.com/datacontract/datacontract-cli/issues
 Classifier: Programming Language :: Python :: 3
-Classifier: License :: OSI Approved :: MIT License
 Classifier: Operating System :: OS Independent
-Requires-Python:
+Requires-Python: <3.13,>=3.10
 Description-Content-Type: text/markdown
 License-File: LICENSE
-Requires-Dist: typer<0.
-Requires-Dist: pydantic<2.
+Requires-Dist: typer<0.20,>=0.15.1
+Requires-Dist: pydantic<2.13.0,>=2.8.2
 Requires-Dist: pyyaml~=6.0.1
 Requires-Dist: requests<2.33,>=2.31
 Requires-Dist: fastjsonschema<2.22.0,>=2.19.1
 Requires-Dist: fastparquet<2025.0.0,>=2024.5.0
 Requires-Dist: numpy<2.0.0,>=1.26.4
-Requires-Dist: python-multipart
-Requires-Dist: rich<
-Requires-Dist: sqlglot<
+Requires-Dist: python-multipart<1.0.0,>=0.0.20
+Requires-Dist: rich<15.0,>=13.7
+Requires-Dist: sqlglot<28.0.0,>=26.6.0
 Requires-Dist: duckdb<2.0.0,>=1.0.0
-Requires-Dist: soda-core-duckdb<3.
+Requires-Dist: soda-core-duckdb<3.6.0,>=3.3.20
 Requires-Dist: setuptools>=60
-Requires-Dist: python-dotenv
-Requires-Dist: boto3<
-Requires-Dist: Jinja2
-Requires-Dist: jinja_partials
+Requires-Dist: python-dotenv<2.0.0,>=1.0.0
+Requires-Dist: boto3<2.0.0,>=1.34.41
+Requires-Dist: Jinja2<4.0.0,>=3.1.5
+Requires-Dist: jinja_partials<1.0.0,>=0.2.1
+Requires-Dist: datacontract-specification<2.0.0,>=1.2.3
+Requires-Dist: open-data-contract-standard<4.0.0,>=3.1.0
 Provides-Extra: avro
 Requires-Dist: avro==1.12.0; extra == "avro"
 Provides-Extra: bigquery
-Requires-Dist: soda-core-bigquery<3.
+Requires-Dist: soda-core-bigquery<3.6.0,>=3.3.20; extra == "bigquery"
 Provides-Extra: csv
-Requires-Dist: clevercsv>=0.8.2; extra == "csv"
 Requires-Dist: pandas>=2.0.0; extra == "csv"
+Provides-Extra: excel
+Requires-Dist: openpyxl<4.0.0,>=3.1.5; extra == "excel"
 Provides-Extra: databricks
-Requires-Dist: soda-core-spark-df<3.
-Requires-Dist: soda-core-spark[databricks]<3.
-Requires-Dist: databricks-sql-connector<
-Requires-Dist: databricks-sdk<0.
+Requires-Dist: soda-core-spark-df<3.6.0,>=3.3.20; extra == "databricks"
+Requires-Dist: soda-core-spark[databricks]<3.6.0,>=3.3.20; extra == "databricks"
+Requires-Dist: databricks-sql-connector<4.2.0,>=3.7.0; extra == "databricks"
+Requires-Dist: databricks-sdk<0.74.0; extra == "databricks"
+Requires-Dist: pyspark<4.0.0,>=3.5.5; extra == "databricks"
 Provides-Extra: iceberg
-Requires-Dist: pyiceberg==0.
+Requires-Dist: pyiceberg==0.10.0; extra == "iceberg"
 Provides-Extra: kafka
 Requires-Dist: datacontract-cli[avro]; extra == "kafka"
-Requires-Dist: soda-core-spark-df<3.
+Requires-Dist: soda-core-spark-df<3.6.0,>=3.3.20; extra == "kafka"
+Requires-Dist: pyspark<4.0.0,>=3.5.5; extra == "kafka"
 Provides-Extra: postgres
-Requires-Dist: soda-core-postgres<3.
+Requires-Dist: soda-core-postgres<3.6.0,>=3.3.20; extra == "postgres"
 Provides-Extra: s3
-Requires-Dist: s3fs
-Requires-Dist: aiobotocore<2.
+Requires-Dist: s3fs<2026.0.0,>=2025.2.0; extra == "s3"
+Requires-Dist: aiobotocore<2.26.0,>=2.17.0; extra == "s3"
 Provides-Extra: snowflake
-Requires-Dist: snowflake-connector-python[pandas]<
-Requires-Dist: soda-core-snowflake<3.
+Requires-Dist: snowflake-connector-python[pandas]<4.1,>=3.6; extra == "snowflake"
+Requires-Dist: soda-core-snowflake<3.6.0,>=3.3.20; extra == "snowflake"
 Provides-Extra: sqlserver
-Requires-Dist: soda-core-sqlserver<3.
+Requires-Dist: soda-core-sqlserver<3.6.0,>=3.3.20; extra == "sqlserver"
+Provides-Extra: oracle
+Requires-Dist: soda-core-oracle<3.6.0,>=3.3.20; extra == "oracle"
+Provides-Extra: athena
+Requires-Dist: soda-core-athena<3.6.0,>=3.3.20; extra == "athena"
 Provides-Extra: trino
-Requires-Dist: soda-core-trino<3.
+Requires-Dist: soda-core-trino<3.6.0,>=3.3.20; extra == "trino"
 Provides-Extra: dbt
 Requires-Dist: dbt-core>=1.8.0; extra == "dbt"
 Provides-Extra: dbml
@@ -66,23 +75,27 @@ Requires-Dist: pyarrow>=18.1.0; extra == "parquet"
 Provides-Extra: rdf
 Requires-Dist: rdflib==7.0.0; extra == "rdf"
 Provides-Extra: api
-Requires-Dist: fastapi==0.
-Requires-Dist: uvicorn==0.
+Requires-Dist: fastapi==0.116.1; extra == "api"
+Requires-Dist: uvicorn==0.38.0; extra == "api"
+Provides-Extra: protobuf
+Requires-Dist: grpcio-tools>=1.53; extra == "protobuf"
 Provides-Extra: all
-Requires-Dist: datacontract-cli[api,bigquery,csv,databricks,dbml,dbt,iceberg,kafka,parquet,postgres,rdf,s3,snowflake,sqlserver,trino]; extra == "all"
+Requires-Dist: datacontract-cli[api,athena,bigquery,csv,databricks,dbml,dbt,excel,iceberg,kafka,oracle,parquet,postgres,protobuf,rdf,s3,snowflake,sqlserver,trino]; extra == "all"
 Provides-Extra: dev
 Requires-Dist: datacontract-cli[all]; extra == "dev"
 Requires-Dist: httpx==0.28.1; extra == "dev"
 Requires-Dist: kafka-python; extra == "dev"
-Requires-Dist:
+Requires-Dist: minio==7.2.17; extra == "dev"
+Requires-Dist: moto==5.1.13; extra == "dev"
 Requires-Dist: pandas>=2.1.0; extra == "dev"
-Requires-Dist: pre-commit<4.
+Requires-Dist: pre-commit<4.5.0,>=3.7.1; extra == "dev"
 Requires-Dist: pytest; extra == "dev"
 Requires-Dist: pytest-xdist; extra == "dev"
-Requires-Dist: pymssql==2.3.
+Requires-Dist: pymssql==2.3.9; extra == "dev"
 Requires-Dist: ruff; extra == "dev"
-Requires-Dist: testcontainers[kafka,minio,mssql,postgres]==4.
-Requires-Dist: trino==0.332.0; extra == "dev"
+Requires-Dist: testcontainers[kafka,minio,mssql,postgres]==4.13.3; extra == "dev"
+Requires-Dist: trino==0.336.0; extra == "dev"
+Dynamic: license-file

 # Data Contract CLI

@@ -94,7 +107,7 @@ Requires-Dist: trino==0.332.0; extra == "dev"
 <a href="https://datacontract.com/slack" rel="nofollow"><img src="https://img.shields.io/badge/slack-join_chat-white.svg?logo=slack&style=social" alt="Slack Status" data-canonical-src="https://img.shields.io/badge/slack-join_chat-white.svg?logo=slack&style=social" style="max-width: 100%;"></a>
 </p>

-The `datacontract` CLI is
+The `datacontract` CLI is a popular and [recognized](https://www.thoughtworks.com/en-de/radar/tools/summary/data-contract-cli) open-source command-line tool for working with data contracts.
 It uses data contract YAML files as [Data Contract Specification](https://datacontract.com/) or [ODCS](https://bitol-io.github.io/open-data-contract-standard/latest/) to lint the data contract, connect to data sources and execute schema and quality tests, detect breaking changes, and export to different formats. The tool is written in Python. It can be used as a standalone CLI tool, in a CI/CD pipeline, or directly as a Python library.

 
@@ -109,9 +122,9 @@ We have a _servers_ section with endpoint details to the S3 bucket, _models_ for

 This data contract contains all information to connect to S3 and check that the actual data meets the defined schema and quality requirements. We can use this information to test if the actual data product in S3 is compliant to the data contract.

-Let's use [
+Let's use [uv](https://docs.astral.sh/uv/) to install the CLI (or use the [Docker image](#docker)),
 ```bash
-$
+$ uv tool install --python python3.11 'datacontract-cli[all]'
 ```

@@ -206,9 +219,15 @@ $ datacontract export --format odcs datacontract.yaml --output odcs.yaml
 # import ODCS to data contract
 $ datacontract import --format odcs odcs.yaml --output datacontract.yaml

-# import sql (other formats: avro, glue, bigquery, jsonschema ...)
+# import sql (other formats: avro, glue, bigquery, jsonschema, excel ...)
 $ datacontract import --format sql --source my-ddl.sql --dialect postgres --output datacontract.yaml

+# import from Excel template
+$ datacontract import --format excel --source odcs.xlsx --output datacontract.yaml
+
+# export to Excel template
+$ datacontract export --format excel --output odcs.xlsx datacontract.yaml
+
 # find differences between two data contracts
 $ datacontract diff datacontract-v1.yaml datacontract-v2.yaml

@@ -241,6 +260,14 @@ if not run.has_passed():

 Choose the most appropriate installation method for your needs:

+### uv
+
+If you have [uv](https://docs.astral.sh/uv/) installed, you can run datacontract-cli directly without installing:
+
+```
+uv run --with 'datacontract-cli[all]' datacontract --version
+```
+
 ### pip
 Python 3.10, 3.11, and 3.12 are supported. We recommend to use Python 3.11.

@@ -302,6 +329,7 @@ A list of available extras:

 | Dependency | Installation Command |
 |-------------------------|--------------------------------------------|
+| Amazon Athena | `pip install datacontract-cli[athena]` |
 | Avro Support | `pip install datacontract-cli[avro]` |
 | Google BigQuery | `pip install datacontract-cli[bigquery]` |
 | Databricks Integration | `pip install datacontract-cli[databricks]` |
@@ -317,7 +345,7 @@ A list of available extras:
 | Parquet | `pip install datacontract-cli[parquet]` |
 | RDF | `pip install datacontract-cli[rdf]` |
 | API (run as web server) | `pip install datacontract-cli[api]` |
-|
+| protobuf | `pip install datacontract-cli[protobuf]` |


 ## Documentation
@@ -338,47 +366,46 @@ Commands

 ### init
 ```
+
+Usage: datacontract init [OPTIONS] [LOCATION]
+
+Create an empty data contract.
+
+╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
+│ location [LOCATION] The location of the data contract file to create. │
+│ [default: datacontract.yaml] │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
+╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+│ --template TEXT URL of a template or data contract [default: None] │
+│ --overwrite --no-overwrite Replace the existing datacontract.yaml │
+│ [default: no-overwrite] │
+│ --help Show this message and exit. │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

 ```

 ### lint
 ```
+
+Usage: datacontract lint [OPTIONS] [LOCATION]
+
+Validate that the datacontract.yaml is correctly formatted.
+
+╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
+│ location [LOCATION] The location (url or path) of the data contract yaml. │
+│ [default: datacontract.yaml] │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
+╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+│ --schema TEXT The location (url or path) of the Data Contract Specification │
+│ JSON Schema │
+│ [default: None] │
+│ --output PATH Specify the file path where the test results should be written │
+│ to (e.g., './test-results/TEST-datacontract.xml'). If no path is │
+│ provided, the output will be printed to stdout. │
+│ [default: None] │
+│ --output-format [junit] The target format for the test results. [default: None] │
+│ --help Show this message and exit. │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

 ```

@@ -394,31 +421,40 @@ Commands
 │ [default: datacontract.yaml] │
 ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
 ╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+│ --schema TEXT The location (url or path) of │
+│ the Data Contract Specification │
+│ JSON Schema │
+│ [default: None] │
+│ --server TEXT The server configuration to run │
+│ the schema and quality tests. │
+│ Use the key of the server object │
+│ in the data contract yaml file │
+│ to refer to a server, e.g., │
+│ `production`, or `all` for all │
+│ servers (default). │
+│ [default: all] │
+│ --publish-test-results --no-publish-test-results Publish the results after the │
+│ test │
+│ [default: │
+│ no-publish-test-results] │
+│ --publish TEXT DEPRECATED. The url to publish │
+│ the results after the test. │
+│ [default: None] │
+│ --output PATH Specify the file path where the │
+│ test results should be written │
+│ to (e.g., │
+│ './test-results/TEST-datacontra… │
+│ [default: None] │
+│ --output-format [junit] The target format for the test │
+│ results. │
+│ [default: None] │
+│ --logs --no-logs Print logs [default: no-logs] │
+│ --ssl-verification --no-ssl-verification SSL verification when publishing │
+│ the data contract. │
+│ [default: ssl-verification] │
+│ --help Show this message and exit. │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

 ```

 Data Contract CLI connects to a data source and runs schema and quality tests to verify that the data contract is valid.
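The same checks can be run from Python; a minimal sketch using the `DataContract` class and `run.has_passed()` shown elsewhere in this README (the file path is illustrative):

```python
from datacontract.data_contract import DataContract

# Load the contract and execute the schema and quality tests against
# the servers it defines (the CLI's --server option narrows this to one).
data_contract = DataContract(data_contract_file="datacontract.yaml")
run = data_contract.test()

if not run.has_passed():
    print("Data quality validation failed")
```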
@@ -438,9 +474,11 @@ Credentials are provided with environment variables.
 Supported server types:

 - [s3](#S3)
+- [athena](#athena)
 - [bigquery](#bigquery)
 - [azure](#azure)
 - [sqlserver](#sqlserver)
+- [oracle](#oracle)
 - [databricks](#databricks)
 - [databricks (programmatic)](#databricks-programmatic)
 - [dataframe (programmatic)](#dataframe-programmatic)
@@ -448,6 +486,7 @@ Supported server types:
 - [kafka](#kafka)
 - [postgres](#postgres)
 - [trino](#trino)
+- [api](#api)
 - [local](#local)

 Supported formats:
@@ -507,6 +546,41 @@ servers:
 | `DATACONTRACT_S3_SESSION_TOKEN` | `AQoDYXdzEJr...` | AWS temporary session token (optional) |


+#### Athena
+
+Data Contract CLI can test data in AWS Athena stored in S3.
+Supports different file formats, such as Iceberg, Parquet, JSON, CSV...
+
+##### Example
+
+datacontract.yaml
+```yaml
+servers:
+  athena:
+    type: athena
+    catalog: awsdatacatalog # awsdatacatalog is the default setting
+    schema: icebergdemodb # in Athena, this is called "database"
+    regionName: eu-central-1
+    stagingDir: s3://my-bucket/athena-results/
+models:
+  my_table: # corresponds to a table or view name
+    type: table
+    fields:
+      my_column_1: # corresponds to a column
+        type: string
+        config:
+          physicalType: varchar
+```
+
+##### Environment Variables
+
+| Environment Variable | Example | Description |
+|-------------------------------------|---------------------------------|----------------------------------------|
+| `DATACONTRACT_S3_REGION` | `eu-central-1` | Region of Athena service |
+| `DATACONTRACT_S3_ACCESS_KEY_ID` | `AKIAXV5Q5QABCDEFGH` | AWS Access Key ID |
+| `DATACONTRACT_S3_SECRET_ACCESS_KEY` | `93S7LRrJcqLaaaa/XXXXXXXXXXXXX` | AWS Secret Access Key |
+| `DATACONTRACT_S3_SESSION_TOKEN` | `AQoDYXdzEJr...` | AWS temporary session token (optional) |
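As a sketch, these variables can also be set from Python before running the tests programmatically (the credential values are placeholders; `DataContract` and `test()` are shown in the Databricks section below):

```python
import os

from datacontract.data_contract import DataContract

# Placeholder AWS credentials for the Athena connection.
os.environ["DATACONTRACT_S3_REGION"] = "eu-central-1"
os.environ["DATACONTRACT_S3_ACCESS_KEY_ID"] = "AKIA..."
os.environ["DATACONTRACT_S3_SECRET_ACCESS_KEY"] = "..."

# Run the contract's schema and quality tests, e.g. against the Athena server.
run = DataContract(data_contract_file="datacontract.yaml").test()
if not run.has_passed():
    print("Athena data does not comply with the data contract")
```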
+

 #### Google Cloud Storage (GCS)

@@ -628,6 +702,55 @@ models:



+#### Oracle
+
+Data Contract CLI can test data in Oracle Database.
+
+##### Example
+
+datacontract.yaml
+```yaml
+servers:
+  oracle:
+    type: oracle
+    host: localhost
+    port: 1521
+    service_name: ORCL
+    schema: ADMIN
+models:
+  my_table_1: # corresponds to a table
+    type: table
+    fields:
+      my_column_1: # corresponds to a column
+        type: decimal
+        description: Decimal number
+      my_column_2: # corresponds to another column
+        type: text
+        description: Unicode text string
+        config:
+          oracleType: NVARCHAR2 # optional: can be used to explicitly define the type used in the database;
+                                # if not set, a default mapping will be used
+```
+
+##### Environment Variables
+
+These environment variables specify the credentials used by the datacontract tool to connect to the database.
+If you've started the database from a container, e.g. [oracle-free](https://hub.docker.com/r/gvenzl/oracle-free),
+this should match either `system` and what you specified as `ORACLE_PASSWORD` on the container, or
+alternatively what you've specified under `APP_USER` and `APP_USER_PASSWORD`.
+If you require thick mode to connect to the database, you need to have an Oracle Instant Client
+installed on the system and specify the path to the installation within the environment variable
+`DATACONTRACT_ORACLE_CLIENT_DIR`.
+
+| Environment Variable | Example | Description |
+|--------------------------------------------------|--------------------|--------------------------------------------|
+| `DATACONTRACT_ORACLE_USERNAME` | `system` | Username |
+| `DATACONTRACT_ORACLE_PASSWORD` | `0x162e53` | Password |
+| `DATACONTRACT_ORACLE_CLIENT_DIR` | `C:\oracle\client` | Path to Oracle Instant Client installation |
+
+

 #### Databricks

 Works with Unity Catalog and Hive metastore.
@@ -682,19 +805,37 @@ models:
 fields: ...
 ```

-```python
-%pip install datacontract-cli[databricks]
-dbutils.library.restartPython()
+##### Installing on Databricks Compute

+**Important:** When using Databricks LTS ML runtimes (15.4, 16.4), installing via `%pip install` in notebooks can cause issues.

+**Recommended approach:** Use Databricks' native library management instead:
+
+1. **Create or configure your compute cluster:**
+   - Navigate to **Compute** in the Databricks workspace
+   - Create a new cluster or select an existing one
+   - Go to the **Libraries** tab
+
+2. **Add the datacontract-cli library:**
+   - Click **Install new**
+   - Select **PyPI** as the library source
+   - Enter package name: `datacontract-cli[databricks]`
+   - Click **Install**
+
+3. **Restart the cluster** to apply the library installation
+
+4. **Use in your notebook** without additional installation:
+```python
+from datacontract.data_contract import DataContract
+
+data_contract = DataContract(
+    data_contract_file="/Volumes/acme_catalog_prod/orders_latest/datacontract/datacontract.yaml",
+    spark=spark)
+run = data_contract.test()
+run.result
+```
+
+Databricks' library management properly resolves dependencies during cluster initialization, rather than at runtime in the notebook.

 #### Dataframe (programmatic)

@@ -874,68 +1015,117 @@ models:
 | `DATACONTRACT_TRINO_PASSWORD` | `mysecretpassword` | Password |


+#### API
+
+Data Contract CLI can test APIs that return data in JSON format.
+Currently, only GET requests are supported.
+
+##### Example
+
+datacontract.yaml
+```yaml
+servers:
+  api:
+    type: "api"
+    location: "https://api.example.com/path"
+    delimiter: none # new_line, array, or none (default)
+
+models:
+  my_object: # corresponds to the root element of the JSON response
+    type: object
+    fields:
+      field1:
+        type: string
+      fields2:
+        type: number
+```
+
+##### Environment Variables
+
+| Environment Variable | Example | Description |
+|-----------------------------------------|------------------|-----------------------------------------------------|
+| `DATACONTRACT_API_HEADER_AUTHORIZATION` | `Bearer <token>` | The value for the `authorization` header. Optional. |
+
+
+#### Local
+
+Data Contract CLI can test local files in parquet, json, csv, or delta format.
+
+##### Example
+
+datacontract.yaml
+```yaml
+servers:
+  local:
+    type: local
+    path: ./*.parquet
+    format: parquet
+models:
+  my_table_1: # corresponds to a table
+    type: table
+    fields:
+      my_column_1: # corresponds to a column
+        type: varchar
+      my_column_2: # corresponds to a column
+        type: string
+```

 ### export
 ```
+
+Usage: datacontract export [OPTIONS] [LOCATION]
+
+Convert data contract to a specific format. Saves to file specified by `output` option if present,
+otherwise prints to stdout.
+
+╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
+│ location [LOCATION] The location (url or path) of the data contract yaml. │
+│ [default: datacontract.yaml] │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
+╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+│ * --format [jsonschema|pydantic-model|sodacl|db The export format. [default: None] │
+│ t|dbt-sources|dbt-staging-sql|odcs|r [required] │
+│ df|avro|protobuf|great-expectations| │
+│ terraform|avro-idl|sql|sql-query|mer │
+│ maid|html|go|bigquery|dbml|spark|sql │
+│ alchemy|data-caterer|dcs|markdown|ic │
+│ eberg|custom|excel|dqx] │
+│ --output PATH Specify the file path where the │
+│ exported data will be saved. If no │
+│ path is provided, the output will be │
+│ printed to stdout. │
+│ [default: None] │
+│ --server TEXT The server name to export. │
+│ [default: None] │
+│ --model TEXT Use the key of the model in the data │
+│ contract yaml file to refer to a │
+│ model, e.g., `orders`, or `all` for │
+│ all models (default). │
+│ [default: all] │
+│ --schema TEXT The location (url or path) of the │
+│ Data Contract Specification JSON │
+│ Schema │
+│ [default: None] │
+│ --engine TEXT [engine] The engine used for great │
+│ expection run. │
+│ [default: None] │
+│ --template PATH The file path or URL of a template. │
+│ For Excel format: path/URL to custom │
+│ Excel template. For custom format: │
+│ path to Jinja template. │
+│ [default: None] │
+│ --help Show this message and exit. │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
+╭─ RDF Options ────────────────────────────────────────────────────────────────────────────────────╮
+│ --rdf-base TEXT [rdf] The base URI used to generate the RDF graph. [default: None] │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
+╭─ SQL Options ────────────────────────────────────────────────────────────────────────────────────╮
+│ --sql-server-type TEXT [sql] The server type to determine the sql dialect. By default, │
+│ it uses 'auto' to automatically detect the sql dialect via the │
+│ specified servers in the data contract. │
+│ [default: auto] │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

 ```

@@ -946,35 +1136,50 @@ datacontract export --format html --output datacontract.html

 Available export options:

-| Type | Description | Status
-| `html` | Export to HTML | ✅
-| `jsonschema` | Export to JSON Schema | ✅
-| `odcs` | Export to Open Data Contract Standard (ODCS) V3 | ✅
-| `sodacl` | Export to SodaCL quality checks in YAML format | ✅
-| `dbt` | Export to dbt models in YAML format | ✅
-| `dbt-sources` | Export to dbt sources in YAML format | ✅
-| `dbt-staging-sql` | Export to dbt staging SQL models | ✅
-| `rdf` | Export data contract to RDF representation in N3 format | ✅
-| `avro` | Export to AVRO models | ✅
-| `protobuf` | Export to Protobuf | ✅
-| `terraform` | Export to terraform resources | ✅
-| `sql` | Export to SQL DDL | ✅
-| `sql-query` | Export to SQL Query | ✅
-| `great-expectations` | Export to Great Expectations Suites in JSON Format | ✅
-| `bigquery` | Export to BigQuery Schemas | ✅
-| `go` | Export to Go types | ✅
-| `pydantic-model` | Export to pydantic models | ✅
-| `DBML` | Export to a DBML Diagram description | ✅
-| `spark` | Export to a Spark StructType | ✅
-| `sqlalchemy` | Export to SQLAlchemy Models | ✅
-| `data-caterer` | Export to Data Caterer in YAML format | ✅
-| `dcs` | Export to Data Contract Specification in YAML format | ✅
-| `markdown` | Export to Markdown | ✅
+| Type | Description | Status |
+|----------------------|---------------------------------------------------------|---------|
+| `html` | Export to HTML | ✅ |
+| `jsonschema` | Export to JSON Schema | ✅ |
+| `odcs` | Export to Open Data Contract Standard (ODCS) V3 | ✅ |
+| `sodacl` | Export to SodaCL quality checks in YAML format | ✅ |
+| `dbt` | Export to dbt models in YAML format | ✅ |
+| `dbt-sources` | Export to dbt sources in YAML format | ✅ |
+| `dbt-staging-sql` | Export to dbt staging SQL models | ✅ |
+| `rdf` | Export data contract to RDF representation in N3 format | ✅ |
+| `avro` | Export to AVRO models | ✅ |
+| `protobuf` | Export to Protobuf | ✅ |
+| `terraform` | Export to terraform resources | ✅ |
+| `sql` | Export to SQL DDL | ✅ |
+| `sql-query` | Export to SQL Query | ✅ |
+| `great-expectations` | Export to Great Expectations Suites in JSON Format | ✅ |
+| `bigquery` | Export to BigQuery Schemas | ✅ |
+| `go` | Export to Go types | ✅ |
+| `pydantic-model` | Export to pydantic models | ✅ |
+| `DBML` | Export to a DBML Diagram description | ✅ |
+| `spark` | Export to a Spark StructType | ✅ |
+| `sqlalchemy` | Export to SQLAlchemy Models | ✅ |
+| `data-caterer` | Export to Data Caterer in YAML format | ✅ |
+| `dcs` | Export to Data Contract Specification in YAML format | ✅ |
+| `markdown` | Export to Markdown | ✅ |
 | `iceberg` | Export to an Iceberg JSON Schema Definition | partial |
+| `excel` | Export to ODCS Excel Template | ✅ |
+| `custom` | Export to Custom format with Jinja | ✅ |
+| `dqx` | Export to DQX in YAML format | ✅ |
+| Missing something? | Please create an issue on GitHub | TBD |
+
+#### SQL
+
+The `export` function converts a given data contract into a SQL data definition language (DDL).
+
+```shell
+datacontract export datacontract.yaml --format sql --output output.sql
+```
+
+If you are using Databricks and an error is thrown when deploying the SQL DDLs with `variant` columns, set the following property:

+```shell
+spark.conf.set("spark.databricks.delta.schema.typeCheck.enabled", "false")
+```

 #### Great Expectations

@@ -982,7 +1187,7 @@ The `export` function transforms a specified data contract into a comprehensive
 If the contract includes multiple models, you need to specify the names of the model you wish to export.

 ```shell
-datacontract
+datacontract export datacontract.yaml --format great-expectations --model orders
 ```

 The export creates a list of expectations by utilizing:
@@ -1007,7 +1212,7 @@ To further customize the export, the following optional arguments are available:

 #### RDF

-The export function converts a given data contract into a RDF representation. You have the option to
+The `export` function converts a given data contract into a RDF representation. You have the option to
 add a base_url which will be used as the default prefix to resolve relative IRIs inside the document.

 ```shell
@@ -1230,73 +1435,110 @@ FROM
 {{ ref('orders') }}
 ```

+#### ODCS Excel Template
+
+The `export` function converts a data contract into an ODCS (Open Data Contract Standard) Excel template. This creates a user-friendly Excel spreadsheet that can be used for authoring, sharing, and managing data contracts using the familiar Excel interface.
+
+```shell
+datacontract export --format excel --output datacontract.xlsx datacontract.yaml
+```
+
+The Excel format enables:
+- **User-friendly authoring**: Create and edit data contracts in Excel's familiar interface
+- **Easy sharing**: Distribute data contracts as standard Excel files
+- **Collaboration**: Enable non-technical stakeholders to contribute to data contract definitions
+- **Round-trip conversion**: Import Excel templates back to YAML data contracts
+
+For more information about the Excel template structure, visit the [ODCS Excel Template repository](https://github.com/datacontract/open-data-contract-standard-excel-template).
+
 ### import
 ```
+
+Usage: datacontract import [OPTIONS]
+
+Create a data contract from the given source location. Saves to file specified by `output` option
+if present, otherwise prints to stdout.
+
+╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+│ * --format [sql|avro|dbt|dbml|glue|jsonsc The format of the source file. │
+│ hema|json|bigquery|odcs|unity| [default: None] │
+│ spark|iceberg|parquet|csv|prot [required] │
+│ obuf|excel] │
+│ --output PATH Specify the file path where │
+│ the Data Contract will be │
+│ saved. If no path is provided, │
+│ the output will be printed to │
+│ stdout. │
+│ [default: None] │
+│ --source TEXT The path to the file that │
+│ should be imported. │
+│ [default: None] │
+│ --spec [datacontract_specification|od The format of the data │
+│ cs] contract to import. │
+│ [default: │
+│ datacontract_specification] │
+│ --dialect TEXT The SQL dialect to use when │
+│ importing SQL files, e.g., │
+│ postgres, tsql, bigquery. │
+│ [default: None] │
+│ --glue-table TEXT List of table ids to import │
+│ from the Glue Database (repeat │
+│ for multiple table ids, leave │
+│ empty for all tables in the │
+│ dataset). │
+│ [default: None] │
+│ --bigquery-project TEXT The bigquery project id. │
+│ [default: None] │
+│ --bigquery-dataset TEXT The bigquery dataset id. │
+│ [default: None] │
+│ --bigquery-table TEXT List of table ids to import │
+│ from the bigquery API (repeat │
+│ for multiple table ids, leave │
+│ empty for all tables in the │
+│ dataset). │
+│ [default: None] │
+│ --unity-table-full-name TEXT Full name of a table in the │
+│ unity catalog │
+│ [default: None] │
+│ --dbt-model TEXT List of models names to import │
+│ from the dbt manifest file │
+│ (repeat for multiple models │
+│ names, leave empty for all │
+│ models in the dataset). │
+│ [default: None] │
+│ --dbml-schema TEXT List of schema names to import │
+│ from the DBML file (repeat for │
+│ multiple schema names, leave │
+│ empty for all tables in the │
+│ file). │
+│ [default: None] │
+│ --dbml-table TEXT List of table names to import │
+│ from the DBML file (repeat for │
+│ multiple table names, leave │
+│ empty for all tables in the │
+│ file). │
+│ [default: None] │
+│ --iceberg-table TEXT Table name to assign to the │
+│ model created from the Iceberg │
+│ schema. │
+│ [default: None] │
+│ --template TEXT The location (url or path) of │
+│ the Data Contract │
+│ Specification Template │
+│ [default: None] │
+│ --schema TEXT The location (url or path) of │
+│ the Data Contract │
+│ Specification JSON Schema │
+│ [default: None] │
+│ --owner TEXT The owner or team responsible │
+│ for managing the data │
+│ contract. │
+│ [default: None] │
+│ --id TEXT The identifier for the the │
+│ data contract. │
+│ [default: None] │
+│ --help Show this message and exit. │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

 ```

|
|
|
1312
1554
|
|
|
1313
1555
|
| Type | Description | Status |
|
|
1314
1556
|
|--------------------|------------------------------------------------|--------|
|
|
1315
|
-
| `sql` | Import from SQL DDL | ✅ |
|
|
1316
1557
|
| `avro` | Import from AVRO schemas | ✅ |
|
|
1317
|
-
| `glue` | Import from AWS Glue DataCatalog | ✅ |
|
|
1318
|
-
| `jsonschema` | Import from JSON Schemas | ✅ |
|
|
1319
1558
|
| `bigquery` | Import from BigQuery Schemas | ✅ |
|
|
1320
|
-
| `unity` | Import from Databricks Unity Catalog | partial |
|
|
1321
|
-
| `dbt` | Import from dbt models | ✅ |
|
|
1322
|
-
| `odcs` | Import from Open Data Contract Standard (ODCS) | ✅ |
|
|
1323
|
-
| `spark` | Import from Spark StructTypes | ✅ |
|
|
1324
|
-
| `dbml` | Import from DBML models | ✅ |
|
|
1325
1559
|
| `csv` | Import from CSV File | ✅ |
|
|
1326
|
-
| `
|
|
1560
|
+
| `dbml` | Import from DBML models | ✅ |
|
|
1561
|
+
| `dbt` | Import from dbt models | ✅ |
|
|
1562
|
+
| `excel` | Import from ODCS Excel Template | ✅ |
|
|
1563
|
+
| `glue` | Import from AWS Glue DataCatalog | ✅ |
|
|
1327
1564
|
| `iceberg` | Import from an Iceberg JSON Schema Definition | partial |
|
|
1328
|
-
| `
|
|
1329
|
-
|
|
|
1565
|
+
| `jsonschema` | Import from JSON Schemas | ✅ |
|
|
1566
|
+
| `odcs` | Import from Open Data Contract Standard (ODCS) | ✅ |
|
|
1567
|
+
| `parquet` | Import from Parquet File Metadata | ✅ |
|
|
1568
|
+
| `protobuf` | Import from Protobuf schemas | ✅ |
|
|
1569
|
+
| `spark` | Import from Spark StructTypes, Variant | ✅ |
|
|
1570
|
+
| `sql` | Import from SQL DDL | ✅ |
|
|
1571
|
+
| `unity` | Import from Databricks Unity Catalog | partial |
|
|
1572
|
+
| `excel` | Import from ODCS Excel Template | ✅ |
|
|
1573
|
+
| Missing something? | Please create an issue on GitHub | TBD |
|
|
1330
1574
|
|
|
1331
1575
|
|
|
1332
1576
|
#### ODCS
|
|
@@ -1367,16 +1611,21 @@ datacontract import --format bigquery --bigquery-project <project_id> --bigquery
|
|
|
1367
1611
|
```
|
|
1368
1612
|
|
|
1369
1613
|
#### Unity Catalog
|
|
1370
|
-
|
|
1371
1614
|
```bash
|
|
1372
1615
|
# Example import from a Unity Catalog JSON file
|
|
1373
1616
|
datacontract import --format unity --source my_unity_table.json
|
|
1374
1617
|
```
|
|
1375
1618
|
|
|
1376
1619
|
```bash
|
|
1377
|
-
# Example import single table from Unity Catalog via HTTP endpoint
|
|
1378
|
-
export
|
|
1379
|
-
export
|
|
1620
|
+
# Example import single table from Unity Catalog via HTTP endpoint using PAT
|
|
1621
|
+
export DATACONTRACT_DATABRICKS_SERVER_HOSTNAME="https://xyz.cloud.databricks.com"
|
|
1622
|
+
export DATACONTRACT_DATABRICKS_TOKEN=<token>
|
|
1623
|
+
datacontract import --format unity --unity-table-full-name <table_full_name>
|
|
1624
|
+
```
|
|
1625
|
+
Please Refer to [Databricks documentation](https://docs.databricks.com/aws/en/dev-tools/auth/unified-auth) on how to set up a profile
|
|
1626
|
+
```bash
|
|
1627
|
+
# Example import single table from Unity Catalog via HTTP endpoint using Profile
|
|
1628
|
+
export DATACONTRACT_DATABRICKS_PROFILE="my-profile"
|
|
1380
1629
|
datacontract import --format unity --unity-table-full-name <table_full_name>
|
|
1381
1630
|
```
|
|
1382
1631
|
|
|
@@ -1397,6 +1646,17 @@ datacontract import --format dbt --source <manifest_path> --dbt-model <model_nam
 datacontract import --format dbt --source <manifest_path>
 ```

+#### Excel
+
+Importing from the [ODCS Excel Template](https://github.com/datacontract/open-data-contract-standard-excel-template).
+
+Examples:
+
+```bash
+# Example import from ODCS Excel Template
+datacontract import --format excel --source odcs.xlsx
+```
+
 #### Glue

 Importing from Glue reads the necessary Data directly off of the AWS API.
@@ -1416,14 +1676,31 @@ datacontract import --format glue --source <database_name>

 #### Spark

-Importing from Spark table or view these must be created or accessible in the Spark context. Specify tables list in `source` parameter.
-
-Example:
+When importing from Spark tables or views, these must be created or accessible in the Spark context. Specify the list of tables in the `source` parameter. If the `source` tables are registered as tables in Databricks and they have table-level descriptions, those descriptions will also be added to the Data Contract Specification.

 ```bash
+# Example: Import Spark table(s) from Spark context
 datacontract import --format spark --source "users,orders"
 ```

+```bash
+# Example: Import Spark table
+DataContract.import_from_source("spark", "users")
+DataContract.import_from_source(format = "spark", source = "users")
+
+# Example: Import Spark dataframe
+DataContract.import_from_source("spark", "users", dataframe = df_user)
+DataContract.import_from_source(format = "spark", source = "users", dataframe = df_user)
+
+# Example: Import Spark table + table description
+DataContract.import_from_source("spark", "users", description = "description")
+DataContract.import_from_source(format = "spark", source = "users", description = "description")
+
+# Example: Import Spark dataframe + table description
+DataContract.import_from_source("spark", "users", dataframe = df_user, description = "description")
+DataContract.import_from_source(format = "spark", source = "users", dataframe = df_user, description = "description")
+```
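For context, a self-contained sketch of the dataframe variant above (the SparkSession setup and the sample `df_user` dataframe are illustrative assumptions):

```python
from pyspark.sql import SparkSession

from datacontract.data_contract import DataContract

# An illustrative dataframe standing in for the real "users" data.
spark = SparkSession.builder.appName("datacontract-import").getOrCreate()
df_user = spark.createDataFrame([("alice", 42)], ["name", "age"])

# Import the dataframe's schema as a data contract model named "users",
# attaching a table-level description.
data_contract = DataContract.import_from_source(
    format="spark", source="users", dataframe=df_user, description="User records"
)
```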
+
 #### DBML

 Importing from DBML Documents.
@@ -1475,95 +1752,96 @@ Example:
|
|
|
1475
1752
|
datacontract import --format csv --source "test.csv"
|
|
1476
1753
|
```
|
|
1477
1754
|
|
|
1755
|
+
#### protobuf
|
|
1756
|
+
|
|
1757
|
+
Importing from protobuf File. Specify file in `source` parameter.
|
|
1758
|
+
|
|
1759
|
+
Example:
|
|
1760
|
+
|
|
1761
|
+
```bash
|
|
1762
|
+
datacontract import --format protobuf --source "test.proto"
|
|
1763
|
+
```
|
|
1764
|
+
|
|
1478
1765
|
|
|
1479
1766
|
### breaking
|
|
1480
1767
|
```
|
|
1481
|
-
|
|
1482
|
-
Usage: datacontract breaking [OPTIONS] LOCATION_OLD LOCATION_NEW
|
|
1483
|
-
|
|
1484
|
-
Identifies breaking changes between data contracts. Prints to stdout.
|
|
1485
|
-
|
|
1486
|
-
╭─ Arguments
|
|
1487
|
-
│ * location_old TEXT The location (url or path) of the old data
|
|
1488
|
-
│
|
|
1489
|
-
│ [
|
|
1490
|
-
│
|
|
1491
|
-
│
|
|
1492
|
-
│
|
|
1493
|
-
|
|
1494
|
-
|
|
1495
|
-
|
|
1496
|
-
|
|
1497
|
-
│ --help Show this message and exit. │
|
|
1498
|
-
╰──────────────────────────────────────────────────────────────────────────────╯
|
|
1768
|
+
|
|
1769
|
+
Usage: datacontract breaking [OPTIONS] LOCATION_OLD LOCATION_NEW
|
|
1770
|
+
|
|
1771
|
+
Identifies breaking changes between data contracts. Prints to stdout.
|
|
1772
|
+
|
|
1773
|
+
╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
|
|
1774
|
+
│ * location_old TEXT The location (url or path) of the old data contract yaml. │
|
|
1775
|
+
│ [default: None] │
|
|
1776
|
+
│ [required] │
|
|
1777
|
+
│ * location_new TEXT The location (url or path) of the new data contract yaml. │
|
|
1778
|
+
│ [default: None] │
|
|
1779
|
+
│ [required] │
|
|
1780
|
+
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
|
|
1781
|
+
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
|
|
1782
|
+
│ --help Show this message and exit. │
|
|
1783
|
+
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
|
|
1499
1784
|
|
|
1500
1785
|
```
|
|
 
 ### changelog
 ```
-
- Usage: datacontract changelog [OPTIONS] LOCATION_OLD LOCATION_NEW
-
- Generate a changelog between data contracts. Prints to stdout.
-
-╭─ Arguments ──────────────────────────────────────────────────────╮
-│ * location_old TEXT The location (url or path) of the old data │
-│ contract yaml. │
-│ [default: None] │
-│ [required] │
-│ * location_new TEXT The location (url or path) of the new data │
-│ contract yaml. │
-│ [default: None] │
-│ [required] │
-╰──────────────────────────────────────────────────────────────────────────────╯
-╭─ Options ────────────────────────────────────────────────────────╮
-│ --help Show this message and exit. │
-╰──────────────────────────────────────────────────────────────────────────────╯
+
+ Usage: datacontract changelog [OPTIONS] LOCATION_OLD LOCATION_NEW
+
+ Generate a changelog between data contracts. Prints to stdout.
+
+╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
+│ * location_old TEXT The location (url or path) of the old data contract yaml. │
+│ [default: None] │
+│ [required] │
+│ * location_new TEXT The location (url or path) of the new data contract yaml. │
+│ [default: None] │
+│ [required] │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
+╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+│ --help Show this message and exit. │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
 
 ```
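The changelog command takes the same two locations; a minimal sketch with placeholder file names:

```bash
# Generate a changelog between two contract revisions and print it to stdout
datacontract changelog datacontract-v1.yaml datacontract-v2.yaml
```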
 
 ### diff
 ```
-
- Usage: datacontract diff [OPTIONS] LOCATION_OLD LOCATION_NEW
-
- PLACEHOLDER. Currently works as 'changelog' does.
-
-╭─ Arguments ──────────────────────────────────────────────────────╮
-│ * location_old TEXT The location (url or path) of the old data │
-│ contract yaml. │
-│ [default: None] │
-│ [required] │
-│ * location_new TEXT The location (url or path) of the new data │
-│ contract yaml. │
-│ [default: None] │
-│ [required] │
-╰──────────────────────────────────────────────────────────────────────────────╯
-╭─ Options ────────────────────────────────────────────────────────╮
-│ --help Show this message and exit. │
-╰──────────────────────────────────────────────────────────────────────────────╯
+
+ Usage: datacontract diff [OPTIONS] LOCATION_OLD LOCATION_NEW
+
+ PLACEHOLDER. Currently works as 'changelog' does.
+
+╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
+│ * location_old TEXT The location (url or path) of the old data contract yaml. │
+│ [default: None] │
+│ [required] │
+│ * location_new TEXT The location (url or path) of the new data contract yaml. │
+│ [default: None] │
+│ [required] │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
+╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+│ --help Show this message and exit. │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
 
 ```
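Since diff currently behaves like changelog, the invocation is identical; the file names are again placeholders:

```bash
datacontract diff datacontract-v1.yaml datacontract-v2.yaml
```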
 
 ### catalog
 ```
-
- Usage: datacontract catalog [OPTIONS]
-
- Create a html catalog of data contracts.
-
-╭─ Options ────────────────────────────────────────────────────────╮
-│ --files TEXT Glob pattern for the data contract files to include in │
-│ the catalog. │
-│ [default: *.yaml] │
-│ --output TEXT Output directory for the catalog html files. │
-│ [default: catalog/] │
-│ --schema TEXT The location (url or path) of the Data Contract │
-│ Specification JSON Schema │
-│ [default: │
-│ https://datacontract.com/datacontract.schema.json] │
-│ --help Show this message and exit. │
-╰──────────────────────────────────────────────────────────────────────────────╯
+
+ Usage: datacontract catalog [OPTIONS]
+
+ Create a html catalog of data contracts.
+
+╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+│ --files TEXT Glob pattern for the data contract files to include in the catalog. │
+│ Applies recursively to any subfolders. │
+│ [default: *.yaml] │
+│ --output TEXT Output directory for the catalog html files. [default: catalog/] │
+│ --schema TEXT The location (url or path) of the Data Contract Specification JSON Schema │
+│ [default: None] │
+│ --help Show this message and exit. │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
 
 ```
 
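A sketch that combines the options shown above; the glob and output directory simply restate the documented defaults:

```bash
# Render an HTML catalog page for every matching contract file into catalog/
datacontract catalog --files "*.yaml" --output "catalog/"
```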
@@ -1579,51 +1857,50 @@ datacontract catalog --files "*.odcs.yaml"
 
 ### publish
 ```
-
- Usage: datacontract publish [OPTIONS] [LOCATION]
-
- Publish the data contract to the Data Mesh Manager.
-
-╭─ Arguments ──────────────────────────────────────────────────────╮
-│ location [LOCATION] The location (url or path) of the data contract │
-│ yaml │
-│ [default: datacontract.yaml] │
-╰──────────────────────────────────────────────────────────────────────────────╯
-╭─ Options ────────────────────────────────────────────────────────╮
-│ --schema TEXT The location (url or │
-│ path) of the Data │
-│ Contract Specification │
-│ JSON Schema │
-│ [default: │
-│ https://datacontract.com/datacontract.schema.json] │
-│ --ssl-verification --no-ssl-verification SSL verification when │
-│ publishing the data │
-│ contract. │
-│ [default: │
-│ ssl-verification] │
-│ --help Show this message and │
-│ exit. │
-╰──────────────────────────────────────────────────────────────────────────────╯
+
+ Usage: datacontract publish [OPTIONS] [LOCATION]
+
+ Publish the data contract to the Data Mesh Manager.
+
+╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
+│ location [LOCATION] The location (url or path) of the data contract yaml. │
+│ [default: datacontract.yaml] │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
+╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+│ --schema TEXT The location (url or path) of the Data │
+│ Contract Specification JSON Schema │
+│ [default: None] │
+│ --ssl-verification --no-ssl-verification SSL verification when publishing the data │
+│ contract. │
+│ [default: ssl-verification] │
+│ --help Show this message and exit. │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
 
 ```
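For illustration, publishing a local contract; the location argument defaults to datacontract.yaml, and --no-ssl-verification is available for environments with self-signed certificates:

```bash
# Publish the contract to the Data Mesh Manager
datacontract publish datacontract.yaml
```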
 
 ### api
 ```
-
- Usage: datacontract api [OPTIONS]
-
- Start the datacontract CLI as server application with REST API.
- The OpenAPI documentation as Swagger UI is available on http://localhost:4242. You can execute the
- commands directly from the Swagger UI.
-
- To protect the API, you can set the environment variable DATACONTRACT_CLI_API_KEY to a secret API
- key. To authenticate, requests must include the header 'x-api-key' with the correct API key.
-
-╭─ Options ────────────────────────────────────────────────────────╮
-│ --port INTEGER Bind socket to this port. [default: 4242] │
-│ --host TEXT Bind socket to this host. [default: 127.0.0.1] │
-│ --help Show this message and exit. │
-╰──────────────────────────────────────────────────────────────────────────────╯
+
+ Usage: datacontract api [OPTIONS]
+
+ Start the datacontract CLI as server application with REST API.
+ The OpenAPI documentation as Swagger UI is available on http://localhost:4242. You can execute the
+ commands directly from the Swagger UI.
+ To protect the API, you can set the environment variable DATACONTRACT_CLI_API_KEY to a secret API
+ key. To authenticate, requests must include the header 'x-api-key' with the correct API key. This
+ is highly recommended, as data contract tests may be subject to SQL injections or leak sensitive
+ information.
+ To connect to servers (such as a Snowflake data source), set the credentials as environment
+ variables as documented in https://cli.datacontract.com/#test
+ It is possible to run the API with extra arguments for `uvicorn.run()` as keyword arguments, e.g.:
+ `datacontract api --port 1234 --root_path /datacontract`.
+
+╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+│ --port INTEGER Bind socket to this port. [default: 4242] │
+│ --host TEXT Bind socket to this host. Hint: For running in docker, set it to 0.0.0.0 │
+│ [default: 127.0.0.1] │
+│ --help Show this message and exit. │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
 
 ```
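A minimal sketch of running the server with the documented options; the API key value is a placeholder, and the curl call against the root URL is only illustrative:

```bash
# Protect the API with a key, then bind to all interfaces (e.g., when running in Docker)
export DATACONTRACT_CLI_API_KEY="change-me"
datacontract api --port 4242 --host 0.0.0.0

# Requests must then send the matching 'x-api-key' header
curl -H "x-api-key: change-me" http://localhost:4242/
```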
 
@@ -1666,8 +1943,7 @@ Create a data contract based on the actual data. This is the fastest way to get
 $ datacontract test
 ```
 
-3.
-   probably forgot to document some fields and add the terms and conditions.
+3. Validate that the `datacontract.yaml` is correctly formatted and adheres to the Data Contract Specification.
 ```bash
 $ datacontract lint
 ```
@@ -1688,8 +1964,7 @@ Create a data contract based on the requirements from use cases.
 ```
 
 2. Create the model and quality guarantees based on your business requirements. Fill in the terms,
-   descriptions, etc.
-   linter.
+   descriptions, etc. Validate that your `datacontract.yaml` is correctly formatted.
 ```bash
 $ datacontract lint
 ```
@@ -1883,7 +2158,7 @@ if __name__ == "__main__":
 Output
 
 ```yaml
-dataContractSpecification: 1.1
+dataContractSpecification: 1.2.1
 id: uuid-custom
 info:
   title: my_custom_imported_data
@@ -1902,19 +2177,41 @@ models:
 ```
 ## Development Setup
 
-
+- Install [uv](https://docs.astral.sh/uv/)
+- The Python base interpreter should be 3.11.x.
+- The Docker engine must be running to execute the tests.
 
 ```bash
-#
-
-
+# make sure uv is installed
+uv python pin 3.11
+uv venv
+uv pip install -e '.[dev]'
+uv run ruff check
+uv run pytest
+```
+
+### Troubleshooting
+
+#### Windows: Some tests fail
+
+Run them in WSL. (We need to fix the paths in the tests so that they also work on plain Windows; contributions are appreciated.)
 
-
-
-
-
-
-
+#### PyCharm does not pick up the `.venv`
+
+This [uv issue](https://github.com/astral-sh/uv/issues/12545) might be relevant.
+
+Try to sync all groups:
+
+```
+uv sync --all-groups --all-extras
+```
+
+#### Errors in tests that use PySpark (e.g. test_test_kafka.py)
+
+Ensure you have JDK 17 or 21 installed. Java 25 causes issues.
+
+```
+java --version
 ```
 
 
@@ -1949,27 +2246,6 @@ docker compose run --rm datacontract --version
 
 This command runs the container momentarily to check the version of the `datacontract` CLI. The `--rm` flag ensures that the container is automatically removed after the command executes, keeping your environment clean.
 
-## Use with pre-commit
-
-To run `datacontract-cli` as part of a [pre-commit](https://pre-commit.com/) workflow, add something like the below to the `repos` list in the project's `.pre-commit-config.yaml`:
-
-```yaml
-repos:
-  - repo: https://github.com/datacontract/datacontract-cli
-    rev: "v0.10.9"
-    hooks:
-      - id: datacontract-lint
-      - id: datacontract-test
-        args: ["--server", "production"]
-```
-
-### Available Hook IDs
-
-| Hook ID           | Description                                         | Dependency |
-| ----------------- | --------------------------------------------------- | ---------- |
-| datacontract-lint | Runs the lint subcommand.                           | Python3    |
-| datacontract-test | Runs the test subcommand. Please look at            | Python3    |
-|                   | [test](#test) section for all available arguments.  |            |
 
 ## Release Steps
 
@@ -1986,8 +2262,10 @@ We are happy to receive your contributions. Propose your change in an issue or d
 
 ## Companies using this tool
 
+- [Entropy Data](https://www.entropy-data.com)
 - [INNOQ](https://innoq.com)
 - [Data Catering](https://data.catering/)
+- [Oliver Wyman](https://www.oliverwyman.com/)
 - And many more. To add your company, please create a pull request.
 
 ## Related Tools
@@ -2003,7 +2281,7 @@ We are happy to receive your contributions. Propose your change in an issue or d
 
 ## Credits
 
-Created by [Stefan Negele](https://www.linkedin.com/in/stefan-negele-573153112/)
+Created by [Stefan Negele](https://www.linkedin.com/in/stefan-negele-573153112/), [Jochen Christ](https://www.linkedin.com/in/jochenchrist/), and [Simon Harrer]().
 
 
 