datacontract-cli 0.10.23__py3-none-any.whl → 0.10.40__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (84)
  1. datacontract/__init__.py +13 -0
  2. datacontract/api.py +12 -5
  3. datacontract/catalog/catalog.py +5 -3
  4. datacontract/cli.py +119 -13
  5. datacontract/data_contract.py +145 -67
  6. datacontract/engines/data_contract_checks.py +366 -60
  7. datacontract/engines/data_contract_test.py +50 -4
  8. datacontract/engines/fastjsonschema/check_jsonschema.py +37 -19
  9. datacontract/engines/fastjsonschema/s3/s3_read_files.py +3 -2
  10. datacontract/engines/soda/check_soda_execute.py +27 -3
  11. datacontract/engines/soda/connections/athena.py +79 -0
  12. datacontract/engines/soda/connections/duckdb_connection.py +65 -6
  13. datacontract/engines/soda/connections/kafka.py +4 -2
  14. datacontract/engines/soda/connections/oracle.py +50 -0
  15. datacontract/export/avro_converter.py +20 -3
  16. datacontract/export/bigquery_converter.py +1 -1
  17. datacontract/export/dbt_converter.py +36 -7
  18. datacontract/export/dqx_converter.py +126 -0
  19. datacontract/export/duckdb_type_converter.py +57 -0
  20. datacontract/export/excel_exporter.py +923 -0
  21. datacontract/export/exporter.py +3 -0
  22. datacontract/export/exporter_factory.py +17 -1
  23. datacontract/export/great_expectations_converter.py +55 -5
  24. datacontract/export/{html_export.py → html_exporter.py} +31 -20
  25. datacontract/export/markdown_converter.py +134 -5
  26. datacontract/export/mermaid_exporter.py +110 -0
  27. datacontract/export/odcs_v3_exporter.py +193 -149
  28. datacontract/export/protobuf_converter.py +163 -69
  29. datacontract/export/rdf_converter.py +2 -2
  30. datacontract/export/sodacl_converter.py +9 -1
  31. datacontract/export/spark_converter.py +31 -4
  32. datacontract/export/sql_converter.py +6 -2
  33. datacontract/export/sql_type_converter.py +124 -8
  34. datacontract/imports/avro_importer.py +63 -12
  35. datacontract/imports/csv_importer.py +111 -57
  36. datacontract/imports/excel_importer.py +1112 -0
  37. datacontract/imports/importer.py +16 -3
  38. datacontract/imports/importer_factory.py +17 -0
  39. datacontract/imports/json_importer.py +325 -0
  40. datacontract/imports/odcs_importer.py +2 -2
  41. datacontract/imports/odcs_v3_importer.py +367 -151
  42. datacontract/imports/protobuf_importer.py +264 -0
  43. datacontract/imports/spark_importer.py +117 -13
  44. datacontract/imports/sql_importer.py +32 -16
  45. datacontract/imports/unity_importer.py +84 -38
  46. datacontract/init/init_template.py +1 -1
  47. datacontract/integration/entropy_data.py +126 -0
  48. datacontract/lint/resolve.py +112 -23
  49. datacontract/lint/schema.py +24 -15
  50. datacontract/lint/urls.py +17 -3
  51. datacontract/model/data_contract_specification/__init__.py +1 -0
  52. datacontract/model/odcs.py +13 -0
  53. datacontract/model/run.py +3 -0
  54. datacontract/output/junit_test_results.py +3 -3
  55. datacontract/schemas/datacontract-1.1.0.init.yaml +1 -1
  56. datacontract/schemas/datacontract-1.2.0.init.yaml +91 -0
  57. datacontract/schemas/datacontract-1.2.0.schema.json +2029 -0
  58. datacontract/schemas/datacontract-1.2.1.init.yaml +91 -0
  59. datacontract/schemas/datacontract-1.2.1.schema.json +2058 -0
  60. datacontract/schemas/odcs-3.0.2.schema.json +2382 -0
  61. datacontract/schemas/odcs-3.1.0.schema.json +2809 -0
  62. datacontract/templates/datacontract.html +54 -3
  63. datacontract/templates/datacontract_odcs.html +685 -0
  64. datacontract/templates/index.html +5 -2
  65. datacontract/templates/partials/server.html +2 -0
  66. datacontract/templates/style/output.css +319 -145
  67. {datacontract_cli-0.10.23.dist-info → datacontract_cli-0.10.40.dist-info}/METADATA +711 -433
  68. datacontract_cli-0.10.40.dist-info/RECORD +121 -0
  69. {datacontract_cli-0.10.23.dist-info → datacontract_cli-0.10.40.dist-info}/WHEEL +1 -1
  70. {datacontract_cli-0.10.23.dist-info → datacontract_cli-0.10.40.dist-info/licenses}/LICENSE +1 -1
  71. datacontract/export/csv_type_converter.py +0 -36
  72. datacontract/integration/datamesh_manager.py +0 -72
  73. datacontract/lint/lint.py +0 -142
  74. datacontract/lint/linters/description_linter.py +0 -35
  75. datacontract/lint/linters/field_pattern_linter.py +0 -34
  76. datacontract/lint/linters/field_reference_linter.py +0 -48
  77. datacontract/lint/linters/notice_period_linter.py +0 -55
  78. datacontract/lint/linters/quality_schema_linter.py +0 -52
  79. datacontract/lint/linters/valid_constraints_linter.py +0 -100
  80. datacontract/model/data_contract_specification.py +0 -327
  81. datacontract_cli-0.10.23.dist-info/RECORD +0 -113
  82. /datacontract/{lint/linters → output}/__init__.py +0 -0
  83. {datacontract_cli-0.10.23.dist-info → datacontract_cli-0.10.40.dist-info}/entry_points.txt +0 -0
  84. {datacontract_cli-0.10.23.dist-info → datacontract_cli-0.10.40.dist-info}/top_level.txt +0 -0
@@ -1,62 +1,71 @@
1
- Metadata-Version: 2.2
1
+ Metadata-Version: 2.4
2
2
  Name: datacontract-cli
3
- Version: 0.10.23
3
+ Version: 0.10.40
4
4
  Summary: The datacontract CLI is an open source command-line tool for working with Data Contracts. It uses data contract YAML files to lint the data contract, connect to data sources and execute schema and quality tests, detect breaking changes, and export to different formats. The tool is written in Python. It can be used as a standalone CLI tool, in a CI/CD pipeline, or directly as a Python library.
5
5
  Author-email: Jochen Christ <jochen.christ@innoq.com>, Stefan Negele <stefan.negele@innoq.com>, Simon Harrer <simon.harrer@innoq.com>
6
+ License-Expression: MIT
6
7
  Project-URL: Homepage, https://cli.datacontract.com
7
8
  Project-URL: Issues, https://github.com/datacontract/datacontract-cli/issues
8
9
  Classifier: Programming Language :: Python :: 3
9
- Classifier: License :: OSI Approved :: MIT License
10
10
  Classifier: Operating System :: OS Independent
11
- Requires-Python: >=3.10
11
+ Requires-Python: <3.13,>=3.10
12
12
  Description-Content-Type: text/markdown
13
13
  License-File: LICENSE
14
- Requires-Dist: typer<0.16,>=0.15.1
15
- Requires-Dist: pydantic<2.11.0,>=2.8.2
14
+ Requires-Dist: typer<0.20,>=0.15.1
15
+ Requires-Dist: pydantic<2.13.0,>=2.8.2
16
16
  Requires-Dist: pyyaml~=6.0.1
17
17
  Requires-Dist: requests<2.33,>=2.31
18
18
  Requires-Dist: fastjsonschema<2.22.0,>=2.19.1
19
19
  Requires-Dist: fastparquet<2025.0.0,>=2024.5.0
20
20
  Requires-Dist: numpy<2.0.0,>=1.26.4
21
- Requires-Dist: python-multipart==0.0.20
22
- Requires-Dist: rich<13.10,>=13.7
23
- Requires-Dist: sqlglot<27.0.0,>=26.6.0
21
+ Requires-Dist: python-multipart<1.0.0,>=0.0.20
22
+ Requires-Dist: rich<15.0,>=13.7
23
+ Requires-Dist: sqlglot<28.0.0,>=26.6.0
24
24
  Requires-Dist: duckdb<2.0.0,>=1.0.0
25
- Requires-Dist: soda-core-duckdb<3.5.0,>=3.3.20
25
+ Requires-Dist: soda-core-duckdb<3.6.0,>=3.3.20
26
26
  Requires-Dist: setuptools>=60
27
- Requires-Dist: python-dotenv~=1.0.0
28
- Requires-Dist: boto3<1.36.12,>=1.34.41
29
- Requires-Dist: Jinja2>=3.1.5
30
- Requires-Dist: jinja_partials>=0.2.1
27
+ Requires-Dist: python-dotenv<2.0.0,>=1.0.0
28
+ Requires-Dist: boto3<2.0.0,>=1.34.41
29
+ Requires-Dist: Jinja2<4.0.0,>=3.1.5
30
+ Requires-Dist: jinja_partials<1.0.0,>=0.2.1
31
+ Requires-Dist: datacontract-specification<2.0.0,>=1.2.3
32
+ Requires-Dist: open-data-contract-standard<4.0.0,>=3.1.0
31
33
  Provides-Extra: avro
32
34
  Requires-Dist: avro==1.12.0; extra == "avro"
33
35
  Provides-Extra: bigquery
34
- Requires-Dist: soda-core-bigquery<3.4.0,>=3.3.20; extra == "bigquery"
36
+ Requires-Dist: soda-core-bigquery<3.6.0,>=3.3.20; extra == "bigquery"
35
37
  Provides-Extra: csv
36
- Requires-Dist: clevercsv>=0.8.2; extra == "csv"
37
38
  Requires-Dist: pandas>=2.0.0; extra == "csv"
39
+ Provides-Extra: excel
40
+ Requires-Dist: openpyxl<4.0.0,>=3.1.5; extra == "excel"
38
41
  Provides-Extra: databricks
39
- Requires-Dist: soda-core-spark-df<3.4.0,>=3.3.20; extra == "databricks"
40
- Requires-Dist: soda-core-spark[databricks]<3.4.0,>=3.3.20; extra == "databricks"
41
- Requires-Dist: databricks-sql-connector<3.8.0,>=3.7.0; extra == "databricks"
42
- Requires-Dist: databricks-sdk<0.45.0; extra == "databricks"
42
+ Requires-Dist: soda-core-spark-df<3.6.0,>=3.3.20; extra == "databricks"
43
+ Requires-Dist: soda-core-spark[databricks]<3.6.0,>=3.3.20; extra == "databricks"
44
+ Requires-Dist: databricks-sql-connector<4.2.0,>=3.7.0; extra == "databricks"
45
+ Requires-Dist: databricks-sdk<0.74.0; extra == "databricks"
46
+ Requires-Dist: pyspark<4.0.0,>=3.5.5; extra == "databricks"
43
47
  Provides-Extra: iceberg
44
- Requires-Dist: pyiceberg==0.8.1; extra == "iceberg"
48
+ Requires-Dist: pyiceberg==0.10.0; extra == "iceberg"
45
49
  Provides-Extra: kafka
46
50
  Requires-Dist: datacontract-cli[avro]; extra == "kafka"
47
- Requires-Dist: soda-core-spark-df<3.4.0,>=3.3.20; extra == "kafka"
51
+ Requires-Dist: soda-core-spark-df<3.6.0,>=3.3.20; extra == "kafka"
52
+ Requires-Dist: pyspark<4.0.0,>=3.5.5; extra == "kafka"
48
53
  Provides-Extra: postgres
49
- Requires-Dist: soda-core-postgres<3.4.0,>=3.3.20; extra == "postgres"
54
+ Requires-Dist: soda-core-postgres<3.6.0,>=3.3.20; extra == "postgres"
50
55
  Provides-Extra: s3
51
- Requires-Dist: s3fs==2025.2.0; extra == "s3"
52
- Requires-Dist: aiobotocore<2.20.0,>=2.17.0; extra == "s3"
56
+ Requires-Dist: s3fs<2026.0.0,>=2025.2.0; extra == "s3"
57
+ Requires-Dist: aiobotocore<2.26.0,>=2.17.0; extra == "s3"
53
58
  Provides-Extra: snowflake
54
- Requires-Dist: snowflake-connector-python[pandas]<3.14,>=3.6; extra == "snowflake"
55
- Requires-Dist: soda-core-snowflake<3.5.0,>=3.3.20; extra == "snowflake"
59
+ Requires-Dist: snowflake-connector-python[pandas]<4.1,>=3.6; extra == "snowflake"
60
+ Requires-Dist: soda-core-snowflake<3.6.0,>=3.3.20; extra == "snowflake"
56
61
  Provides-Extra: sqlserver
57
- Requires-Dist: soda-core-sqlserver<3.4.0,>=3.3.20; extra == "sqlserver"
62
+ Requires-Dist: soda-core-sqlserver<3.6.0,>=3.3.20; extra == "sqlserver"
63
+ Provides-Extra: oracle
64
+ Requires-Dist: soda-core-oracle<3.6.0,>=3.3.20; extra == "oracle"
65
+ Provides-Extra: athena
66
+ Requires-Dist: soda-core-athena<3.6.0,>=3.3.20; extra == "athena"
58
67
  Provides-Extra: trino
59
- Requires-Dist: soda-core-trino<3.4.0,>=3.3.20; extra == "trino"
68
+ Requires-Dist: soda-core-trino<3.6.0,>=3.3.20; extra == "trino"
60
69
  Provides-Extra: dbt
61
70
  Requires-Dist: dbt-core>=1.8.0; extra == "dbt"
62
71
  Provides-Extra: dbml
@@ -66,23 +75,27 @@ Requires-Dist: pyarrow>=18.1.0; extra == "parquet"
66
75
  Provides-Extra: rdf
67
76
  Requires-Dist: rdflib==7.0.0; extra == "rdf"
68
77
  Provides-Extra: api
69
- Requires-Dist: fastapi==0.115.8; extra == "api"
70
- Requires-Dist: uvicorn==0.34.0; extra == "api"
78
+ Requires-Dist: fastapi==0.116.1; extra == "api"
79
+ Requires-Dist: uvicorn==0.38.0; extra == "api"
80
+ Provides-Extra: protobuf
81
+ Requires-Dist: grpcio-tools>=1.53; extra == "protobuf"
71
82
  Provides-Extra: all
72
- Requires-Dist: datacontract-cli[api,bigquery,csv,databricks,dbml,dbt,iceberg,kafka,parquet,postgres,rdf,s3,snowflake,sqlserver,trino]; extra == "all"
83
+ Requires-Dist: datacontract-cli[api,athena,bigquery,csv,databricks,dbml,dbt,excel,iceberg,kafka,oracle,parquet,postgres,protobuf,rdf,s3,snowflake,sqlserver,trino]; extra == "all"
73
84
  Provides-Extra: dev
74
85
  Requires-Dist: datacontract-cli[all]; extra == "dev"
75
86
  Requires-Dist: httpx==0.28.1; extra == "dev"
76
87
  Requires-Dist: kafka-python; extra == "dev"
77
- Requires-Dist: moto==5.0.27; extra == "dev"
88
+ Requires-Dist: minio==7.2.17; extra == "dev"
89
+ Requires-Dist: moto==5.1.13; extra == "dev"
78
90
  Requires-Dist: pandas>=2.1.0; extra == "dev"
79
- Requires-Dist: pre-commit<4.1.0,>=3.7.1; extra == "dev"
91
+ Requires-Dist: pre-commit<4.5.0,>=3.7.1; extra == "dev"
80
92
  Requires-Dist: pytest; extra == "dev"
81
93
  Requires-Dist: pytest-xdist; extra == "dev"
82
- Requires-Dist: pymssql==2.3.2; extra == "dev"
94
+ Requires-Dist: pymssql==2.3.9; extra == "dev"
83
95
  Requires-Dist: ruff; extra == "dev"
84
- Requires-Dist: testcontainers[kafka,minio,mssql,postgres]==4.9.0; extra == "dev"
85
- Requires-Dist: trino==0.332.0; extra == "dev"
96
+ Requires-Dist: testcontainers[kafka,minio,mssql,postgres]==4.13.3; extra == "dev"
97
+ Requires-Dist: trino==0.336.0; extra == "dev"
98
+ Dynamic: license-file
86
99
 
87
100
  # Data Contract CLI
88
101
 
@@ -94,7 +107,7 @@ Requires-Dist: trino==0.332.0; extra == "dev"
94
107
  <a href="https://datacontract.com/slack" rel="nofollow"><img src="https://img.shields.io/badge/slack-join_chat-white.svg?logo=slack&amp;style=social" alt="Slack Status" data-canonical-src="https://img.shields.io/badge/slack-join_chat-white.svg?logo=slack&amp;style=social" style="max-width: 100%;"></a>
95
108
  </p>
96
109
 
97
- The `datacontract` CLI is an open-source command-line tool for working with data contracts.
110
+ The `datacontract` CLI is a popular and [recognized](https://www.thoughtworks.com/en-de/radar/tools/summary/data-contract-cli) open-source command-line tool for working with data contracts.
98
111
  It uses data contract YAML files as [Data Contract Specification](https://datacontract.com/) or [ODCS](https://bitol-io.github.io/open-data-contract-standard/latest/) to lint the data contract, connect to data sources and execute schema and quality tests, detect breaking changes, and export to different formats. The tool is written in Python. It can be used as a standalone CLI tool, in a CI/CD pipeline, or directly as a Python library.
99
112
 
100
113
  ![Main features of the Data Contract CLI](datacontractcli.png)
@@ -109,9 +122,9 @@ We have a _servers_ section with endpoint details to the S3 bucket, _models_ for
109
122
 
110
123
  This data contract contains all information to connect to S3 and check that the actual data meets the defined schema and quality requirements. We can use this information to test if the actual data product in S3 is compliant to the data contract.
111
124
 
112
- Let's use [pip](https://pip.pypa.io/en/stable/getting-started/) to install the CLI (or use the [Docker image](#docker)),
125
+ Let's use [uv](https://docs.astral.sh/uv/) to install the CLI (or use the [Docker image](#docker)),
113
126
  ```bash
114
- $ python3 -m pip install 'datacontract-cli[all]'
127
+ $ uv tool install --python python3.11 'datacontract-cli[all]'
115
128
  ```
116
129
 
117
130
 
@@ -206,9 +219,15 @@ $ datacontract export --format odcs datacontract.yaml --output odcs.yaml
206
219
  # import ODCS to data contract
207
220
  $ datacontract import --format odcs odcs.yaml --output datacontract.yaml
208
221
 
209
- # import sql (other formats: avro, glue, bigquery, jsonschema ...)
222
+ # import sql (other formats: avro, glue, bigquery, jsonschema, excel ...)
210
223
  $ datacontract import --format sql --source my-ddl.sql --dialect postgres --output datacontract.yaml
211
224
 
225
+ # import from Excel template
226
+ $ datacontract import --format excel --source odcs.xlsx --output datacontract.yaml
227
+
228
+ # export to Excel template
229
+ $ datacontract export --format excel --output odcs.xlsx datacontract.yaml
230
+
212
231
  # find differences between two data contracts
213
232
  $ datacontract diff datacontract-v1.yaml datacontract-v2.yaml
214
233
 
@@ -241,6 +260,14 @@ if not run.has_passed():
241
260
 
242
261
  Choose the most appropriate installation method for your needs:
243
262
 
263
+ ### uv
264
+
265
+ If you have [uv](https://docs.astral.sh/uv/) installed, you can run datacontract-cli directly without installing:
266
+
267
+ ```
268
+ uv run --with 'datacontract-cli[all]' datacontract --version
269
+ ```
270
+
244
271
  ### pip
245
272
  Python 3.10, 3.11, and 3.12 are supported. We recommend using Python 3.11.
246
273
 
@@ -302,6 +329,7 @@ A list of available extras:
302
329
 
303
330
  | Dependency | Installation Command |
304
331
  |-------------------------|--------------------------------------------|
332
+ | Amazon Athena | `pip install datacontract-cli[athena]` |
305
333
  | Avro Support | `pip install datacontract-cli[avro]` |
306
334
  | Google BigQuery | `pip install datacontract-cli[bigquery]` |
307
335
  | Databricks Integration | `pip install datacontract-cli[databricks]` |
@@ -317,7 +345,7 @@ A list of available extras:
317
345
  | Parquet | `pip install datacontract-cli[parquet]` |
318
346
  | RDF | `pip install datacontract-cli[rdf]` |
319
347
  | API (run as web server) | `pip install datacontract-cli[api]` |
320
-
348
+ | protobuf | `pip install datacontract-cli[protobuf]` |
321
349
 
322
350
 
323
351
  ## Documentation
@@ -338,47 +366,46 @@ Commands
338
366
 
339
367
  ### init
340
368
  ```
341
-
342
- Usage: datacontract init [OPTIONS] [LOCATION]
343
-
344
- Download a datacontract.yaml template and write it to file.
345
-
346
- ╭─ Arguments ──────────────────────────────────────────────────────────────────╮
347
- │ location [LOCATION] The location (url or path) of the data contract
348
- yaml to create.
349
- │ [default: datacontract.yaml] │
350
- ╰──────────────────────────────────────────────────────────────────────────────╯
351
- ╭─ Options ────────────────────────────────────────────────────────────────────╮
352
- │ --template TEXT URL of a template or data contract
353
- │ [default:
354
- https://datacontract.com/datacontrac…
355
- │ --overwrite --no-overwrite Replace the existing │
356
- │ datacontract.yaml │
357
- │ [default: no-overwrite] │
358
- │ --help Show this message and exit. │
359
- ╰──────────────────────────────────────────────────────────────────────────────╯
369
+
370
+ Usage: datacontract init [OPTIONS] [LOCATION]
371
+
372
+ Create an empty data contract.
373
+
374
+ ╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
375
+ │ location [LOCATION] The location of the data contract file to create.
376
+ [default: datacontract.yaml]
377
+ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
378
+ ╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
379
+ --template TEXT URL of a template or data contract [default: None] │
380
+ │ --overwrite --no-overwrite Replace the existing datacontract.yaml
381
+ │ [default: no-overwrite]
382
+ --help Show this message and exit.
383
+ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
360
384
 
361
385
  ```
362
386
 
363
387
  ### lint
364
388
  ```
365
-
366
- Usage: datacontract lint [OPTIONS] [LOCATION]
367
-
368
- Validate that the datacontract.yaml is correctly formatted.
369
-
370
- ╭─ Arguments ──────────────────────────────────────────────────────────────────╮
371
- │ location [LOCATION] The location (url or path) of the data contract
372
- │ yaml.
373
- │ [default: datacontract.yaml] │
374
- ╰──────────────────────────────────────────────────────────────────────────────╯
375
- ╭─ Options ────────────────────────────────────────────────────────────────────╮
376
- --schema TEXT The location (url or path) of the Data Contract
377
- Specification JSON Schema
378
- [default:
379
- https://datacontract.com/datacontract.schema.json]
380
- --help Show this message and exit.
381
- ╰──────────────────────────────────────────────────────────────────────────────╯
389
+
390
+ Usage: datacontract lint [OPTIONS] [LOCATION]
391
+
392
+ Validate that the datacontract.yaml is correctly formatted.
393
+
394
+ ╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
395
+ │ location [LOCATION] The location (url or path) of the data contract yaml.
396
+ [default: datacontract.yaml]
397
+ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
398
+ ╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
399
+ --schema TEXT The location (url or path) of the Data Contract Specification │
400
+ JSON Schema
401
+ [default: None]
402
+ --output PATH Specify the file path where the test results should be written
403
+ to (e.g., './test-results/TEST-datacontract.xml'). If no path is
404
+ provided, the output will be printed to stdout.
405
+ │ [default: None] │
406
+ │ --output-format [junit] The target format for the test results. [default: None] │
407
+ │ --help Show this message and exit. │
408
+ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
382
409
 
383
410
  ```
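
For example, the lint results can be written as a JUnit XML report instead of being printed to stdout (the output path is illustrative):

```bash
# validate the data contract and write the results in JUnit format
datacontract lint datacontract.yaml --output-format junit --output ./test-results/TEST-datacontract.xml
```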
384
411
 
@@ -394,31 +421,40 @@ Commands
394
421
  │ [default: datacontract.yaml] │
395
422
  ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
396
423
  ╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
397
- │ --schema TEXT The location (url or path) of the Data
398
- Contract Specification JSON Schema
399
- [default: None]
400
- --server TEXT The server configuration to run the
401
- schema and quality tests. Use the key of
402
- the server object in the data contract
403
- yaml file to refer to a server, e.g.,
404
- `production`, or `all` for all servers
405
- (default).
406
- [default: all]
407
- --publish TEXT The url to publish the results after the
408
- test
409
- [default: None]
410
- --output PATH Specify the file path where the test
411
- results should be written to (e.g.,
412
- './test-results/TEST-datacontract.xml').
413
- [default: None]
414
- --output-format [junit] The target format for the test results.
415
- [default: None]
416
- │ --logs --no-logs Print logs [default: no-logs]
417
- --ssl-verification --no-ssl-verification SSL verification when publishing the
418
- data contract.
419
- [default: ssl-verification]
420
- --help Show this message and exit.
421
- ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
424
+ │ --schema TEXT The location (url or path) of
425
+ the Data Contract Specification
426
+ JSON Schema
427
+ [default: None]
428
+ --server TEXT The server configuration to run
429
+ the schema and quality tests.
430
+ Use the key of the server object
431
+ in the data contract yaml file
432
+ to refer to a server, e.g.,
433
+ `production`, or `all` for all
434
+ servers (default).
435
+ [default: all]
436
+ --publish-test-results --no-publish-test-results Publish the results after the
437
+ test
438
+ [default:
439
+ no-publish-test-results]
440
+ --publish TEXT DEPRECATED. The url to publish
441
+ the results after the test.
442
+ [default: None]
443
+ │ --output PATH Specify the file path where the
444
+ test results should be written
445
+ to (e.g.,
446
+ './test-results/TEST-datacontra…
447
+ [default: None]
448
+ │ --output-format [junit] The target format for the test │
449
+ │ results. │
450
+ │ [default: None] │
451
+ │ --logs --no-logs Print logs [default: no-logs] │
452
+ │ --ssl-verification --no-ssl-verification SSL verification when publishing │
453
+ │ the data contract. │
454
+ │ [default: ssl-verification] │
455
+ │ --help Show this message and exit. │
456
+ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
457
+
422
458
  ```
423
459
 
424
460
  Data Contract CLI connects to a data source and runs schema and quality tests to verify that the data contract is valid.
@@ -438,9 +474,11 @@ Credentials are provided with environment variables.
438
474
  Supported server types:
439
475
 
440
476
  - [s3](#S3)
477
+ - [athena](#athena)
441
478
  - [bigquery](#bigquery)
442
479
  - [azure](#azure)
443
480
  - [sqlserver](#sqlserver)
481
+ - [oracle](#oracle)
444
482
  - [databricks](#databricks)
445
483
  - [databricks (programmatic)](#databricks-programmatic)
446
484
  - [dataframe (programmatic)](#dataframe-programmatic)
@@ -448,6 +486,7 @@ Supported server types:
448
486
  - [kafka](#kafka)
449
487
  - [postgres](#postgres)
450
488
  - [trino](#trino)
489
+ - [api](#api)
451
490
  - [local](#local)
452
491
 
453
492
  Supported formats:
@@ -507,6 +546,41 @@ servers:
507
546
  | `DATACONTRACT_S3_SESSION_TOKEN` | `AQoDYXdzEJr...` | AWS temporary session token (optional) |
508
547
 
509
548
 
549
+ #### Athena
550
+
551
+ Data Contract CLI can test data in AWS Athena stored in S3.
552
+ Supports different file formats, such as Iceberg, Parquet, JSON, CSV...
553
+
554
+ ##### Example
555
+
556
+ datacontract.yaml
557
+ ```yaml
558
+ servers:
559
+ athena:
560
+ type: athena
561
+ catalog: awsdatacatalog # awsdatacatalog is the default setting
562
+ schema: icebergdemodb # in Athena, this is called "database"
563
+ regionName: eu-central-1
564
+ stagingDir: s3://my-bucket/athena-results/
565
+ models:
566
+ my_table: # corresponds to a table or view name
567
+ type: table
568
+ fields:
569
+ my_column_1: # corresponds to a column
570
+ type: string
571
+ config:
572
+ physicalType: varchar
573
+ ```
574
+
575
+ ##### Environment Variables
576
+
577
+ | Environment Variable | Example | Description |
578
+ |-------------------------------------|---------------------------------|----------------------------------------|
579
+ | `DATACONTRACT_S3_REGION` | `eu-central-1` | Region of Athena service |
580
+ | `DATACONTRACT_S3_ACCESS_KEY_ID` | `AKIAXV5Q5QABCDEFGH` | AWS Access Key ID |
581
+ | `DATACONTRACT_S3_SECRET_ACCESS_KEY` | `93S7LRrJcqLaaaa/XXXXXXXXXXXXX` | AWS Secret Access Key |
582
+ | `DATACONTRACT_S3_SESSION_TOKEN` | `AQoDYXdzEJr...` | AWS temporary session token (optional) |
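
For example, the credentials can be provided via these environment variables before running the checks against the `athena` server defined above (placeholder values; the server key must match your `servers` section):

```bash
# AWS credentials for Athena/S3 access (placeholder values)
export DATACONTRACT_S3_REGION=eu-central-1
export DATACONTRACT_S3_ACCESS_KEY_ID=AKIAXV5Q5QABCDEFGH
export DATACONTRACT_S3_SECRET_ACCESS_KEY=93S7LRrJcqLaaaa/XXXXXXXXXXXXX

# run schema and quality checks against the "athena" server key
datacontract test datacontract.yaml --server athena
```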
583
+
510
584
 
511
585
  #### Google Cloud Storage (GCS)
512
586
 
@@ -628,6 +702,55 @@ models:
628
702
 
629
703
 
630
704
 
705
+ #### Oracle
706
+
707
+ Data Contract CLI can test data in Oracle Database.
708
+
709
+ ##### Example
710
+
711
+ datacontract.yaml
712
+ ```yaml
713
+ servers:
714
+ oracle:
715
+ type: oracle
716
+ host: localhost
717
+ port: 1521
718
+ service_name: ORCL
719
+ schema: ADMIN
720
+ models:
721
+ my_table_1: # corresponds to a table
722
+ type: table
723
+ fields:
724
+ my_column_1: # corresponds to a column
725
+ type: decimal
726
+ description: Decimal number
727
+ my_column_2: # corresponds to another column
728
+ type: text
729
+ description: Unicode text string
730
+ config:
731
+ oracleType: NVARCHAR2 # optional: can be used to explicitly define the type used in the database
732
+ # if not set a default mapping will be used
733
+ ```
734
+
735
+ ##### Environment Variables
736
+
737
+ These environment variables specify the credentials used by the datacontract tool to connect to the database.
738
+ If you've started the database from a container, e.g. [oracle-free](https://hub.docker.com/r/gvenzl/oracle-free),
739
+ the credentials are either `system` together with the `ORACLE_PASSWORD` you set on the container, or
740
+ the values you set for `APP_USER` and `APP_USER_PASSWORD`.
741
+ If you require thick mode to connect to the database, you need to have an Oracle Instant Client
742
+ installed on the system and specify the path to the installation within the environment variable
743
+ `DATACONTRACT_ORACLE_CLIENT_DIR`.
744
+
745
+ | Environment Variable | Example | Description |
746
+ |--------------------------------------------------|--------------------|--------------------------------------------|
747
+ | `DATACONTRACT_ORACLE_USERNAME` | `system` | Username |
748
+ | `DATACONTRACT_ORACLE_PASSWORD` | `0x162e53` | Password |
749
+ | `DATACONTRACT_ORACLE_CLIENT_DIR` | `C:\oracle\client` | Path to Oracle Instant Client installation |
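
A minimal sketch of running the checks against the `oracle` server defined above (placeholder credentials; see the notes on `system` vs. `APP_USER` above):

```bash
# credentials for the Oracle connection (placeholder values)
export DATACONTRACT_ORACLE_USERNAME=system
export DATACONTRACT_ORACLE_PASSWORD=<your-password>

# run schema and quality checks against the "oracle" server key
datacontract test datacontract.yaml --server oracle
```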
750
+
751
+
752
+
753
+
631
754
  #### Databricks
632
755
 
633
756
  Works with Unity Catalog and Hive metastore.
@@ -682,19 +805,37 @@ models:
682
805
  fields: ...
683
806
  ```
684
807
 
685
- Notebook
686
- ```python
687
- %pip install datacontract-cli[databricks]
688
- dbutils.library.restartPython()
808
+ ##### Installing on Databricks Compute
689
809
 
690
- from datacontract.data_contract import DataContract
810
+ **Important:** When using Databricks LTS ML runtimes (15.4, 16.4), installing via `%pip install` in notebooks can cause issues.
691
811
 
692
- data_contract = DataContract(
693
- data_contract_file="/Volumes/acme_catalog_prod/orders_latest/datacontract/datacontract.yaml",
694
- spark=spark)
695
- run = data_contract.test()
696
- run.result
697
- ```
812
+ **Recommended approach:** Use Databricks' native library management instead:
813
+
814
+ 1. **Create or configure your compute cluster:**
815
+ - Navigate to **Compute** in the Databricks workspace
816
+ - Create a new cluster or select an existing one
817
+ - Go to the **Libraries** tab
818
+
819
+ 2. **Add the datacontract-cli library:**
820
+ - Click **Install new**
821
+ - Select **PyPI** as the library source
822
+ - Enter package name: `datacontract-cli[databricks]`
823
+ - Click **Install**
824
+
825
+ 3. **Restart the cluster** to apply the library installation
826
+
827
+ 4. **Use in your notebook** without additional installation:
828
+ ```python
829
+ from datacontract.data_contract import DataContract
830
+
831
+ data_contract = DataContract(
832
+ data_contract_file="/Volumes/acme_catalog_prod/orders_latest/datacontract/datacontract.yaml",
833
+ spark=spark)
834
+ run = data_contract.test()
835
+ run.result
836
+ ```
837
+
838
+ Databricks' library management properly resolves dependencies during cluster initialization, rather than at runtime in the notebook.
698
839
 
699
840
  #### Dataframe (programmatic)
700
841
 
@@ -874,68 +1015,117 @@ models:
874
1015
  | `DATACONTRACT_TRINO_PASSWORD` | `mysecretpassword` | Password |
875
1016
 
876
1017
 
1018
+ #### API
1019
+
1020
+ Data Contract CLI can test APIs that return data in JSON format.
1021
+ Currently, only GET requests are supported.
1022
+
1023
+ ##### Example
1024
+
1025
+ datacontract.yaml
1026
+ ```yaml
1027
+ servers:
1028
+ api:
1029
+ type: "api"
1030
+ location: "https://api.example.com/path"
1031
+ delimiter: none # new_line, array, or none (default)
1032
+
1033
+ models:
1034
+ my_object: # corresponds to the root element of the JSON response
1035
+ type: object
1036
+ fields:
1037
+ field1:
1038
+ type: string
1039
+ field2:
1040
+ type: number
1041
+ ```
1042
+
1043
+ ##### Environment Variables
1044
+
1045
+ | Environment Variable | Example | Description |
1046
+ |-----------------------------------------|------------------|---------------------------------------------------|
1047
+ | `DATACONTRACT_API_HEADER_AUTHORIZATION` | `Bearer <token>` | The value for the `authorization` header. Optional. |
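
For example (the token value is a placeholder):

```bash
# optional authorization header for the API request
export DATACONTRACT_API_HEADER_AUTHORIZATION="Bearer <token>"

# run the checks against the "api" server key defined above
datacontract test datacontract.yaml --server api
```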
1048
+
1049
+
1050
+ #### Local
1051
+
1052
+ Data Contract CLI can test local files in parquet, json, csv, or delta format.
1053
+
1054
+ ##### Example
1055
+
1056
+ datacontract.yaml
1057
+ ```yaml
1058
+ servers:
1059
+ local:
1060
+ type: local
1061
+ path: ./*.parquet
1062
+ format: parquet
1063
+ models:
1064
+ my_table_1: # corresponds to a table
1065
+ type: table
1066
+ fields:
1067
+ my_column_1: # corresponds to a column
1068
+ type: varchar
1069
+ my_column_2: # corresponds to a column
1070
+ type: string
1071
+ ```
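
With such a server entry in place, the checks run directly against the local files, e.g.:

```bash
# run the checks against the "local" server key defined above
datacontract test datacontract.yaml --server local
```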
1072
+
877
1073
 
878
1074
  ### export
879
1075
  ```
880
-
881
- Usage: datacontract export [OPTIONS] [LOCATION]
882
-
883
- Convert data contract to a specific format. Saves to file specified by
884
- `output` option if present, otherwise prints to stdout.
885
-
886
- ╭─ Arguments ──────────────────────────────────────────────────────────────────╮
887
- │ location [LOCATION] The location (url or path) of the data contract
888
- │ yaml.
889
- │ [default: datacontract.yaml] │
890
- ╰──────────────────────────────────────────────────────────────────────────────╯
891
- ╭─ Options ────────────────────────────────────────────────────────────────────╮
892
- * --format [jsonschema|pydantic-model| The export format.
893
- sodacl|dbt|dbt-sources|dbt- [default: None]
894
- staging-sql|odcs| [required]
895
- rdf|avro|protobuf|gre
896
- at-expectations|terraform|a
897
- vro-idl|sql|sql-query|html|
898
- go|bigquery|dbml|spark|sqla
899
- lchemy|data-caterer|dcs|mar
900
- kdown|iceberg|custom]
901
- --output PATH Specify the file path where
902
- the exported data will be
903
- saved. If no path is
904
- provided, the output will be
905
- printed to stdout.
906
- [default: None]
907
- --server TEXT The server name to export.
908
- [default: None]
909
- --model TEXT Use the key of the model in
910
- the data contract yaml file
911
- to refer to a model, e.g.,
912
- `orders`, or `all` for all
913
- models (default).
914
- [default: all]
915
- --schema TEXT The location (url or path)
916
- of the Data Contract
917
- Specification JSON Schema
918
- [default:
919
- https://datacontract.com/da…
920
- --engine TEXT [engine] The engine used for
921
- great expection run.
922
- [default: None]
923
- │ --template PATH [custom] The file path of │
924
- │ Jinja template. │
925
- [default: None]
926
- │ --help Show this message and exit. │
927
- ╰──────────────────────────────────────────────────────────────────────────────╯
928
- ╭─ RDF Options ────────────────────────────────────────────────────────────────╮
929
- --rdf-base TEXT [rdf] The base URI used to generate the RDF graph.
930
- [default: None]
931
- ╰──────────────────────────────────────────────────────────────────────────────╯
932
- ╭─ SQL Options ────────────────────────────────────────────────────────────────╮
933
- │ --sql-server-type TEXT [sql] The server type to determine the sql │
934
- │ dialect. By default, it uses 'auto' to │
935
- │ automatically detect the sql dialect via the │
936
- │ specified servers in the data contract. │
937
- │ [default: auto] │
938
- ╰──────────────────────────────────────────────────────────────────────────────╯
1076
+
1077
+ Usage: datacontract export [OPTIONS] [LOCATION]
1078
+
1079
+ Convert data contract to a specific format. Saves to file specified by `output` option if present,
1080
+ otherwise prints to stdout.
1081
+
1082
+ ╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
1083
+ │ location [LOCATION] The location (url or path) of the data contract yaml.
1084
+ [default: datacontract.yaml]
1085
+ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
1086
+ ╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
1087
+ * --format [jsonschema|pydantic-model|sodacl|db The export format. [default: None] │
1088
+ t|dbt-sources|dbt-staging-sql|odcs|r [required]
1089
+ df|avro|protobuf|great-expectations|
1090
+ terraform|avro-idl|sql|sql-query|mer
1091
+ maid|html|go|bigquery|dbml|spark|sql
1092
+ alchemy|data-caterer|dcs|markdown|ic
1093
+ eberg|custom|excel|dqx]
1094
+ --output PATH Specify the file path where the
1095
+ exported data will be saved. If no
1096
+ path is provided, the output will be
1097
+ printed to stdout.
1098
+ [default: None]
1099
+ --server TEXT The server name to export.
1100
+ [default: None]
1101
+ --model TEXT Use the key of the model in the data
1102
+ contract yaml file to refer to a
1103
+ model, e.g., `orders`, or `all` for
1104
+ all models (default).
1105
+ [default: all]
1106
+ --schema TEXT The location (url or path) of the
1107
+ Data Contract Specification JSON
1108
+ Schema
1109
+ [default: None]
1110
+ --engine TEXT [engine] The engine used for great
1111
+ expectation run.
1112
+ [default: None]
1113
+ --template PATH The file path or URL of a template.
1114
+ For Excel format: path/URL to custom
1115
+ Excel template. For custom format:
1116
+ path to Jinja template.
1117
+ [default: None]
1118
+ --help Show this message and exit.
1119
+ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
1120
+ ╭─ RDF Options ────────────────────────────────────────────────────────────────────────────────────╮
1121
+ --rdf-base TEXT [rdf] The base URI used to generate the RDF graph. [default: None]
1122
+ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
1123
+ ╭─ SQL Options ────────────────────────────────────────────────────────────────────────────────────╮
1124
+ --sql-server-type TEXT [sql] The server type to determine the sql dialect. By default, │
1125
+ it uses 'auto' to automatically detect the sql dialect via the
1126
+ specified servers in the data contract.
1127
+ │ [default: auto] │
1128
+ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
939
1129
 
940
1130
  ```
941
1131
 
@@ -946,35 +1136,50 @@ datacontract export --format html --output datacontract.html
946
1136
 
947
1137
  Available export options:
948
1138
 
949
- | Type | Description | Status |
950
- |----------------------|---------------------------------------------------------|--------|
951
- | `html` | Export to HTML | ✅ |
952
- | `jsonschema` | Export to JSON Schema | ✅ |
953
- | `odcs` | Export to Open Data Contract Standard (ODCS) V3 | ✅ |
954
- | `sodacl` | Export to SodaCL quality checks in YAML format | ✅ |
955
- | `dbt` | Export to dbt models in YAML format | ✅ |
956
- | `dbt-sources` | Export to dbt sources in YAML format | ✅ |
957
- | `dbt-staging-sql` | Export to dbt staging SQL models | ✅ |
958
- | `rdf` | Export data contract to RDF representation in N3 format | ✅ |
959
- | `avro` | Export to AVRO models | ✅ |
960
- | `protobuf` | Export to Protobuf | ✅ |
961
- | `terraform` | Export to terraform resources | ✅ |
962
- | `sql` | Export to SQL DDL | ✅ |
963
- | `sql-query` | Export to SQL Query | ✅ |
964
- | `great-expectations` | Export to Great Expectations Suites in JSON Format | ✅ |
965
- | `bigquery` | Export to BigQuery Schemas | ✅ |
966
- | `go` | Export to Go types | ✅ |
967
- | `pydantic-model` | Export to pydantic models | ✅ |
968
- | `DBML` | Export to a DBML Diagram description | ✅ |
969
- | `spark` | Export to a Spark StructType | ✅ |
970
- | `sqlalchemy` | Export to SQLAlchemy Models | ✅ |
971
- | `data-caterer` | Export to Data Caterer in YAML format | ✅ |
972
- | `dcs` | Export to Data Contract Specification in YAML format | ✅ |
973
- | `markdown` | Export to Markdown | ✅ |
1139
+ | Type | Description | Status |
1140
+ |----------------------|---------------------------------------------------------|---------|
1141
+ | `html` | Export to HTML | ✅ |
1142
+ | `jsonschema` | Export to JSON Schema | ✅ |
1143
+ | `odcs` | Export to Open Data Contract Standard (ODCS) V3 | ✅ |
1144
+ | `sodacl` | Export to SodaCL quality checks in YAML format | ✅ |
1145
+ | `dbt` | Export to dbt models in YAML format | ✅ |
1146
+ | `dbt-sources` | Export to dbt sources in YAML format | ✅ |
1147
+ | `dbt-staging-sql` | Export to dbt staging SQL models | ✅ |
1148
+ | `rdf` | Export data contract to RDF representation in N3 format | ✅ |
1149
+ | `avro` | Export to AVRO models | ✅ |
1150
+ | `protobuf` | Export to Protobuf | ✅ |
1151
+ | `terraform` | Export to terraform resources | ✅ |
1152
+ | `sql` | Export to SQL DDL | ✅ |
1153
+ | `sql-query` | Export to SQL Query | ✅ |
1154
+ | `great-expectations` | Export to Great Expectations Suites in JSON Format | ✅ |
1155
+ | `bigquery` | Export to BigQuery Schemas | ✅ |
1156
+ | `go` | Export to Go types | ✅ |
1157
+ | `pydantic-model` | Export to pydantic models | ✅ |
1158
+ | `DBML` | Export to a DBML Diagram description | ✅ |
1159
+ | `spark` | Export to a Spark StructType | ✅ |
1160
+ | `sqlalchemy` | Export to SQLAlchemy Models | ✅ |
1161
+ | `data-caterer` | Export to Data Caterer in YAML format | ✅ |
1162
+ | `dcs` | Export to Data Contract Specification in YAML format | ✅ |
1163
+ | `markdown` | Export to Markdown | ✅ |
974
1164
  | `iceberg` | Export to an Iceberg JSON Schema Definition | partial |
975
- | `custom` | Export to Custom format with Jinja | ✅ |
976
- | Missing something? | Please create an issue on GitHub | TBD |
1165
+ | `excel` | Export to ODCS Excel Template | ✅ |
1166
+ | `custom` | Export to Custom format with Jinja | |
1167
+ | `dqx` | Export to DQX in YAML format | ✅ |
1168
+ | Missing something? | Please create an issue on GitHub | TBD |
1169
+
1170
+ #### SQL
1171
+
1172
+ The `export` function converts a given data contract into a SQL data definition language (DDL).
1173
+
1174
+ ```shell
1175
+ datacontract export datacontract.yaml --format sql --output output.sql
1176
+ ```
1177
+
1178
+ If you are using Databricks and an error is thrown when deploying the SQL DDLs with `variant` columns, set the following property.
977
1179
 
1180
+ ```python
1181
+ spark.conf.set("spark.databricks.delta.schema.typeCheck.enabled", "false")
1182
+ ```
978
1183
 
979
1184
  #### Great Expectations
980
1185
 
@@ -982,7 +1187,7 @@ The `export` function transforms a specified data contract into a comprehensive
982
1187
  If the contract includes multiple models, you need to specify the names of the model you wish to export.
983
1188
 
984
1189
  ```shell
985
- datacontract export datacontract.yaml --format great-expectations --model orders
1190
+ datacontract export datacontract.yaml --format great-expectations --model orders
986
1191
  ```
987
1192
 
988
1193
  The export creates a list of expectations by utilizing:
@@ -1007,7 +1212,7 @@ To further customize the export, the following optional arguments are available:
1007
1212
 
1008
1213
  #### RDF
1009
1214
 
1010
- The export function converts a given data contract into a RDF representation. You have the option to
1215
+ The `export` function converts a given data contract into a RDF representation. You have the option to
1011
1216
  add a base_url which will be used as the default prefix to resolve relative IRIs inside the document.
1012
1217
 
1013
1218
  ```shell
@@ -1230,73 +1435,110 @@ FROM
1230
1435
  {{ ref('orders') }}
1231
1436
  ```
1232
1437
 
1438
+ #### ODCS Excel Template
1439
+
1440
+ The `export` function converts a data contract into an ODCS (Open Data Contract Standard) Excel template. This creates a user-friendly Excel spreadsheet that can be used for authoring, sharing, and managing data contracts using the familiar Excel interface.
1441
+
1442
+ ```shell
1443
+ datacontract export --format excel --output datacontract.xlsx datacontract.yaml
1444
+ ```
1445
+
1446
+ The Excel format enables:
1447
+ - **User-friendly authoring**: Create and edit data contracts in Excel's familiar interface
1448
+ - **Easy sharing**: Distribute data contracts as standard Excel files
1449
+ - **Collaboration**: Enable non-technical stakeholders to contribute to data contract definitions
1450
+ - **Round-trip conversion**: Import Excel templates back to YAML data contracts
1451
+
1452
+ For more information about the Excel template structure, visit the [ODCS Excel Template repository](https://github.com/datacontract/open-data-contract-standard-excel-template).
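
A round-trip with the commands shown earlier in this README looks like this (file names are illustrative):

```bash
# export the contract to the ODCS Excel template
datacontract export --format excel --output odcs.xlsx datacontract.yaml

# edit odcs.xlsx in Excel, then import it back to YAML
datacontract import --format excel --source odcs.xlsx --output datacontract.yaml
```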
1453
+
1233
1454
  ### import
1234
1455
  ```
1235
- Usage: datacontract import [OPTIONS]
1236
-
1237
- Create a data contract from the given source location. Saves to file specified by `output` option if present,
1238
- otherwise prints to stdout.
1239
-
1240
- ╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────╮
1241
- * --format [sql|avro|dbt|dbml|glue|jsonschema|bi The format of the source file. │
1242
- gquery|odcs|unity|spark|iceberg|parqu [default: None]
1243
- et|csv] [required]
1244
- --output PATH Specify the file path where the Data
1245
- Contract will be saved. If no path is
1246
- provided, the output will be printed
1247
- to stdout.
1248
- [default: None]
1249
- --source TEXT The path to the file or Glue Database
1250
- that should be imported.
1251
- [default: None]
1252
- │ --dialect TEXT The SQL dialect to use when importing
1253
- SQL files, e.g., postgres, tsql,
1254
- bigquery.
1255
- [default: None]
1256
- --glue-table TEXT List of table ids to import from the
1257
- Glue Database (repeat for multiple
1258
- table ids, leave empty for all tables
1259
- in the dataset).
1260
- [default: None]
1261
- --bigquery-project TEXT The bigquery project id.
1262
- [default: None]
1263
- │ --bigquery-dataset TEXT The bigquery dataset id.
1264
- [default: None]
1265
- --bigquery-table TEXT List of table ids to import from the
1266
- bigquery API (repeat for multiple
1267
- table ids, leave empty for all tables
1268
- in the dataset).
1269
- [default: None]
1270
- --unity-table-full-name TEXT Full name of a table in the unity
1271
- catalog
1272
- [default: None]
1273
- │ --dbt-model TEXT List of models names to import from
1274
- the dbt manifest file (repeat for
1275
- multiple models names, leave empty
1276
- for all models in the dataset).
1277
- [default: None]
1278
- --dbml-schema TEXT List of schema names to import from
1279
- the DBML file (repeat for multiple
1280
- schema names, leave empty for all
1281
- tables in the file).
1282
- [default: None]
1283
- --dbml-table TEXT List of table names to import from
1284
- the DBML file (repeat for multiple │
1285
- table names, leave empty for all
1286
- tables in the file).
1287
- [default: None]
1288
- │ --iceberg-table TEXT Table name to assign to the model
1289
- created from the Iceberg schema.
1290
- [default: None]
1291
- --template TEXT The location (url or path) of the
1292
- Data Contract Specification Template
1293
- [default: None]
1294
- │ --schema TEXT The location (url or path) of the
1295
- Data Contract Specification JSON
1296
- Schema
1297
- [default: None]
1298
- --help Show this message and exit.
1299
- ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
1456
+
1457
+ Usage: datacontract import [OPTIONS]
1458
+
1459
+ Create a data contract from the given source location. Saves to file specified by `output` option
1460
+ if present, otherwise prints to stdout.
1461
+
1462
+ ╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
1463
+ * --format [sql|avro|dbt|dbml|glue|jsonsc The format of the source file.
1464
+ hema|json|bigquery|odcs|unity| [default: None]
1465
+ spark|iceberg|parquet|csv|prot [required]
1466
+ obuf|excel]
1467
+ --output PATH Specify the file path where
1468
+ the Data Contract will be
1469
+ saved. If no path is provided,
1470
+ the output will be printed to
1471
+ stdout.
1472
+ [default: None]
1473
+ │ --source TEXT The path to the file that
1474
+ should be imported.
1475
+ [default: None]
1476
+ --spec [datacontract_specification|od The format of the data
1477
+ cs] contract to import.
1478
+ [default:
1479
+ datacontract_specification]
1480
+ --dialect TEXT The SQL dialect to use when
1481
+ importing SQL files, e.g.,
1482
+ postgres, tsql, bigquery.
1483
+ [default: None]
1484
+ │ --glue-table TEXT List of table ids to import
1485
+ from the Glue Database (repeat
1486
+ for multiple table ids, leave
1487
+ empty for all tables in the
1488
+ dataset).
1489
+ [default: None]
1490
+ --bigquery-project TEXT The bigquery project id.
1491
+ [default: None]
1492
+ --bigquery-dataset TEXT The bigquery dataset id.
1493
+ [default: None]
1494
+ │ --bigquery-table TEXT List of table ids to import
1495
+ from the bigquery API (repeat
1496
+ for multiple table ids, leave
1497
+ empty for all tables in the
1498
+ dataset).
1499
+ [default: None]
1500
+ --unity-table-full-name TEXT Full name of a table in the
1501
+ unity catalog
1502
+ [default: None]
1503
+ --dbt-model TEXT List of models names to import
1504
+ from the dbt manifest file
1505
+ (repeat for multiple models
1506
+ names, leave empty for all
1507
+ models in the dataset).
1508
+ [default: None]
1509
+ │ --dbml-schema TEXT List of schema names to import
1510
+ from the DBML file (repeat for
1511
+ multiple schema names, leave
1512
+ empty for all tables in the
1513
+ file).
1514
+ [default: None]
1515
+ │ --dbml-table TEXT List of table names to import
1516
+ from the DBML file (repeat for
1517
+ multiple table names, leave
1518
+ empty for all tables in the
1519
+ file).
1520
+ │ [default: None] │
1521
+ │ --iceberg-table TEXT Table name to assign to the │
1522
+ │ model created from the Iceberg │
1523
+ │ schema. │
1524
+ │ [default: None] │
1525
+ │ --template TEXT The location (url or path) of │
1526
+ │ the Data Contract │
1527
+ │ Specification Template │
1528
+ │ [default: None] │
1529
+ │ --schema TEXT The location (url or path) of │
1530
+ │ the Data Contract │
1531
+ │ Specification JSON Schema │
1532
+ │ [default: None] │
1533
+ │ --owner TEXT The owner or team responsible │
1534
+ │ for managing the data │
1535
+ │ contract. │
1536
+ │ [default: None] │
1537
+ │ --id TEXT The identifier for the the │
1538
+ │ data contract. │
1539
+ │ [default: None] │
1540
+ │ --help Show this message and exit. │
1541
+ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
1300
1542
 
1301
1543
  ```
1302
1544
 
@@ -1312,21 +1554,23 @@ Available import options:
1312
1554
 
1313
1555
  | Type | Description | Status |
1314
1556
  |--------------------|------------------------------------------------|--------|
1315
- | `sql` | Import from SQL DDL | ✅ |
1316
1557
  | `avro` | Import from AVRO schemas | ✅ |
1317
- | `glue` | Import from AWS Glue DataCatalog | ✅ |
1318
- | `jsonschema` | Import from JSON Schemas | ✅ |
1319
1558
  | `bigquery` | Import from BigQuery Schemas | ✅ |
1320
- | `unity` | Import from Databricks Unity Catalog | partial |
1321
- | `dbt` | Import from dbt models | ✅ |
1322
- | `odcs` | Import from Open Data Contract Standard (ODCS) | ✅ |
1323
- | `spark` | Import from Spark StructTypes | ✅ |
1324
- | `dbml` | Import from DBML models | ✅ |
1325
1559
  | `csv` | Import from CSV File | ✅ |
1326
- | `protobuf` | Import from Protobuf schemas | TBD |
1560
+ | `dbml` | Import from DBML models | |
1561
+ | `dbt` | Import from dbt models | ✅ |
1562
+ | `excel` | Import from ODCS Excel Template | ✅ |
1563
+ | `glue` | Import from AWS Glue DataCatalog | ✅ |
1327
1564
  | `iceberg` | Import from an Iceberg JSON Schema Definition | partial |
1328
- | `parquet` | Import from Parquet File Metadta | ✅ |
1329
- | Missing something? | Please create an issue on GitHub | TBD |
1565
+ | `jsonschema` | Import from JSON Schemas | ✅ |
1566
+ | `odcs` | Import from Open Data Contract Standard (ODCS) | |
1567
+ | `parquet` | Import from Parquet File Metadata | ✅ |
1568
+ | `protobuf` | Import from Protobuf schemas | ✅ |
1569
+ | `spark` | Import from Spark StructTypes, Variant | ✅ |
1570
+ | `sql` | Import from SQL DDL | ✅ |
1571
+ | `unity` | Import from Databricks Unity Catalog | partial |
1572
1573
+ | Missing something? | Please create an issue on GitHub | TBD |
1330
1574
 
1331
1575
 
1332
1576
  #### ODCS
@@ -1367,16 +1611,21 @@ datacontract import --format bigquery --bigquery-project <project_id> --bigquery
1367
1611
  ```
1368
1612
 
1369
1613
  #### Unity Catalog
1370
-
1371
1614
  ```bash
1372
1615
  # Example import from a Unity Catalog JSON file
1373
1616
  datacontract import --format unity --source my_unity_table.json
1374
1617
  ```
1375
1618
 
1376
1619
  ```bash
1377
- # Example import single table from Unity Catalog via HTTP endpoint
1378
- export DATABRICKS_IMPORT_INSTANCE="https://xyz.cloud.databricks.com"
1379
- export DATABRICKS_IMPORT_ACCESS_TOKEN=<token>
1620
+ # Example import single table from Unity Catalog via HTTP endpoint using PAT
1621
+ export DATACONTRACT_DATABRICKS_SERVER_HOSTNAME="https://xyz.cloud.databricks.com"
1622
+ export DATACONTRACT_DATABRICKS_TOKEN=<token>
1623
+ datacontract import --format unity --unity-table-full-name <table_full_name>
1624
+ ```
1625
+ Please refer to the [Databricks documentation](https://docs.databricks.com/aws/en/dev-tools/auth/unified-auth) on how to set up a profile.
1626
+ ```bash
1627
+ # Example import single table from Unity Catalog via HTTP endpoint using Profile
1628
+ export DATACONTRACT_DATABRICKS_PROFILE="my-profile"
1380
1629
  datacontract import --format unity --unity-table-full-name <table_full_name>
1381
1630
  ```
1382
1631
 
@@ -1397,6 +1646,17 @@ datacontract import --format dbt --source <manifest_path> --dbt-model <model_nam
1397
1646
  datacontract import --format dbt --source <manifest_path>
1398
1647
  ```
1399
1648
 
1649
+ #### Excel
1650
+
1651
+ Importing from [ODCS Excel Template](https://github.com/datacontract/open-data-contract-standard-excel-template).
1652
+
1653
+ Examples:
1654
+
1655
+ ```bash
1656
+ # Example import from ODCS Excel Template
1657
+ datacontract import --format excel --source odcs.xlsx
1658
+ ```
1659
+
1400
1660
  #### Glue
1401
1661
 
1402
1662
  Importing from Glue reads the necessary Data directly off of the AWS API.
@@ -1416,14 +1676,31 @@ datacontract import --format glue --source <database_name>
1416
1676
 
1417
1677
  #### Spark
1418
1678
 
1419
- Importing from Spark table or view these must be created or accessible in the Spark context. Specify tables list in `source` parameter.
1420
-
1421
- Example:
1679
+ To import from a Spark table or view, it must be created or accessible in the Spark context. Specify the list of tables in the `source` parameter. If the `source` tables are registered as tables in Databricks and have table-level descriptions, these descriptions are also added to the Data Contract Specification.
1422
1680
 
1423
1681
  ```bash
1682
+ # Example: Import Spark table(s) from Spark context
1424
1683
  datacontract import --format spark --source "users,orders"
1425
1684
  ```
1426
1685
 
1686
+ ```python
1687
+ # Example: Import Spark table
1688
+ DataContract.import_from_source("spark", "users")
1689
+ DataContract.import_from_source(format = "spark", source = "users")
1690
+
1691
+ # Example: Import Spark dataframe
1692
+ DataContract.import_from_source("spark", "users", dataframe = df_user)
1693
+ DataContract.import_from_source(format = "spark", source = "users", dataframe = df_user)
1694
+
1695
+ # Example: Import Spark table + table description
1696
+ DataContract.import_from_source("spark", "users", description = "description")
1697
+ DataContract.import_from_source(format = "spark", source = "users", description = "description")
1698
+
1699
+ # Example: Import Spark dataframe + table description
1700
+ DataContract.import_from_source("spark", "users", dataframe = df_user, description = "description")
1701
+ DataContract.import_from_source(format = "spark", source = "users", dataframe = df_user, description = "description")
1702
+ ```
1703
+
1427
1704
  #### DBML
1428
1705
 
1429
1706
  Importing from DBML Documents.
@@ -1475,95 +1752,96 @@ Example:
1475
1752
  datacontract import --format csv --source "test.csv"
1476
1753
  ```
1477
1754
 
1755
+ #### protobuf
1756
+
1757
+ Importing from a protobuf file. Specify the file in the `source` parameter.
1758
+
1759
+ Example:
1760
+
1761
+ ```bash
1762
+ datacontract import --format protobuf --source "test.proto"
1763
+ ```
1764
+
 
  ### breaking
  ```
-
-  Usage: datacontract breaking [OPTIONS] LOCATION_OLD LOCATION_NEW
-
-  Identifies breaking changes between data contracts. Prints to stdout.
-
- ╭─ Arguments ──────────────────────────────────────────────────────────────────╮
- │ * location_old TEXT The location (url or path) of the old data
-   contract yaml.
- │ [default: None]
-   [required]
-   * location_new TEXT The location (url or path) of the new data
-   contract yaml.
- │ [default: None] │
- │ [required] │
- ╰──────────────────────────────────────────────────────────────────────────────╯
- ╭─ Options ────────────────────────────────────────────────────────────────────╮
- │ --help Show this message and exit. │
- ╰──────────────────────────────────────────────────────────────────────────────╯
+
+  Usage: datacontract breaking [OPTIONS] LOCATION_OLD LOCATION_NEW
+
+  Identifies breaking changes between data contracts. Prints to stdout.
+
+ ╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
+ │ * location_old TEXT The location (url or path) of the old data contract yaml.
+   [default: None]
+ │ [required]
+   * location_new TEXT The location (url or path) of the new data contract yaml.
+   [default: None]
+   [required]
+ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
+ ╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+ │ --help Show this message and exit. │
+ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
 
  ```
 
  ### changelog
  ```
-
-  Usage: datacontract changelog [OPTIONS] LOCATION_OLD LOCATION_NEW
-
-  Generate a changelog between data contracts. Prints to stdout.
-
- ╭─ Arguments ──────────────────────────────────────────────────────────────────╮
- │ * location_old TEXT The location (url or path) of the old data
-   contract yaml.
- │ [default: None]
-   [required]
-   * location_new TEXT The location (url or path) of the new data
-   contract yaml.
- │ [default: None] │
- │ [required] │
- ╰──────────────────────────────────────────────────────────────────────────────╯
- ╭─ Options ────────────────────────────────────────────────────────────────────╮
- │ --help Show this message and exit. │
- ╰──────────────────────────────────────────────────────────────────────────────╯
+
+  Usage: datacontract changelog [OPTIONS] LOCATION_OLD LOCATION_NEW
+
+  Generate a changelog between data contracts. Prints to stdout.
+
+ ╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
+ │ * location_old TEXT The location (url or path) of the old data contract yaml.
+   [default: None]
+ │ [required]
+   * location_new TEXT The location (url or path) of the new data contract yaml.
+   [default: None]
+   [required]
+ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
+ ╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+ │ --help Show this message and exit. │
+ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
 
  ```
 
  ### diff
  ```
-
-  Usage: datacontract diff [OPTIONS] LOCATION_OLD LOCATION_NEW
-
-  PLACEHOLDER. Currently works as 'changelog' does.
-
- ╭─ Arguments ──────────────────────────────────────────────────────────────────╮
- │ * location_old TEXT The location (url or path) of the old data
-   contract yaml.
- │ [default: None]
-   [required]
-   * location_new TEXT The location (url or path) of the new data
-   contract yaml.
- │ [default: None] │
- │ [required] │
- ╰──────────────────────────────────────────────────────────────────────────────╯
- ╭─ Options ────────────────────────────────────────────────────────────────────╮
- │ --help Show this message and exit. │
- ╰──────────────────────────────────────────────────────────────────────────────╯
+
+  Usage: datacontract diff [OPTIONS] LOCATION_OLD LOCATION_NEW
+
+  PLACEHOLDER. Currently works as 'changelog' does.
+
+ ╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
+ │ * location_old TEXT The location (url or path) of the old data contract yaml.
+   [default: None]
+ │ [required]
+   * location_new TEXT The location (url or path) of the new data contract yaml.
+   [default: None]
+   [required]
+ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
+ ╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+ │ --help Show this message and exit. │
+ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
 
  ```
 
  ### catalog
  ```
-
-  Usage: datacontract catalog [OPTIONS]
-
-  Create an html catalog of data contracts.
-
- ╭─ Options ────────────────────────────────────────────────────────────────────╮
- │ --files TEXT Glob pattern for the data contract files to include in │
-   the catalog. Applies recursively to any subfolders.
- │ [default: *.yaml]
- │ --output TEXT Output directory for the catalog html files. │
-   [default: catalog/]
-   --schema TEXT The location (url or path) of the Data Contract
-   Specification JSON Schema
- │ [default: │
- │ https://datacontract.com/datacontract.schema.json] │
- │ --help Show this message and exit. │
- ╰──────────────────────────────────────────────────────────────────────────────╯
+
+  Usage: datacontract catalog [OPTIONS]
+
+  Create a html catalog of data contracts.
+
+ ╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+ │ --files TEXT Glob pattern for the data contract files to include in the catalog.
+ │ Applies recursively to any subfolders.
+ │ [default: *.yaml]
+ │ --output TEXT Output directory for the catalog html files. [default: catalog/]
+   --schema TEXT The location (url or path) of the Data Contract Specification JSON Schema
+   [default: None]
+   --help Show this message and exit.
+ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
 
  ```
 
@@ -1579,51 +1857,50 @@ datacontract catalog --files "*.odcs.yaml"
 
  ### publish
  ```
-
-  Usage: datacontract publish [OPTIONS] [LOCATION]
-
-  Publish the data contract to the Data Mesh Manager.
-
- ╭─ Arguments ──────────────────────────────────────────────────────────────────╮
- │ location [LOCATION] The location (url or path) of the data contract
- │ yaml.
- │ [default: datacontract.yaml] │
- ╰──────────────────────────────────────────────────────────────────────────────╯
- ╭─ Options ────────────────────────────────────────────────────────────────────╮
-   --schema TEXT The location (url or
-   path) of the Data
-   Contract Specification
-   JSON Schema
- │ [default:
-   https://datacontract.c…
- │ --ssl-verification --no-ssl-verification SSL verification when │
- │ publishing the data │
- │ contract. │
- │ [default: │
- │ ssl-verification] │
- │ --help Show this message and │
- │ exit. │
- ╰──────────────────────────────────────────────────────────────────────────────╯
+
+  Usage: datacontract publish [OPTIONS] [LOCATION]
+
+  Publish the data contract to the Data Mesh Manager.
+
+ ╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
+ │ location [LOCATION] The location (url or path) of the data contract yaml.
+   [default: datacontract.yaml]
+ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
+ ╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+   --schema TEXT The location (url or path) of the Data │
+   Contract Specification JSON Schema
+   [default: None]
+   --ssl-verification --no-ssl-verification SSL verification when publishing the data
+   contract.
+ │ [default: ssl-verification]
+   --help Show this message and exit.
+ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
 
  ```
 
  ### api
  ```
-
-  Usage: datacontract api [OPTIONS]
-
-  Start the datacontract CLI as server application with REST API.
-  The OpenAPI documentation as Swagger UI is available on http://localhost:4242. You can execute the commands directly from the Swagger UI.
-  To protect the API, you can set the environment variable DATACONTRACT_CLI_API_KEY to a secret API key. To authenticate, requests must include the header 'x-api-key' with the
-  correct API key. This is highly recommended, as data contract tests may be subject to SQL injections or leak sensitive information.
-  To connect to servers (such as a Snowflake data source), set the credentials as environment variables as documented in https://cli.datacontract.com/#test
-
- ╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
-   --port INTEGER Bind socket to this port. [default: 4242] │
-   --host TEXT Bind socket to this host. Hint: For running in docker, set it to 0.0.0.0 [default: 127.0.0.1] │
-   --help Show this message and exit.
- ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-
+
+  Usage: datacontract api [OPTIONS]
+
+  Start the datacontract CLI as server application with REST API.
+  The OpenAPI documentation as Swagger UI is available on http://localhost:4242. You can execute the
+  commands directly from the Swagger UI.
+  To protect the API, you can set the environment variable DATACONTRACT_CLI_API_KEY to a secret API
+  key. To authenticate, requests must include the header 'x-api-key' with the correct API key. This
+  is highly recommended, as data contract tests may be subject to SQL injections or leak sensitive
+  information.
+  To connect to servers (such as a Snowflake data source), set the credentials as environment
+  variables as documented in https://cli.datacontract.com/#test
+  It is possible to run the API with extra arguments for `uvicorn.run()` as keyword arguments, e.g.:
+  `datacontract api --port 1234 --root_path /datacontract`.
+
+ ╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
+ │ --port INTEGER Bind socket to this port. [default: 4242] │
+ │ --host TEXT Bind socket to this host. Hint: For running in docker, set it to 0.0.0.0 │
+ │ [default: 127.0.0.1] │
+ │ --help Show this message and exit. │
+ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
 
  ```
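
To illustrate the `x-api-key` authentication described above, a rough client-side sketch follows. The `/openapi.json` route is the FastAPI default and is an assumption here; check the Swagger UI at http://localhost:4242 for the actual operation paths.

```python
# Sketch: call the running server with the API key header described above.
# Assumes the server was started with `datacontract api` on the default port
# and DATACONTRACT_CLI_API_KEY="my-secret"; the /openapi.json route is an assumption.
import requests

response = requests.get(
    "http://localhost:4242/openapi.json",
    headers={"x-api-key": "my-secret"},
    timeout=10,
)
response.raise_for_status()
print(sorted(response.json()["paths"]))  # list the documented endpoints
```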
 
@@ -1666,8 +1943,7 @@ Create a data contract based on the actual data. This is the fastest way to get
  $ datacontract test
  ```
 
- 3. Make sure that all the best practices for a `datacontract.yaml` are met using the linter. You
-    probably forgot to document some fields and add the terms and conditions.
+ 3. Validate that the `datacontract.yaml` is correctly formatted and adheres to the Data Contract Specification.
  ```bash
  $ datacontract lint
  ```
@@ -1688,8 +1964,7 @@ Create a data contract based on the requirements from use cases.
  ```
 
  2. Create the model and quality guarantees based on your business requirements. Fill in the terms,
-    descriptions, etc. Make sure you follow all best practices for a `datacontract.yaml` using the
-    linter.
+    descriptions, etc. Validate that your `datacontract.yaml` is correctly formatted.
  ```bash
  $ datacontract lint
  ```
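
The lint step in both workflows can also run from Python, for example in a CI script. A small sketch, assuming a `lint()` method on `DataContract` and a `has_passed()` helper on the returned run; verify both names against your installed version.

```python
# Sketch: run the lint check programmatically (method names assumed, see above)
from datacontract.data_contract import DataContract

run = DataContract(data_contract_file="datacontract.yaml").lint()
if not run.has_passed():
    raise SystemExit("datacontract.yaml does not pass linting")
```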
@@ -1883,7 +2158,7 @@ if __name__ == "__main__":
  Output
 
  ```yaml
- dataContractSpecification: 1.1.0
+ dataContractSpecification: 1.2.1
  id: uuid-custom
  info:
    title: my_custom_imported_data
@@ -1902,19 +2177,41 @@ models:
  ```
  ## Development Setup
 
- Python base interpreter should be 3.11.x (unless working on 3.12 release candidate).
+ - Install [uv](https://docs.astral.sh/uv/)
+ - The Python base interpreter should be 3.11.x.
+ - The Docker engine must be running to execute the tests.
 
  ```bash
- # create venv
- python3.11 -m venv venv
- source venv/bin/activate
+ # make sure uv is installed
+ uv python pin 3.11
+ uv venv
+ uv pip install -e '.[dev]'
+ uv run ruff check
+ uv run pytest
+ ```
+
+ ### Troubleshooting
+
+ #### Windows: Some tests fail
+
+ Run the tests in WSL. (We need to fix the paths in the tests so that they also work on plain Windows; contributions are appreciated.)
 
- # Install Requirements
- pip install --upgrade pip setuptools wheel
- pip install -e '.[dev]'
- pre-commit install
- pre-commit run --all-files
- pytest
+ #### PyCharm does not pick up the `.venv`
+
+ This [uv issue](https://github.com/astral-sh/uv/issues/12545) might be relevant.
+
+ Try to sync all groups:
+
+ ```
+ uv sync --all-groups --all-extras
+ ```
+
+ #### Errors in tests that use PySpark (e.g. test_test_kafka.py)
+
+ Ensure you have JDK 17 or 21 installed. Java 25 causes issues.
+
+ ```
+ java --version
  ```
 
 
@@ -1949,27 +2246,6 @@ docker compose run --rm datacontract --version
 
  This command runs the container momentarily to check the version of the `datacontract` CLI. The `--rm` flag ensures that the container is automatically removed after the command executes, keeping your environment clean.
 
- ## Use with pre-commit
-
- To run `datacontract-cli` as part of a [pre-commit](https://pre-commit.com/) workflow, add something like the below to the `repos` list in the project's `.pre-commit-config.yaml`:
-
- ```yaml
- repos:
-   - repo: https://github.com/datacontract/datacontract-cli
-     rev: "v0.10.9"
-     hooks:
-       - id: datacontract-lint
-       - id: datacontract-test
-         args: ["--server", "production"]
- ```
-
- ### Available Hook IDs
-
- | Hook ID | Description | Dependency |
- | ----------------- | -------------------------------------------------- | ---------- |
- | datacontract-lint | Runs the lint subcommand. | Python3 |
- | datacontract-test | Runs the test subcommand. Please look at | Python3 |
- | | [test](#test) section for all available arguments. | |
 
  ## Release Steps
 
@@ -1986,8 +2262,10 @@ We are happy to receive your contributions. Propose your change in an issue or d
 
  ## Companies using this tool
 
+ - [Entropy Data](https://www.entropy-data.com)
  - [INNOQ](https://innoq.com)
  - [Data Catering](https://data.catering/)
+ - [Oliver Wyman](https://www.oliverwyman.com/)
  - And many more. To add your company, please create a pull request.
 
  ## Related Tools
@@ -2003,7 +2281,7 @@ We are happy to receive your contributions. Propose your change in an issue or d
 
  ## Credits
 
- Created by [Stefan Negele](https://www.linkedin.com/in/stefan-negele-573153112/) and [Jochen Christ](https://www.linkedin.com/in/jochenchrist/).
+ Created by [Stefan Negele](https://www.linkedin.com/in/stefan-negele-573153112/), [Jochen Christ](https://www.linkedin.com/in/jochenchrist/), and [Simon Harrer]().
 
 