datacontract-cli 0.10.8__py3-none-any.whl → 0.10.10__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.


Files changed (42)
  1. datacontract/catalog/catalog.py +4 -2
  2. datacontract/cli.py +36 -18
  3. datacontract/data_contract.py +13 -53
  4. datacontract/engines/soda/check_soda_execute.py +10 -2
  5. datacontract/engines/soda/connections/duckdb.py +32 -12
  6. datacontract/engines/soda/connections/trino.py +26 -0
  7. datacontract/export/avro_converter.py +1 -1
  8. datacontract/export/exporter.py +3 -2
  9. datacontract/export/exporter_factory.py +132 -39
  10. datacontract/export/jsonschema_converter.py +7 -7
  11. datacontract/export/sodacl_converter.py +17 -12
  12. datacontract/export/spark_converter.py +211 -0
  13. datacontract/export/sql_type_converter.py +28 -0
  14. datacontract/imports/avro_importer.py +149 -7
  15. datacontract/imports/bigquery_importer.py +17 -0
  16. datacontract/imports/dbt_importer.py +117 -0
  17. datacontract/imports/glue_importer.py +116 -33
  18. datacontract/imports/importer.py +34 -0
  19. datacontract/imports/importer_factory.py +90 -0
  20. datacontract/imports/jsonschema_importer.py +14 -3
  21. datacontract/imports/odcs_importer.py +8 -0
  22. datacontract/imports/spark_importer.py +134 -0
  23. datacontract/imports/sql_importer.py +8 -0
  24. datacontract/imports/unity_importer.py +23 -9
  25. datacontract/integration/publish_datamesh_manager.py +10 -5
  26. datacontract/lint/resolve.py +87 -21
  27. datacontract/lint/schema.py +24 -4
  28. datacontract/model/data_contract_specification.py +37 -4
  29. datacontract/templates/datacontract.html +18 -3
  30. datacontract/templates/index.html +1 -1
  31. datacontract/templates/partials/datacontract_information.html +20 -0
  32. datacontract/templates/partials/datacontract_terms.html +7 -0
  33. datacontract/templates/partials/definition.html +9 -1
  34. datacontract/templates/partials/model_field.html +23 -6
  35. datacontract/templates/partials/server.html +49 -16
  36. datacontract/templates/style/output.css +42 -0
  37. {datacontract_cli-0.10.8.dist-info → datacontract_cli-0.10.10.dist-info}/METADATA +310 -122
  38. {datacontract_cli-0.10.8.dist-info → datacontract_cli-0.10.10.dist-info}/RECORD +42 -36
  39. {datacontract_cli-0.10.8.dist-info → datacontract_cli-0.10.10.dist-info}/WHEEL +1 -1
  40. {datacontract_cli-0.10.8.dist-info → datacontract_cli-0.10.10.dist-info}/LICENSE +0 -0
  41. {datacontract_cli-0.10.8.dist-info → datacontract_cli-0.10.10.dist-info}/entry_points.txt +0 -0
  42. {datacontract_cli-0.10.8.dist-info → datacontract_cli-0.10.10.dist-info}/top_level.txt +0 -0
@@ -1,10 +1,10 @@
  Metadata-Version: 2.1
  Name: datacontract-cli
- Version: 0.10.8
- Summary: Test data contracts
- Author-email: Jochen Christ <jochen.christ@innoq.com>, Stefan Negele <stefan.negele@innoq.com>
+ Version: 0.10.10
+ Summary: The datacontract CLI is an open source command-line tool for working with Data Contracts. It uses data contract YAML files to lint the data contract, connect to data sources and execute schema and quality tests, detect breaking changes, and export to different formats. The tool is written in Python. It can be used as a standalone CLI tool, in a CI/CD pipeline, or directly as a Python library.
+ Author-email: Jochen Christ <jochen.christ@innoq.com>, Stefan Negele <stefan.negele@innoq.com>, Simon Harrer <simon.harrer@innoq.com>
  Project-URL: Homepage, https://cli.datacontract.com
- Project-URL: Issues, https://github.com/datacontract/cli/issues
+ Project-URL: Issues, https://github.com/datacontract/datacontract-cli/issues
  Classifier: Programming Language :: Python :: 3
  Classifier: License :: OSI Approved :: MIT License
  Classifier: Operating System :: OS Independent
@@ -12,10 +12,10 @@ Requires-Python: >=3.10
  Description-Content-Type: text/markdown
  License-File: LICENSE
  Requires-Dist: typer[all] <0.13,>=0.9
- Requires-Dist: pydantic <2.8.0,>=2.5.3
+ Requires-Dist: pydantic <2.9.0,>=2.8.2
  Requires-Dist: pyyaml ~=6.0.1
  Requires-Dist: requests <2.33,>=2.31
- Requires-Dist: fastapi ==0.111.0
+ Requires-Dist: fastapi ==0.111.1
  Requires-Dist: fastparquet ==2024.5.0
  Requires-Dist: python-multipart ==0.0.9
  Requires-Dist: rich ~=13.7.0
@@ -28,18 +28,18 @@ Requires-Dist: python-dotenv ~=1.0.0
  Requires-Dist: rdflib ==7.0.0
  Requires-Dist: opentelemetry-exporter-otlp-proto-grpc ~=1.16
  Requires-Dist: opentelemetry-exporter-otlp-proto-http ~=1.16
- Requires-Dist: boto3 <1.34.130,>=1.34.41
- Requires-Dist: botocore <1.34.128,>=1.34.41
+ Requires-Dist: boto3 <1.34.137,>=1.34.41
+ Requires-Dist: botocore <1.34.137,>=1.34.41
  Requires-Dist: jinja-partials >=0.2.1
  Provides-Extra: all
- Requires-Dist: datacontract-cli[bigquery,databricks,deltalake,kafka,postgres,s3,snowflake,sqlserver] ; extra == 'all'
+ Requires-Dist: datacontract-cli[bigquery,databricks,deltalake,kafka,postgres,s3,snowflake,sqlserver,trino] ; extra == 'all'
  Provides-Extra: avro
  Requires-Dist: avro ==1.11.3 ; extra == 'avro'
  Provides-Extra: bigquery
  Requires-Dist: soda-core-bigquery <3.4.0,>=3.3.1 ; extra == 'bigquery'
  Provides-Extra: databricks
  Requires-Dist: soda-core-spark-df <3.4.0,>=3.3.1 ; extra == 'databricks'
- Requires-Dist: databricks-sql-connector <3.2.0,>=3.1.2 ; extra == 'databricks'
+ Requires-Dist: databricks-sql-connector <3.3.0,>=3.1.2 ; extra == 'databricks'
  Requires-Dist: soda-core-spark[databricks] <3.4.0,>=3.3.1 ; extra == 'databricks'
  Provides-Extra: deltalake
  Requires-Dist: deltalake <0.19,>=0.17 ; extra == 'deltalake'
@@ -50,10 +50,12 @@ Requires-Dist: ruff ; extra == 'dev'
  Requires-Dist: pre-commit ~=3.7.1 ; extra == 'dev'
  Requires-Dist: pytest ; extra == 'dev'
  Requires-Dist: pytest-xdist ; extra == 'dev'
- Requires-Dist: moto ; extra == 'dev'
+ Requires-Dist: moto ==5.0.11 ; extra == 'dev'
  Requires-Dist: pymssql ==2.3.0 ; extra == 'dev'
  Requires-Dist: kafka-python ; extra == 'dev'
- Requires-Dist: testcontainers ~=4.5.0 ; extra == 'dev'
+ Requires-Dist: trino ==0.329.0 ; extra == 'dev'
+ Requires-Dist: testcontainers <4.8,>=4.5 ; extra == 'dev'
+ Requires-Dist: testcontainers[core] ; extra == 'dev'
  Requires-Dist: testcontainers[minio] ; extra == 'dev'
  Requires-Dist: testcontainers[postgres] ; extra == 'dev'
  Requires-Dist: testcontainers[kafka] ; extra == 'dev'
@@ -64,12 +66,14 @@ Requires-Dist: soda-core-spark-df <3.4.0,>=3.3.1 ; extra == 'kafka'
  Provides-Extra: postgres
  Requires-Dist: soda-core-postgres <3.4.0,>=3.3.1 ; extra == 'postgres'
  Provides-Extra: s3
- Requires-Dist: s3fs ==2024.6.0 ; extra == 's3'
+ Requires-Dist: s3fs ==2024.6.1 ; extra == 's3'
  Provides-Extra: snowflake
- Requires-Dist: snowflake-connector-python[pandas] <3.11,>=3.6 ; extra == 'snowflake'
+ Requires-Dist: snowflake-connector-python[pandas] <3.12,>=3.6 ; extra == 'snowflake'
  Requires-Dist: soda-core-snowflake <3.4.0,>=3.3.1 ; extra == 'snowflake'
  Provides-Extra: sqlserver
  Requires-Dist: soda-core-sqlserver <3.4.0,>=3.3.1 ; extra == 'sqlserver'
+ Provides-Extra: trino
+ Requires-Dist: soda-core-trino <3.4.0,>=3.3.1 ; extra == 'trino'

  # Data Contract CLI

@@ -258,17 +262,18 @@ pip install datacontract-cli[all]

  A list of available extras:

- | Dependency | Installation Command |
- |-------------------------|-------------------------------------------------------------|
- | Avro Support | `pip install datacontract-cli[avro]` |
- | Google BigQuery | `pip install datacontract-cli[bigquery]` |
- | Databricks Integration | `pip install datacontract-cli[databricks]` |
- | Deltalake Integration | `pip install datacontract-cli[deltalake]` |
- | Kafka Integration | `pip install datacontract-cli[kafka]` |
- | PostgreSQL Integration | `pip install datacontract-cli[postgres]` |
- | S3 Integration | `pip install datacontract-cli[s3]` |
- | Snowflake Integration | `pip install datacontract-cli[snowflake]` |
- | Microsoft SQL Server | `pip install datacontract-cli[sqlserver]` |
+ | Dependency | Installation Command |
+ |------------------------|--------------------------------------------|
+ | Avro Support | `pip install datacontract-cli[avro]` |
+ | Google BigQuery | `pip install datacontract-cli[bigquery]` |
+ | Databricks Integration | `pip install datacontract-cli[databricks]` |
+ | Deltalake Integration | `pip install datacontract-cli[deltalake]` |
+ | Kafka Integration | `pip install datacontract-cli[kafka]` |
+ | PostgreSQL Integration | `pip install datacontract-cli[postgres]` |
+ | S3 Integration | `pip install datacontract-cli[s3]` |
+ | Snowflake Integration | `pip install datacontract-cli[snowflake]` |
+ | Microsoft SQL Server | `pip install datacontract-cli[sqlserver]` |
+ | Trino | `pip install datacontract-cli[trino]` |


@@ -295,16 +300,16 @@ Commands
  Download a datacontract.yaml template and write it to file.

  ╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────╮
- │ location [LOCATION] The location (url or path) of the data contract yaml to create.
- │ [default: datacontract.yaml]
+ │ location [LOCATION] The location (url or path) of the data contract yaml to create.
+ │ [default: datacontract.yaml]
  ╰──────────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Options ────────────────────────────────────────────────────────────────────────────────────╮
- │ --template TEXT URL of a template or data contract
- │ [default:
- │ https://datacontract.com/datacontract.init.yaml]
- │ --overwrite --no-overwrite Replace the existing datacontract.yaml
- │ [default: no-overwrite]
- │ --help Show this message and exit.
+ │ --template TEXT URL of a template or data contract
+ │ [default:
+ │ https://datacontract.com/datacontract.init.yaml]
+ │ --overwrite --no-overwrite Replace the existing datacontract.yaml
+ │ [default: no-overwrite]
+ │ --help Show this message and exit.
  ╰──────────────────────────────────────────────────────────────────────────────────────────────╯
  ```

@@ -316,12 +321,12 @@ Commands
  Validate that the datacontract.yaml is correctly formatted.

  ╭─ Arguments ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
- │ location [LOCATION] The location (url or path) of the data contract yaml. [default: datacontract.yaml]
+ │ location [LOCATION] The location (url or path) of the data contract yaml. [default: datacontract.yaml]
  ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
- │ --schema TEXT The location (url or path) of the Data Contract Specification JSON Schema
- │ [default: https://datacontract.com/datacontract.schema.json]
- │ --help Show this message and exit.
+ │ --schema TEXT The location (url or path) of the Data Contract Specification JSON Schema
+ │ [default: https://datacontract.com/datacontract.schema.json]
+ │ --help Show this message and exit.
  ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
  ```

@@ -333,28 +338,28 @@ Commands
  Run schema and quality tests on configured servers.

  ╭─ Arguments ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
- │ location [LOCATION] The location (url or path) of the data contract yaml. [default: datacontract.yaml]
+ │ location [LOCATION] The location (url or path) of the data contract yaml. [default: datacontract.yaml]
  ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
- │ --schema TEXT The location (url or path) of the Data Contract
- │ Specification JSON Schema
- │ [default:
- │ https://datacontract.com/datacontract.schema.json]
- │ --server TEXT The server configuration to run the schema and quality
- │ tests. Use the key of the server object in the data
- │ contract yaml file to refer to a server, e.g.,
- │ `production`, or `all` for all servers (default).
- │ [default: all]
- │ --examples --no-examples Run the schema and quality tests on the example data
- │ within the data contract.
- │ [default: no-examples]
- │ --publish TEXT The url to publish the results after the test
- │ [default: None]
- │ --publish-to-opentelemetry --no-publish-to-opentelemetry Publish the results to opentelemetry. Use environment
- │ variables to configure the OTLP endpoint, headers, etc.
- │ [default: no-publish-to-opentelemetry]
- │ --logs --no-logs Print logs [default: no-logs]
- │ --help Show this message and exit.
+ │ --schema TEXT The location (url or path) of the Data Contract
+ │ Specification JSON Schema
+ │ [default:
+ │ https://datacontract.com/datacontract.schema.json]
+ │ --server TEXT The server configuration to run the schema and quality
+ │ tests. Use the key of the server object in the data
+ │ contract yaml file to refer to a server, e.g.,
+ │ `production`, or `all` for all servers (default).
+ │ [default: all]
+ │ --examples --no-examples Run the schema and quality tests on the example data
+ │ within the data contract.
+ │ [default: no-examples]
+ │ --publish TEXT The url to publish the results after the test
+ │ [default: None]
+ │ --publish-to-opentelemetry --no-publish-to-opentelemetry Publish the results to opentelemetry. Use environment
+ │ variables to configure the OTLP endpoint, headers, etc.
+ │ [default: no-publish-to-opentelemetry]
+ │ --logs --no-logs Print logs [default: no-logs]
+ │ --help Show this message and exit.
  ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
  ```

@@ -384,6 +389,7 @@ Supported server types:
  - [snowflake](#snowflake)
  - [kafka](#kafka)
  - [postgres](#postgres)
+ - [trino](#trino)
  - [local](#local)

  Supported formats:
@@ -429,11 +435,12 @@ servers:

  #### Environment Variables

- | Environment Variable | Example | Description |
- |-----------------------------------|-------------------------------|-----------------------|
- | `DATACONTRACT_S3_REGION` | `eu-central-1` | Region of S3 bucket |
- | `DATACONTRACT_S3_ACCESS_KEY_ID` | `AKIAXV5Q5QABCDEFGH` | AWS Access Key ID |
- | `DATACONTRACT_S3_SECRET_ACCESS_KEY` | `93S7LRrJcqLaaaa/XXXXXXXXXXXXX` | AWS Secret Access Key |
+ | Environment Variable | Example | Description |
+ |-------------------------------------|---------------------------------|----------------------------------------|
+ | `DATACONTRACT_S3_REGION` | `eu-central-1` | Region of S3 bucket |
+ | `DATACONTRACT_S3_ACCESS_KEY_ID` | `AKIAXV5Q5QABCDEFGH` | AWS Access Key ID |
+ | `DATACONTRACT_S3_SECRET_ACCESS_KEY` | `93S7LRrJcqLaaaa/XXXXXXXXXXXXX` | AWS Secret Access Key |
+ | `DATACONTRACT_S3_SESSION_TOKEN` | `AQoDYXdzEJr...` | AWS temporary session token (optional) |


@@ -466,7 +473,6 @@ models:
  | `DATACONTRACT_BIGQUERY_ACCOUNT_INFO_JSON_PATH` | `~/service-access-key.json` | Service Access key as saved on key creation by BigQuery. If this environment variable isn't set, the cli tries to use `GOOGLE_APPLICATION_CREDENTIALS` as a fallback, so if you have that set for using their Python library anyway, it should work seamlessly. |


-
  ### Azure

  Data Contract CLI can test data that is stored in Azure Blob storage or Azure Data Lake Storage (Gen2) (ADLS) in various formats.
@@ -486,11 +492,11 @@ servers:

  Authentication works with an Azure Service Principal (SPN) aka App Registration with a secret.

- | Environment Variable | Example | Description |
- |-----------------------------------|-------------------------------|------------------------------------------------------|
- | `DATACONTRACT_AZURE_TENANT_ID` | `79f5b80f-10ff-40b9-9d1f-774b42d605fc` | The Azure Tenant ID |
- | `DATACONTRACT_AZURE_CLIENT_ID` | `3cf7ce49-e2e9-4cbc-a922-4328d4a58622` | The ApplicationID / ClientID of the app registration |
- | `DATACONTRACT_AZURE_CLIENT_SECRET` | `yZK8Q~GWO1MMXXXXXXXXXXXXX` | The Client Secret value |
+ | Environment Variable | Example | Description |
+ |------------------------------------|----------------------------------------|------------------------------------------------------|
+ | `DATACONTRACT_AZURE_TENANT_ID` | `79f5b80f-10ff-40b9-9d1f-774b42d605fc` | The Azure Tenant ID |
+ | `DATACONTRACT_AZURE_CLIENT_ID` | `3cf7ce49-e2e9-4cbc-a922-4328d4a58622` | The ApplicationID / ClientID of the app registration |
+ | `DATACONTRACT_AZURE_CLIENT_SECRET` | `yZK8Q~GWO1MMXXXXXXXXXXXXX` | The Client Secret value |


@@ -520,13 +526,13 @@ models:

  #### Environment Variables

- | Environment Variable | Example | Description |
- |----------------------------------|--------------------|-------------|
- | `DATACONTRACT_SQLSERVER_USERNAME` | `root` | Username |
- | `DATACONTRACT_SQLSERVER_PASSWORD` | `toor` | Password |
- | `DATACONTRACT_SQLSERVER_TRUSTED_CONNECTION` | `True` | Use windows authentication, instead of login |
- | `DATACONTRACT_SQLSERVER_TRUST_SERVER_CERTIFICATE` | `True` | Trust self-signed certificate |
- | `DATACONTRACT_SQLSERVER_ENCRYPTED_CONNECTION` | `True` | Use SSL |
+ | Environment Variable | Example| Description |
+ |---------------------------------------------------|--------|----------------------------------------------|
+ | `DATACONTRACT_SQLSERVER_USERNAME` | `root` | Username |
+ | `DATACONTRACT_SQLSERVER_PASSWORD` | `toor` | Password |
+ | `DATACONTRACT_SQLSERVER_TRUSTED_CONNECTION` | `True` | Use windows authentication, instead of login |
+ | `DATACONTRACT_SQLSERVER_TRUST_SERVER_CERTIFICATE` | `True` | Trust self-signed certificate |
+ | `DATACONTRACT_SQLSERVER_ENCRYPTED_CONNECTION` | `True` | Use SSL |


@@ -557,8 +563,8 @@ models:

  | Environment Variable | Example | Description |
  |----------------------------------------------|--------------------------------------|-------------------------------------------------------|
- | `DATACONTRACT_DATABRICKS_TOKEN` | `dapia00000000000000000000000000000` | The personal access token to authenticate |
- | `DATACONTRACT_DATABRICKS_HTTP_PATH` | `/sql/1.0/warehouses/b053a3ffffffff` | The HTTP path to the SQL warehouse or compute cluster |
+ | `DATACONTRACT_DATABRICKS_TOKEN` | `dapia00000000000000000000000000000` | The personal access token to authenticate |
+ | `DATACONTRACT_DATABRICKS_HTTP_PATH` | `/sql/1.0/warehouses/b053a3ffffffff` | The HTTP path to the SQL warehouse or compute cluster |


  ### Databricks (programmatic)
@@ -603,7 +609,7 @@ run.result

  Works with Spark DataFrames.
  DataFrames need to be created as named temporary views.
- Multiple temporary views are suppored if your data contract contains multiple models.
+ Multiple temporary views are supported if your data contract contains multiple models.

  Testing DataFrames is useful to test your datasets in a pipeline before writing them to a data source.

@@ -724,6 +730,35 @@ models:
  | `DATACONTRACT_POSTGRES_PASSWORD` | `mysecretpassword` | Password |


+ ### Trino
+
+ Data Contract CLI can test data in Trino.
+
+ #### Example
+
+ datacontract.yaml
+ ```yaml
+ servers:
+   trino:
+     type: trino
+     host: localhost
+     port: 8080
+     catalog: my_catalog
+     schema: my_schema
+ models:
+   my_table_1: # corresponds to a table
+     type: table
+     fields:
+       my_column_1: # corresponds to a column
+         type: varchar
+ ```
+
+ #### Environment Variables
+
+ | Environment Variable | Example | Description |
+ |-------------------------------|--------------------|-------------|
+ | `DATACONTRACT_TRINO_USERNAME` | `trino` | Username |
+ | `DATACONTRACT_TRINO_PASSWORD` | `mysecretpassword` | Password |
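The new Trino support can also be exercised programmatically, in the spirit of the Databricks (programmatic) section above. A minimal sketch, assuming the datacontract.yaml from this hunk sits in the working directory and both environment variables are set:

```python
from datacontract.data_contract import DataContract

# Run the schema and quality tests against the `trino` server
# defined in the datacontract.yaml above.
data_contract = DataContract(data_contract_file="datacontract.yaml", server="trino")
run = data_contract.test()
print(run.result)
```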



@@ -742,7 +777,7 @@ models:
  │ * --format [jsonschema|pydantic-model|sodacl|dbt|dbt-sources|db The export format. [default: None] [required] │
  │ t-staging-sql|odcs|rdf|avro|protobuf|great-expectati │
  │ ons|terraform|avro-idl|sql|sql-query|html|go|bigquer │
- │ y|dbml]
+ │ y|dbml|spark]
  │ --output PATH Specify the file path where the exported data will be │
  │ saved. If no path is provided, the output will be │
  │ printed to stdout. │
@@ -792,6 +827,7 @@ Available export options:
  | `go` | Export to Go types | ✅ |
  | `pydantic-model` | Export to pydantic models | ✅ |
  | `DBML` | Export to a DBML Diagram description | ✅ |
+ | `spark` | Export to a Spark StructType | ✅ |
  | Missing something? | Please create an issue on GitHub | TBD |

  #### Great Expectations
@@ -838,6 +874,10 @@ The export function converts the logical data types of the datacontract into the
  if a server is selected via the `--server` option (based on the `type` of that server). If no server is selected, the
  logical data types are exported.

+ #### Spark
+
+ The export function converts the data contract specification into a StructType Spark schema. The returned value is a Python code picture of the model schemas.
+ Spark DataFrame schema is defined as StructType. For more details about Spark Data Types please see [the spark documentation](https://spark.apache.org/docs/latest/sql-ref-datatypes.html)
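To illustrate the "Python code picture" this export produces, here is a sketch of the rough output for a model with a single varchar field, such as `my_table_1` from the Trino example above (exact types and nullability depend on the model definition):

```python
from pyspark.sql import types

# Rough shape of what `datacontract export --format spark` prints.
my_table_1 = types.StructType([
    types.StructField("my_column_1", types.StringType(), nullable=True),
])
```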

  #### Avro

@@ -888,18 +928,19 @@ models:
  Create a data contract from the given source location. Prints to stdout.

  ╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
- │ * --format [sql|avro|glue|bigquery|jsonschema|unity] The format of the source file. [default: None] [required] │
- │ --source TEXT The path to the file or Glue Database that should be imported. │
- │ [default: None] │
- │ --glue-table TEXT List of table ids to import from the Glue Database (repeat for │
- │ multiple table ids, leave empty for all tables in the dataset). │
- │ [default: None] │
- │ --bigquery-project TEXT The bigquery project id. [default: None] │
- │ --bigquery-dataset TEXT The bigquery dataset id. [default: None] │
- │ --bigquery-table TEXT List of table ids to import from the bigquery API (repeat for │
- │ multiple table ids, leave empty for all tables in the dataset). │
- │ [default: None] │
- │ --help Show this message and exit. │
+ │ * --format [sql|avro|glue|bigquery|jsonschema| The format of the source file. [default: None] [required] │
+ │ unity|spark] │
+ │ --source TEXT The path to the file or Glue Database that should be imported. │
+ │ [default: None] │
+ │ --glue-table TEXT List of table ids to import from the Glue Database (repeat for │
+ │ multiple table ids, leave empty for all tables in the dataset). │
+ │ [default: None] │
+ │ --bigquery-project TEXT The bigquery project id. [default: None] │
+ │ --bigquery-dataset TEXT The bigquery dataset id. [default: None] │
+ │ --bigquery-table TEXT List of table ids to import from the bigquery API (repeat for │
+ │ multiple table ids, leave empty for all tables in the dataset). │
+ │ [default: None] │
+ │ --help Show this message and exit. │
  ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
  ```

@@ -917,11 +958,11 @@ Available import options:
  | `avro` | Import from AVRO schemas | ✅ |
  | `glue` | Import from AWS Glue DataCatalog | ✅ |
  | `protobuf` | Import from Protobuf schemas | TBD |
- | `jsonschema` | Import from JSON Schemas | ✅ |
- | `bigquery` | Import from BigQuery Schemas | ✅ |
+ | `jsonschema` | Import from JSON Schemas | ✅ |
+ | `bigquery` | Import from BigQuery Schemas | ✅ |
  | `unity` | Import from Databricks Unity Catalog | partial |
  | `dbt` | Import from dbt models | TBD |
- | `odcs` | Import from Open Data Contract Standard (ODCS) | ✅ |
+ | `odcs` | Import from Open Data Contract Standard (ODCS) | ✅ |
  | Missing something? | Please create an issue on GitHub | TBD |

@@ -964,7 +1005,7 @@ export DATABRICKS_IMPORT_ACCESS_TOKEN=<token>
  datacontract import --format unity --unity-table-full-name <table_full_name>
  ```

- ### Glue
+ #### Glue

  Importing from Glue reads the necessary Data directly off of the AWS API.
  You may give the `glue-table` parameter to enumerate the tables that should be imported. If no tables are given, _all_ available tables of the database will be imported.
@@ -981,6 +1022,15 @@ datacontract import --format glue --source <database_name> --glue-table <table_n
  datacontract import --format glue --source <database_name>
  ```

+ #### Spark
+
+ When importing from Spark, the tables or views must already be created in or accessible from the current Spark context. Pass the list of table names in the `source` parameter.
+
+ Example:
+
+ ```bash
+ datacontract import --format spark --source "users,orders"
+ ```
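The named tables or views have to exist in the active Spark session before the import runs. A minimal sketch of preparing them (the DataFrame contents here are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Register the tables the importer will look up by name.
spark.createDataFrame([(1, "alice")], ["id", "name"]).createOrReplaceTempView("users")
spark.createDataFrame([(1, 99.9)], ["id", "total"]).createOrReplaceTempView("orders")
```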

  ### breaking

@@ -990,11 +1040,11 @@ datacontract import --format glue --source <database_name>
  Identifies breaking changes between data contracts. Prints to stdout.

  ╭─ Arguments ───────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
- │ * location_old TEXT The location (url or path) of the old data contract yaml. [default: None] [required]
- │ * location_new TEXT The location (url or path) of the new data contract yaml. [default: None] [required]
+ │ * location_old TEXT The location (url or path) of the old data contract yaml. [default: None] [required]
+ │ * location_new TEXT The location (url or path) of the new data contract yaml. [default: None] [required]
  ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
- │ --help Show this message and exit.
+ │ --help Show this message and exit.
  ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
  ```

@@ -1006,11 +1056,11 @@ datacontract import --format glue --source <database_name>
  Generate a changelog between data contracts. Prints to stdout.

  ╭─ Arguments ───────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
- │ * location_old TEXT The location (url or path) of the old data contract yaml. [default: None] [required]
- │ * location_new TEXT The location (url or path) of the new data contract yaml. [default: None] [required]
+ │ * location_old TEXT The location (url or path) of the old data contract yaml. [default: None] [required]
+ │ * location_new TEXT The location (url or path) of the new data contract yaml. [default: None] [required]
  ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
- │ --help Show this message and exit.
+ │ --help Show this message and exit.
  ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
  ```

@@ -1022,8 +1072,8 @@ datacontract import --format glue --source <database_name>
  PLACEHOLDER. Currently works as 'changelog' does.

  ╭─ Arguments ───────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
- │ * location_old TEXT The location (url or path) of the old data contract yaml. [default: None] [required]
- │ * location_new TEXT The location (url or path) of the new data contract yaml. [default: None] [required]
+ │ * location_old TEXT The location (url or path) of the old data contract yaml. [default: None] [required]
+ │ * location_new TEXT The location (url or path) of the new data contract yaml. [default: None] [required]
  ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
  │ --help Show this message and exit. │
@@ -1039,9 +1089,9 @@ datacontract import --format glue --source <database_name>
  Create an html catalog of data contracts.

  ╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
- │ --files TEXT Glob pattern for the data contract files to include in the catalog. [default: *.yaml]
- │ --output TEXT Output directory for the catalog html files. [default: catalog/]
- │ --help Show this message and exit.
+ │ --files TEXT Glob pattern for the data contract files to include in the catalog. [default: *.yaml]
+ │ --output TEXT Output directory for the catalog html files. [default: catalog/]
+ │ --help Show this message and exit.
  ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
  ```

@@ -1063,19 +1113,22 @@ datacontract import --format glue --source <database_name>

  ## Integrations

- | Integration | Option | Description |
- |-------------------|------------------------------|-------------------------------------------------------------------------------------------------------|
- | Data Mesh Manager | `--publish` | Push full results to the [Data Mesh Manager API](https://api.datamesh-manager.com/swagger/index.html) |
- | OpenTelemetry | `--publish-to-opentelemetry` | Push result as gauge metrics |
+ | Integration | Option | Description |
+ |-----------------------|------------------------------|---------------------------------------------------------------------------------------------------------------|
+ | Data Mesh Manager | `--publish` | Push full results to the [Data Mesh Manager API](https://api.datamesh-manager.com/swagger/index.html) |
+ | Data Contract Manager | `--publish` | Push full results to the [Data Contract Manager API](https://api.datacontract-manager.com/swagger/index.html) |
+ | OpenTelemetry | `--publish-to-opentelemetry` | Push result as gauge metrics |

  ### Integration with Data Mesh Manager

- If you use [Data Mesh Manager](https://datamesh-manager.com/), you can use the data contract URL and append the `--publish` option to send and display the test results. Set an environment variable for your API key.
+ If you use [Data Mesh Manager](https://datamesh-manager.com/) or [Data Contract Manager](https://datacontract-manager.com/), you can use the data contract URL and append the `--publish` option to send and display the test results. Set an environment variable for your API key.

  ```bash
  # Fetch current data contract, execute tests on production, and publish result to data mesh manager
  $ EXPORT DATAMESH_MANAGER_API_KEY=xxx
- $ datacontract test https://demo.datamesh-manager.com/demo279750347121/datacontracts/4df9d6ee-e55d-4088-9598-b635b2fdcbbc/datacontract.yaml --server production --publish
+ $ datacontract test https://demo.datamesh-manager.com/demo279750347121/datacontracts/4df9d6ee-e55d-4088-9598-b635b2fdcbbc/datacontract.yaml \
+     --server production \
+     --publish https://api.datamesh-manager.com/api/test-results
  ```

  ### Integration with OpenTelemetry
@@ -1085,12 +1138,12 @@ If you use OpenTelemetry, you can use the data contract URL and append the `--pu
  The metric name is "datacontract.cli.test.result" and it uses the following encoding for the result:

  | datacontract.cli.test.result | Description |
- |-------|---------------------------------------|
- | 0 | test run passed, no warnings |
- | 1 | test run has warnings |
- | 2 | test run failed |
- | 3 | test run not possible due to an error |
- | 4 | test status unknown |
+ |------------------------------|---------------------------------------|
+ | 0 | test run passed, no warnings |
+ | 1 | test run has warnings |
+ | 2 | test run failed |
+ | 3 | test run not possible due to an error |
+ | 4 | test status unknown |


  ```bash
@@ -1118,7 +1171,7 @@ Create a data contract based on the actual data. This is the fastest way to get

  1. Use an existing physical schema (e.g., SQL DDL) as a starting point to define your logical data model in the contract. Double check right after the import whether the actual data meets the imported logical data model. Just to be sure.
  ```bash
- $ datacontract import --format sql ddl.sql
+ $ datacontract import --format sql --source ddl.sql
  $ datacontract test
  ```

@@ -1141,7 +1194,7 @@ Create a data contract based on the actual data. This is the fastest way to get

  5. Set up a CI pipeline that executes daily and reports the results to the [Data Mesh Manager](https://datamesh-manager.com). Or to some place else. You can even publish to any opentelemetry compatible system.
  ```bash
- $ datacontract test --publish https://api.datamesh-manager.com/api/runs
+ $ datacontract test --publish https://api.datamesh-manager.com/api/test-results
  ```

  ### Contract-First
@@ -1193,7 +1246,7 @@ Examples: adding models or fields
  - Add the models or fields in the datacontract.yaml
  - Increment the minor version of the datacontract.yaml on any change. Simply edit the datacontract.yaml for this.
  - You need a policy that these changes are non-breaking. That means that one cannot use the star expression in SQL to query a table under contract. Make the consequences known.
- - Fail the build in the Pull Request if a datacontract.yaml accidentially adds a breaking change even despite only a minor version change
+ - Fail the build in the Pull Request if a datacontract.yaml accidentally adds a breaking change even despite only a minor version change
  ```bash
  $ datacontract breaking datacontract-from-pr.yaml datacontract-from-main.yaml
  ```
@@ -1214,6 +1267,121 @@ Examples: Removing or renaming models and fields.
  $ datacontract changelog datacontract-from-pr.yaml datacontract-from-main.yaml
  ```

+ ## Customizing Exporters and Importers
+
+ ### Custom Exporter
+ Using the exporter factory to add a new custom exporter
+ ```python
+ from datacontract.data_contract import DataContract
+ from datacontract.export.exporter import Exporter
+ from datacontract.export.exporter_factory import exporter_factory
+
+
+ # Create a custom class that implements the export method
+ class CustomExporter(Exporter):
+     def export(self, data_contract, model, server, sql_server_type, export_args) -> dict:
+         result = {
+             "title": data_contract.info.title,
+             "version": data_contract.info.version,
+             "description": data_contract.info.description,
+             "email": data_contract.info.contact.email,
+             "url": data_contract.info.contact.url,
+             "model": model,
+             "model_columns": ", ".join(list(data_contract.models.get(model).fields.keys())),
+             "export_args": export_args,
+             "custom_args": export_args.get("custom_arg", ""),
+         }
+         return result
+
+
+ # Register the new custom class into the factory
+ exporter_factory.register_exporter("custom", CustomExporter)
+
+
+ if __name__ == "__main__":
+     # Create a DataContract instance
+     data_contract = DataContract(
+         data_contract_file="/path/datacontract.yaml"
+     )
+     # call export
+     result = data_contract.export(
+         export_format="custom", model="orders", server="production", custom_arg="my_custom_arg"
+     )
+     print(result)
+ ```
+ Output
+ ```python
+ {
+     'title': 'Orders Unit Test',
+     'version': '1.0.0',
+     'description': 'The orders data contract',
+     'email': 'team-orders@example.com',
+     'url': 'https://wiki.example.com/teams/checkout',
+     'model': 'orders',
+     'model_columns': 'order_id, order_total, order_status',
+     'export_args': {'server': 'production', 'custom_arg': 'my_custom_arg'},
+     'custom_args': 'my_custom_arg'
+ }
+ ```
+
+ ### Custom Importer
+ Using the importer factory to add a new custom importer
+ ```python
+ import json
+
+ from datacontract.model.data_contract_specification import DataContractSpecification
+ from datacontract.data_contract import DataContract
+ from datacontract.imports.importer import Importer
+ from datacontract.imports.importer_factory import importer_factory
+
+
+ # Create a custom class that implements the import_source method
+ class CustomImporter(Importer):
+     def import_source(
+         self, data_contract_specification: DataContractSpecification, source: str, import_args: dict
+     ) -> DataContractSpecification:
+         source_dict = json.loads(source)
+         data_contract_specification.id = source_dict.get("id_custom")
+         data_contract_specification.info.title = source_dict.get("title")
+         data_contract_specification.info.description = source_dict.get("description_from_app")
+
+         return data_contract_specification
+
+
+ # Register the new custom class into the factory
+ importer_factory.register_importer("custom_company_importer", CustomImporter)
+
+
+ if __name__ == "__main__":
+     # Get custom data, e.g., a JSON payload from an in-house application
+     json_from_custom_app = '{"id_custom":"uuid-custom","version":"0.0.2", "title":"my_custom_imported_data", "description_from_app": "Custom contract description"}'
+     # Create a DataContract instance
+     data_contract = DataContract()
+
+     # call import_from_source
+     result = data_contract.import_from_source(
+         format="custom_company_importer", data_contract_specification=DataContract.init(), source=json_from_custom_app
+     )
+     print(dict(result))
+ ```
+ Output
+
+ ```python
+ {
+     'dataContractSpecification': '0.9.3',
+     'id': 'uuid-custom',
+     'info': Info(title='my_custom_imported_data', version='0.0.1', status=None, description='Custom contract description', owner=None, contact=None),
+     'servers': {},
+     'terms': None,
+     'models': {},
+     'definitions': {},
+     'examples': [],
+     'quality': None,
+     'servicelevels': None
+ }
+ ```
  ## Development Setup

  Python base interpreter should be 3.11.x (unless working on 3.12 release candidate).
@@ -1263,7 +1431,27 @@ docker compose run --rm datacontract --version

  This command runs the container momentarily to check the version of the `datacontract` CLI. The `--rm` flag ensures that the container is automatically removed after the command executes, keeping your environment clean.

+ ## Use with pre-commit
+
+ To run `datacontract-cli` as part of a [pre-commit](https://pre-commit.com/) workflow, add something like the below to the `repos` list in the project's `.pre-commit-config.yaml`:
+
+ ```yaml
+ repos:
+   - repo: https://github.com/datacontract/datacontract-cli
+     rev: "v0.10.9"
+     hooks:
+       - id: datacontract-lint
+       - id: datacontract-test
+         args: ["--server", "production"]
+ ```
+
+ ### Available Hook IDs

+ | Hook ID | Description | Dependency |
+ | ----------------- | ------------------------------------------------------------------------------------------------ | ---------- |
+ | datacontract-lint | Runs the lint subcommand. | Python3 |
+ | datacontract-test | Runs the test subcommand. Please look at the [test](#test) section for all available arguments. | Python3 |

  ## Release Steps

@@ -1285,7 +1473,7 @@ We are happy to receive your contributions. Propose your change in an issue or d

  ## Related Tools

- - [Data Mesh Manager](https://www.datamesh-manager.com/) is a commercial tool to manage data products and data contracts. It supports the data contract specification and allows the user to import or export data contracts using this specification.
+ - [Data Contract Manager](https://www.datacontract-manager.com/) is a commercial tool to manage data contracts. It contains a web UI, access management, and data governance for a full enterprise data marketplace.
  - [Data Contract GPT](https://gpt.datacontract.com) is a custom GPT that can help you write data contracts.
  - [Data Contract Editor](https://editor.datacontract.com) is an editor for Data Contracts, including a live html preview.