datacontract-x 0.1.0__tar.gz
- datacontract_x-0.1.0/LICENSE +21 -0
- datacontract_x-0.1.0/PKG-INFO +328 -0
- datacontract_x-0.1.0/README.md +290 -0
- datacontract_x-0.1.0/datacontract_x.egg-info/PKG-INFO +328 -0
- datacontract_x-0.1.0/datacontract_x.egg-info/SOURCES.txt +39 -0
- datacontract_x-0.1.0/datacontract_x.egg-info/dependency_links.txt +1 -0
- datacontract_x-0.1.0/datacontract_x.egg-info/entry_points.txt +2 -0
- datacontract_x-0.1.0/datacontract_x.egg-info/requires.txt +13 -0
- datacontract_x-0.1.0/datacontract_x.egg-info/top_level.txt +1 -0
- datacontract_x-0.1.0/dcx/__init__.py +10 -0
- datacontract_x-0.1.0/dcx/api.py +1277 -0
- datacontract_x-0.1.0/dcx/apply/__init__.py +10 -0
- datacontract_x-0.1.0/dcx/apply/snowflake.py +623 -0
- datacontract_x-0.1.0/dcx/cli.py +129 -0
- datacontract_x-0.1.0/dcx/enrich/__init__.py +41 -0
- datacontract_x-0.1.0/dcx/enrich/all.py +151 -0
- datacontract_x-0.1.0/dcx/enrich/base.py +216 -0
- datacontract_x-0.1.0/dcx/enrich/columns.py +429 -0
- datacontract_x-0.1.0/dcx/enrich/quality.py +395 -0
- datacontract_x-0.1.0/dcx/enrich/tags.py +482 -0
- datacontract_x-0.1.0/dcx/exporters/__init__.py +6 -0
- datacontract_x-0.1.0/dcx/exporters/command.py +111 -0
- datacontract_x-0.1.0/dcx/exporters/snowflake.py +523 -0
- datacontract_x-0.1.0/dcx/import_commands.py +163 -0
- datacontract_x-0.1.0/dcx/importers/__init__.py +7 -0
- datacontract_x-0.1.0/dcx/importers/kafka.py +143 -0
- datacontract_x-0.1.0/dcx/importers/registry.py +163 -0
- datacontract_x-0.1.0/dcx/importers/snowflake.py +487 -0
- datacontract_x-0.1.0/dcx/serve.py +33 -0
- datacontract_x-0.1.0/dcx/target/__init__.py +10 -0
- datacontract_x-0.1.0/dcx/target/command.py +1257 -0
- datacontract_x-0.1.0/pyproject.toml +74 -0
- datacontract_x-0.1.0/setup.cfg +4 -0
- datacontract_x-0.1.0/tests/test_api.py +567 -0
- datacontract_x-0.1.0/tests/test_apply.py +597 -0
- datacontract_x-0.1.0/tests/test_enrich.py +1146 -0
- datacontract_x-0.1.0/tests/test_kafka_import.py +144 -0
- datacontract_x-0.1.0/tests/test_smoke.py +19 -0
- datacontract_x-0.1.0/tests/test_snowflake_export.py +616 -0
- datacontract_x-0.1.0/tests/test_snowflake_import.py +515 -0
- datacontract_x-0.1.0/tests/test_target.py +477 -0
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 dcx contributors
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
@@ -0,0 +1,328 @@
+Metadata-Version: 2.4
+Name: datacontract-x
+Version: 0.1.0
+Summary: Data Contract eXtended — AI-native, platform-extensible data contracts: LLM enrichment (descriptions, tags, data quality), live import, and apply. Built on datacontract-cli.
+Author: MickaelBZH
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/MickaelBZH/data-contract-x
+Project-URL: Repository, https://github.com/MickaelBZH/data-contract-x
+Project-URL: Issues, https://github.com/MickaelBZH/data-contract-x/issues
+Keywords: data-contract,odcs,data-quality,data-governance,data-catalog,llm,ai,datacontract-cli,metadata,tagging,snowflake
+Classifier: Development Status :: 4 - Beta
+Classifier: Environment :: Console
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: Information Technology
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3 :: Only
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Database
+Classifier: Topic :: Software Development :: Quality Assurance
+Requires-Python: <3.13,>=3.10
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: datacontract-cli<0.13,>=0.12.4
+Requires-Dist: typer<0.26,>=0.18.0
+Requires-Dist: open-data-contract-standard<4.0.0,>=3.1.2
+Requires-Dist: fastapi<0.137.0,>=0.115.0
+Requires-Dist: uvicorn<1.0,>=0.30
+Requires-Dist: pyyaml<7.0,>=6.0
+Requires-Dist: snowflake-connector-python<4.0,>=3.0
+Requires-Dist: litellm<2.0,>=1.50
+Provides-Extra: dev
+Requires-Dist: pytest>=8.0; extra == "dev"
+Requires-Dist: ruff>=0.6; extra == "dev"
+Requires-Dist: httpx<1.0,>=0.27; extra == "dev"
+Dynamic: license-file
+
+<p align="center">
+<img src="assets/logo.svg" alt="dcx — Data Contract eXtended" width="520">
+</p>
+
+<h3 align="center">Data Contract e<strong>X</strong>tended — AI-native, platform-extensible data contracts</h3>
+
+<p align="center">
+Author data contracts with an LLM, sync them with your live platforms.<br>
+A lean, no-fork extension of <a href="https://github.com/datacontract/datacontract-cli">datacontract-cli</a>, built on the <a href="https://bitol.io/">Open Data Contract Standard (ODCS)</a>.
+</p>
+
+<p align="center">
+<img alt="PyPI" src="https://img.shields.io/pypi/v/datacontract-x?color=6366F1&label=pypi">
+<img alt="Python" src="https://img.shields.io/badge/python-3.10%20|%203.11%20|%203.12-3776AB?logo=python&logoColor=white">
+<img alt="License: MIT" src="https://img.shields.io/badge/license-MIT-22c55e">
+<img alt="ODCS" src="https://img.shields.io/badge/ODCS-v3.1-0EA5E9">
+<img alt="Built on datacontract-cli" src="https://img.shields.io/badge/built%20on-datacontract--cli-6366F1">
+</p>
+
+---
+
+## What is dcx?
+
+**dcx (Data Contract eXtended)** adds three things to the Open Data Contract Standard workflow that plain datacontract-cli doesn't do:
+
+1. **AI authoring** — use an LLM to enrich a contract with column descriptions, validation constraints, governance **tags** from your own catalog, and an executable **data-quality** suite.
+2. **Live import** — build a contract *from* a running system (its real columns, keys, comments, tags).
+3. **Apply** — push the contract's governance *back* to the platform (comments, tags, data-quality, and the table itself).
+
+It's **platform-extensible by design**: each platform is a small importer / exporter / apply module that plugs into datacontract-cli's factories. **Snowflake is the first end-to-end platform** (import → enrich → apply), with Kafka import today and more platforms built to slot in the same way.
+
+The pipeline is: **import** a live schema into an ODCS contract → **enrich** it (columns · tags · quality) → **apply** it back to the platform, or **export** it to SQL / docs / schemas. Everything is available both as a **CLI** and as a **REST API** (`dcx api`).
+
+## Why dcx?
+
+- 🧠 **AI authoring that's safe to ship.** Forced tool-calling, `temperature=0`, and strict server-side validation against the ODCS schema — the model can only produce spec-valid output, never free-form guesses.
+- 🏷️ **A tag *manager*, not a tag guesser.** You define a controlled [tag catalog](#the-tag-catalog) (names, allowed values, examples); the LLM classifies columns into *your* vocabulary, with optional defaults.
+- ✅ **Executable, portable data quality.** Quality rules prefer ODCS `library` metrics (portable, mappable to platform-native checks) and fall back to portable `sql` checks — across all seven ODCS dimensions.
+- 🔌 **Any LLM provider.** Powered by [litellm](https://github.com/BerriAI/litellm) — Anthropic, OpenAI, Azure, Bedrock, Gemini, Ollama, … behind one `--model` flag.
+- 🧩 **Pluggable platforms, no fork.** You keep all 30+ upstream importers/exporters and `lint` / `test` / `changelog`, and gain the AI + platform layer on top.
+- 🔐 **Auth that makes sense per surface.** Live platform operations over the API use **caller-supplied OAuth**; secrets are never CLI flags.
+
+## Install
+
+```bash
+pip install datacontract-x
+```
+
+The import package and CLI are both `dcx`:
+
+```bash
+dcx --help
+dcx info
+```
+
+From source (for development):
+
+```bash
+git clone https://github.com/MickaelBZH/data-contract-x.git
+cd data-contract-x
+pip install -e ".[dev]"
+```
+
+> Requires Python 3.10–3.12. Installing pulls in `datacontract-cli`, `litellm`, FastAPI, and the platform connectors automatically.
+
+## Quickstart
+
+The full loop — import a live schema, enrich it with an LLM, sync it back. Snowflake here is the example platform.
+
+```bash
+# 1. Import an existing schema into a contract (real columns, PKs, comments, tags)
+dcx import snowflake --database MY_DB --schema LOAD --authenticator externalbrowser --output contract.yaml
+
+# 2. Enrich with an LLM: descriptions + constraints + tags + data-quality tests
+export ANTHROPIC_API_KEY=... # or OPENAI_API_KEY / AZURE_API_KEY / ...
+dcx enrich all contract.yaml --catalog tags_catalog.yaml --output contract.enriched.yaml
+
+# 3. Preview exactly what will run — no connection needed
+dcx apply snowflake contract.enriched.yaml --include-quality --dry-run
+
+# 4. Apply it: creates the table if missing, governs it (comments + tags + DQ) if it exists
+dcx apply snowflake contract.enriched.yaml --include-quality
+```
+
+---
+
+## Commands
+
+Every command is `dcx <command>`, and most are mirrored to a REST endpoint when you run [`dcx api`](#rest-api). Each section below lists the sub-commands, a CLI example, and the matching API call. Run `dcx <command> --help` for the full option list.
+
+### `import` — build a contract from a source
+
+| Sub-command | Source |
+|---|---|
+| `dcx import snowflake` | A live Snowflake schema (columns, primary keys, comments, tags) |
+| `dcx import kafka` | A Kafka topic's value schema (Confluent Schema Registry) |
+| `dcx import <format>` | A file/document — `sql`, `avro`, `dbml`, `glue`, `bigquery`, `unity`, `jsonschema`, `json`, `odcs`, `parquet`, `csv`, `protobuf`, `spark`, `iceberg`, `excel`, `dbt` |
+
+```bash
+dcx import snowflake --database MY_DB --schema LOAD --authenticator externalbrowser --output contract.yaml
+dcx import kafka --schema-registry https://sr:8081 --topic orders --output contract.yaml
+dcx import sql --source schema.sql --dialect snowflake --output contract.yaml
+```
+
+**API**
+- `POST /import/snowflake` — live import, authenticated by the caller's Snowflake OAuth token (`Authorization: Bearer <token>`).
+- `POST /import/{format}` — file-based importers; send the document inline as `source_content`.
+- *(Kafka import is CLI-only.)*
+
+### `enrich` — AI authoring with an LLM
+
+| Sub-command | Adds |
+|---|---|
+| `dcx enrich columns` | Business descriptions, `logicalTypeOptions` constraints, `required` / `unique` flags |
+| `dcx enrich tags` | Governance tags, classified against your [tag catalog](#the-tag-catalog) |
+| `dcx enrich quality` | An executable data-quality suite across all ODCS dimensions |
+| `dcx enrich all` | columns → tags → quality, in that order so each stage grounds the next |
+
+Each sub-command is independent and idempotent (existing values are preserved unless you pass `--overwrite`). The provider key is read from the environment — there is no `--api-key` flag. Use `--model` for any litellm model and `--base-url` for a proxy / Azure / Ollama endpoint.
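
The preservation rule can be pictured as a field-by-field merge. The sketch below is an illustration under stated assumptions, not dcx's actual code: `merge_enrichment` and the plain-dict column shape are hypothetical, and the only behaviour taken from the text above is that existing values win unless `overwrite` is set.

```python
def merge_enrichment(column: dict, proposed: dict, overwrite: bool = False) -> dict:
    """Return a copy of `column` with LLM-proposed fields merged in.

    Hypothetical helper: existing non-empty values are kept unless
    `overwrite` is True, mirroring the documented --overwrite behaviour.
    """
    merged = dict(column)
    for key, value in proposed.items():
        # Treat None and empty containers/strings as "no existing value".
        has_value = merged.get(key) not in (None, "", [], {})
        if overwrite or not has_value:
            merged[key] = value
    return merged


# Illustrative data only; the field names are assumptions.
column = {"name": "email", "description": "Customer e-mail address"}
proposed = {"description": "Primary contact email", "required": True}

kept = merge_enrichment(column, proposed)                      # description preserved
replaced = merge_enrichment(column, proposed, overwrite=True)  # description replaced
```

Merging the same proposal into `kept` a second time returns an identical dict, which is the property that makes a stage safe to re-run.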
+
+```bash
+dcx enrich columns contract.yaml --output contract.enriched.yaml
+dcx enrich tags contract.yaml --catalog tags_catalog.yaml --output contract.tagged.yaml
+dcx enrich quality contract.yaml --model gpt-4o --output contract.dq.yaml
+dcx enrich all contract.yaml --catalog tags_catalog.yaml --output contract.full.yaml
+```
+
+**API** (the LLM key comes from the *server's* environment)
+- `POST /enrich/columns` · `POST /enrich/quality`
+- `POST /enrich/tags` · `POST /enrich/all` — take the tag catalog inline in the request body.
+
+### `export` — convert a contract to a target format
+
+| Sub-command | Output |
+|---|---|
+| `dcx export snowflake-full` | A Snowflake setup script: DDL + tags + Data Metric Functions, in one file |
+| `dcx export <format>` | Any upstream format — `sql`, `jsonschema`, `html`, `markdown`, `mermaid`, `dbt-*`, `avro`, `protobuf`, `bigquery`, `spark`, `sqlalchemy`, `iceberg`, `sodacl`, `great-expectations`, `dbml`, `pydantic-model`, `odcs`, `rdf`, `go`, `excel`, … |
+
+`snowflake-full` options: `--include-tags`, `--include-quality`, `--create-tags`, `--tag-namespace DB.SCHEMA`, and `--structured-types` (render nested columns as typed `OBJECT(field type, …)` / `ARRAY(type)`).
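
As a rough illustration of what `--structured-types` implies, the following sketch renders a nested column as a typed Snowflake `OBJECT(...)` / `ARRAY(...)` string. The input shape (`type`, `fields`, `items`) and the function name `render_type` are assumptions made for this example; the exporter's real contract model and type mapping may differ.

```python
def render_type(column: dict) -> str:
    """Render a Snowflake type string for a possibly nested column.

    Hypothetical input shape: {"type": ..., "fields": [...]} for objects and
    {"type": "array", "items": {...}} for arrays. Scalars are upper-cased as-is.
    """
    kind = column["type"].upper()
    if kind == "OBJECT" and column.get("fields"):
        # Each nested field becomes "<name> <type>", rendered recursively.
        inner = ", ".join(f'{f["name"]} {render_type(f)}' for f in column["fields"])
        return f"OBJECT({inner})"
    if kind == "ARRAY" and column.get("items"):
        return f'ARRAY({render_type(column["items"])})'
    return kind


# Illustrative column; names and types are made up for the example.
address = {
    "name": "address",
    "type": "object",
    "fields": [
        {"name": "city", "type": "varchar"},
        {"name": "tags", "type": "array", "items": {"type": "varchar"}},
    ],
}

print(render_type(address))  # OBJECT(city VARCHAR, tags ARRAY(VARCHAR))
```

Objects or arrays without nested type information fall through to the bare `OBJECT` / `ARRAY` keyword in this sketch.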
+
+```bash
+dcx export snowflake-full contract.yaml --include-quality --create-tags --output setup.sql
+dcx export html contract.yaml --output contract.html
+```
+
+**API**
+- `POST /export/{format}` — including `POST /export/snowflake-full`. The response media type depends on the format (JSON / YAML / text / binary).
+
+### `apply` — push governance to a live platform
+
+| Sub-command | Target |
+|---|---|
+| `dcx apply snowflake` | A live Snowflake account |
+
+With the default `--ddl-mode auto` you don't need to know whether the table exists: **missing tables are created** (`CREATE TABLE IF NOT EXISTS`) and **existing ones are governed** — column/table comments, tags, and (with `--include-quality`) data-quality metrics. For existing tables, dcx also **compares the live schema to the contract** and reports drift as warnings — or, with `--strict`, an error that aborts before any change (the check uses `DESCRIBE TABLE`, so it needs no active warehouse).
+
+| Option | Effect |
+|---|---|
+| `--ddl-mode auto\|always\|never` | create-if-missing-then-govern (default) · always `CREATE TABLE` · govern existing only |
+| `--strict` | fail instead of warn on schema drift |
+| `--structured-types` | typed nested `OBJECT(...)` / `ARRAY(...)` |
+| `--include-quality` · `--create-tags` · `--tag-namespace` | data-metric functions · `CREATE TAG IF NOT EXISTS` · qualify tag refs |
+| `--dry-run` | print the SQL without connecting |
+
+```bash
+dcx apply snowflake contract.yaml --dry-run # preview
+dcx apply snowflake contract.yaml --include-quality # create-or-govern
+```
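
To make the drift check concrete, here is a minimal sketch of a warn-or-fail comparison between a contract's columns and a live table's columns. Everything in it is an assumption for illustration: `check_drift`, `SchemaDriftError`, the `{column: type}` input maps, and the three kinds of drift it reports are not taken from dcx's source. Only the warn-by-default, abort-under-`--strict` behaviour comes from the description above.

```python
class SchemaDriftError(Exception):
    """Raised in strict mode when the live table differs from the contract."""


def check_drift(contract_cols: dict, live_cols: dict, strict: bool = False) -> list:
    """Compare {column_name: type} maps and return a list of drift warnings.

    Hypothetical helper: with strict=True any drift raises before the
    caller gets a chance to run a change.
    """
    warnings = []
    for name, ctype in contract_cols.items():
        if name not in live_cols:
            warnings.append(f"column {name} is in the contract but not in the table")
        elif live_cols[name] != ctype:
            warnings.append(
                f"column {name}: contract type {ctype}, live type {live_cols[name]}"
            )
    for name in live_cols:
        if name not in contract_cols:
            warnings.append(f"column {name} is in the table but not in the contract")
    if strict and warnings:
        raise SchemaDriftError("; ".join(warnings))
    return warnings


# Illustrative schemas only.
contract = {"ORDER_ID": "NUMBER", "STATUS": "VARCHAR"}
live = {"ORDER_ID": "NUMBER", "STATUS": "NUMBER", "LEGACY_FLAG": "BOOLEAN"}

drift = check_drift(contract, live)  # two warnings, nothing raised
```

Calling `check_drift(contract, live, strict=True)` on the same inputs raises `SchemaDriftError` instead of returning the list.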
+
+**API**
+- `POST /apply/snowflake` — authenticated by the caller's Snowflake OAuth token. Supports `dry_run`, `ddl_mode`, `strict`, `structured_types`, … and returns the executed SQL plus any drift `warnings`.
+
+### `target` — bind a contract to a platform
+
+`dcx target <type>` sets the contract's server block and resolves each column's `physicalType` for that platform. ~30 types: `snowflake`, `bigquery`, `databricks`, `postgres`, `redshift`, `mysql`, `sqlserver`, `oracle`, `s3`, `kafka`, `trino`, `athena`, `glue`, `duckdb`, `local`, …
+
+```bash
+dcx target snowflake contract.yaml --output contract.snowflake.yaml
+```
+
+**API**
+- `POST /target/{type}` — one route per supported platform type.
+
+### From datacontract-cli
+
+These commands work unchanged — `dcx <command>` behaves exactly like `datacontract <command>`.
+
+| Command | Sub-commands | Purpose | API |
+|---|---|---|---|
+| `dcx init` | — | Create an empty data contract | — |
+| `dcx lint` | — | Validate a contract against the ODCS schema | `POST /lint` |
+| `dcx test` | — | Run schema + data-quality tests against a configured server | `POST /test` |
+| `dcx ci` | — | `test` for CI/CD — emits GitHub Actions annotations | — |
+| `dcx changelog` | — | Semantic changelog between two contract versions | `POST /changelog` |
+| `dcx catalog` | — | Render an HTML catalog of many contracts | — |
+| `dcx publish` | — | Publish a contract to Entropy Data | — |
+| `dcx dbt` | `sync` | Sync contracts into a dbt project | — |
+
+### `api` / `info`
+
+```bash
+dcx api --port 4242 # start the REST server (Swagger UI at /docs)
+dcx info # show dcx + datacontract-cli versions (API: GET /info)
+```
+
+---
+
+## The tag catalog
+
+`dcx enrich tags` does **controlled-vocabulary** tagging: instead of letting the model invent tags, you give it a catalog of allowed names and values, and it classifies each column into that vocabulary. The catalog is a small YAML (or JSON) file — the only extra input auto-tagging needs.
+
+```yaml
+# tags_catalog.yaml
+tags:
+  - name: DATA_CLASSIFICATION # the tag name (becomes the platform TAG name)
+    description: > # tells the model what this tag is for
+      Data sensitivity level. Assign exactly one — the highest level that applies.
+    multiple: false # false = at most one value per column; true = many
+    values:
+      - value: PUBLIC # the model may only pick from these values
+        description: Non-sensitive data that can be shared freely.
+        examples: [country_code, currency, language, product_category] # guide classification
+      - value: INTERNAL
+        description: Internal business data, not for public release. The default.
+        default: true # assigned when the model picks nothing else
+        examples: [order_id, status, created_at, loyalty_points]
+      - value: CONFIDENTIAL
+        description: Personal data or sensitive business data; need-to-know access.
+        examples: [full_name, email, phone, home_address, date_of_birth]
+      - value: RESTRICTED
+        description: Highly sensitive data under legal/regulatory controls (financial, health, credentials, IDs).
+        examples: [national_id, passport_number, iban, credit_card_number, health_status]
+
+  - name: DATA_DOMAIN # you can define several tags
+    description: The business domain that owns the column.
+    multiple: false
+    values:
+      - value: CUSTOMER
+        examples: [customer_id, email, loyalty_points]
+      - value: FINANCE
+        examples: [amount, currency, invoice_id, iban]
+```
+
+| Field | Meaning |
+|---|---|
+| `name` | Tag name. Required. Becomes the tag key everywhere downstream. |
+| `description` | What the tag means — given to the model as classification guidance. |
+| `multiple` | `false` (default): at most one value per column. `true`: a column may carry several. |
+| `values[].value` | An allowed value. **The model may only assign values listed here** — anything else is dropped. |
+| `values[].description` | What the value means — strongly improves accuracy. |
+| `values[].examples` | Example column names that fit this value — the model's strongest signal. |
+| `values[].default` | If `true`, assigned to columns the model leaves unclassified for this tag. At most one per tag. |
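
The catalog rules in the table (allowed values only, `multiple`, and a single optional default) can be sketched as a post-processing step over whatever the model proposes. This is a hedged illustration: `resolve_tags`, the shape of `proposed`, and the choice to keep the first valid value when `multiple` is false are assumptions, not dcx's implementation.

```python
def resolve_tags(catalog: dict, proposed: dict) -> list:
    """Filter model-proposed tag values for one column against the catalog.

    `proposed` maps tag name -> list of suggested values (hypothetical shape).
    Returns NAME=VALUE strings.
    """
    result = []
    for tag in catalog["tags"]:
        allowed = [v["value"] for v in tag["values"]]
        defaults = [v["value"] for v in tag["values"] if v.get("default")]
        if len(defaults) > 1:
            raise ValueError(f"tag {tag['name']} declares more than one default")
        # Values outside the catalog are dropped.
        picked = [v for v in proposed.get(tag["name"], []) if v in allowed]
        if not tag.get("multiple", False):
            picked = picked[:1]  # assumption: keep the first valid suggestion
        if not picked:
            picked = defaults  # fall back to the catalog default, if any
        result.extend(f"{tag['name']}={value}" for value in picked)
    return result


# A cut-down version of the catalog shown above.
catalog = {
    "tags": [
        {
            "name": "DATA_CLASSIFICATION",
            "multiple": False,
            "values": [
                {"value": "PUBLIC"},
                {"value": "INTERNAL", "default": True},
                {"value": "CONFIDENTIAL"},
            ],
        }
    ]
}

unknown = resolve_tags(catalog, {"DATA_CLASSIFICATION": ["SECRET"]})
valid = resolve_tags(catalog, {"DATA_CLASSIFICATION": ["CONFIDENTIAL", "PUBLIC"]})
```

Here `unknown` falls back to `DATA_CLASSIFICATION=INTERNAL` because `SECRET` is not in the catalog, while `valid` keeps only `DATA_CLASSIFICATION=CONFIDENTIAL`.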
+
+Assigned tags are written on each column as `NAME=VALUE` (e.g. `DATA_CLASSIFICATION=CONFIDENTIAL`) — the convention `export snowflake-full` and `apply snowflake` consume. A worked catalog and example contracts live in [`examples/`](examples/).
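
One plausible way a consumer could turn the `NAME=VALUE` convention into Snowflake statements is sketched below. The SQL forms (`CREATE TAG IF NOT EXISTS`, `ALTER TABLE ... MODIFY COLUMN ... SET TAG`) are standard Snowflake syntax, but the function `tag_statements`, its parameters, and the table, column, and namespace names are hypothetical; the exact SQL that dcx emits is not shown in this README.

```python
def tag_statements(table: str, column: str, tags: list,
                   namespace: str = "", create_tags: bool = False) -> list:
    """Build Snowflake SQL for a column's NAME=VALUE tags (illustrative only)."""
    statements = []
    for entry in tags:
        name, _, value = entry.partition("=")
        # Optionally qualify the tag with a DB.SCHEMA namespace.
        qualified = f"{namespace}.{name}" if namespace else name
        if create_tags:
            statements.append(f"CREATE TAG IF NOT EXISTS {qualified};")
        escaped = value.replace("'", "''")  # escape single quotes in the literal
        statements.append(
            f"ALTER TABLE {table} MODIFY COLUMN {column} "
            f"SET TAG {qualified} = '{escaped}';"
        )
    return statements


# Hypothetical object names for the example.
sql = tag_statements(
    "MY_DB.LOAD.CUSTOMERS",
    "EMAIL",
    ["DATA_CLASSIFICATION=CONFIDENTIAL"],
    namespace="MY_DB.GOVERNANCE",
    create_tags=True,
)
for statement in sql:
    print(statement)
```

Keeping the tag as a single `NAME=VALUE` string in the contract means the same value can feed both the offline export and the live apply path.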
+
+## REST API
+
+```bash
+dcx api --port 4242 # Swagger UI at http://127.0.0.1:4242/docs
+```
+
+Every command above is mirrored to an endpoint, with request **and** response schemas in the OpenAPI spec. Auth model:
+
+- **Live platform operations** (`/import/snowflake`, `/apply/snowflake`) act *as the caller* — the OAuth bearer token comes from the `Authorization` header, so the server never uses ambient credentials for someone else's data.
+- **Enrichment** (`/enrich/*`) uses the **server's** LLM key (from the environment). Put service-level auth/quota in front of it before exposing it publicly.
+- **The CLI never takes secrets as flags** — platform secrets come from env vars or the platform's own config; LLM keys from the provider's standard env var.
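
For the caller-supplied OAuth model, a client request might be assembled as in the sketch below, using only the Python standard library. The route, the `Authorization: Bearer` header, and the option names `dry_run`, `ddl_mode`, and `strict` come from this README; sending those options as a JSON body and the `contract` field name are assumptions, so check the OpenAPI spec at `/docs` for the real request schema.

```python
import json
import urllib.request


def build_apply_request(base_url: str, token: str, contract_yaml: str) -> urllib.request.Request:
    """Build (but do not send) a dry-run POST /apply/snowflake request."""
    body = {
        "contract": contract_yaml,  # ASSUMPTION: field name is not given in the README
        "dry_run": True,            # preview only
        "ddl_mode": "auto",
        "strict": False,
    }
    return urllib.request.Request(
        url=f"{base_url}/apply/snowflake",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # the caller's Snowflake OAuth token
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_apply_request(
    "http://127.0.0.1:4242", "<snowflake-oauth-token>", "kind: DataContract\n"
)

# Sending it requires a running `dcx api` server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Because the token travels with each request, the server in this model needs no stored Snowflake credentials of its own.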
+
+## How it fits with datacontract-cli
+
+dcx is a **separate package that depends on datacontract-cli as a library** — no fork. It registers new importers (`snowflake`, `kafka`) and the `snowflake-full` exporter into the upstream factories, adds `target` / `enrich` / `apply` sub-apps and live-import commands to the upstream Typer app, and mirrors every command to FastAPI for `dcx api`. So you keep all of upstream's importers, exporters, `lint`, `test`, and `changelog`, and gain the AI + platform layer on top.
+
+## Development
+
+```bash
+pip install -e ".[dev]"
+pytest # 211 tests
+ruff check dcx # lint
+```
+
+Tests never hit live services or real LLMs — platform connections, the Schema Registry, and every LLM call are mocked, so the suite stays fast and offline. See [`RELEASING.md`](RELEASING.md) for the PyPI release process.
+
+## Contributing
+
+Issues and PRs welcome. Please run `pytest` and `ruff check dcx` before opening a PR, and add tests for new behavior.
+
+## License
+
+[MIT](LICENSE) © MickaelBZH.
+
+<p align="center"><sub>Built on <a href="https://github.com/datacontract/datacontract-cli">datacontract-cli</a> · <a href="https://bitol.io/">Open Data Contract Standard</a> · <a href="https://github.com/BerriAI/litellm">litellm</a></sub></p>
@@ -0,0 +1,290 @@
+<p align="center">
+<img src="assets/logo.svg" alt="dcx — Data Contract eXtended" width="520">
+</p>
+
+<h3 align="center">Data Contract e<strong>X</strong>tended — AI-native, platform-extensible data contracts</h3>
+
+<p align="center">
+Author data contracts with an LLM, sync them with your live platforms.<br>
+A lean, no-fork extension of <a href="https://github.com/datacontract/datacontract-cli">datacontract-cli</a>, built on the <a href="https://bitol.io/">Open Data Contract Standard (ODCS)</a>.
+</p>
+
+<p align="center">
+<img alt="PyPI" src="https://img.shields.io/pypi/v/datacontract-x?color=6366F1&label=pypi">
+<img alt="Python" src="https://img.shields.io/badge/python-3.10%20|%203.11%20|%203.12-3776AB?logo=python&logoColor=white">
+<img alt="License: MIT" src="https://img.shields.io/badge/license-MIT-22c55e">
+<img alt="ODCS" src="https://img.shields.io/badge/ODCS-v3.1-0EA5E9">
+<img alt="Built on datacontract-cli" src="https://img.shields.io/badge/built%20on-datacontract--cli-6366F1">
+</p>
+
+---
+
+## What is dcx?
+
+**dcx (Data Contract eXtended)** adds three things to the Open Data Contract Standard workflow that plain datacontract-cli doesn't do:
+
+1. **AI authoring** — use an LLM to enrich a contract with column descriptions, validation constraints, governance **tags** from your own catalog, and an executable **data-quality** suite.
+2. **Live import** — build a contract *from* a running system (its real columns, keys, comments, tags).
+3. **Apply** — push the contract's governance *back* to the platform (comments, tags, data-quality, and the table itself).
+
+It's **platform-extensible by design**: each platform is a small importer / exporter / apply module that plugs into datacontract-cli's factories. **Snowflake is the first end-to-end platform** (import → enrich → apply), with Kafka import today and more platforms built to slot in the same way.
+
+The pipeline is: **import** a live schema into an ODCS contract → **enrich** it (columns · tags · quality) → **apply** it back to the platform, or **export** it to SQL / docs / schemas. Everything is available both as a **CLI** and as a **REST API** (`dcx api`).
+
+## Why dcx?
+
+- 🧠 **AI authoring that's safe to ship.** Forced tool-calling, `temperature=0`, and strict server-side validation against the ODCS schema — the model can only produce spec-valid output, never free-form guesses.
+- 🏷️ **A tag *manager*, not a tag guesser.** You define a controlled [tag catalog](#the-tag-catalog) (names, allowed values, examples); the LLM classifies columns into *your* vocabulary, with optional defaults.
+- ✅ **Executable, portable data quality.** Quality rules prefer ODCS `library` metrics (portable, mappable to platform-native checks) and fall back to portable `sql` checks — across all seven ODCS dimensions.
+- 🔌 **Any LLM provider.** Powered by [litellm](https://github.com/BerriAI/litellm) — Anthropic, OpenAI, Azure, Bedrock, Gemini, Ollama, … behind one `--model` flag.
+- 🧩 **Pluggable platforms, no fork.** You keep all 30+ upstream importers/exporters and `lint` / `test` / `changelog`, and gain the AI + platform layer on top.
+- 🔐 **Auth that makes sense per surface.** Live platform operations over the API use **caller-supplied OAuth**; secrets are never CLI flags.
+
+## Install
+
+```bash
+pip install datacontract-x
+```
+
+The import package and CLI are both `dcx`:
+
+```bash
+dcx --help
+dcx info
+```
+
+From source (for development):
+
+```bash
+git clone https://github.com/MickaelBZH/data-contract-x.git
+cd data-contract-x
+pip install -e ".[dev]"
+```
+
+> Requires Python 3.10–3.12. Installing pulls in `datacontract-cli`, `litellm`, FastAPI, and the platform connectors automatically.
+
+## Quickstart
+
+The full loop — import a live schema, enrich it with an LLM, sync it back. Snowflake here is the example platform.
+
+```bash
+# 1. Import an existing schema into a contract (real columns, PKs, comments, tags)
+dcx import snowflake --database MY_DB --schema LOAD --authenticator externalbrowser --output contract.yaml
+
+# 2. Enrich with an LLM: descriptions + constraints + tags + data-quality tests
+export ANTHROPIC_API_KEY=... # or OPENAI_API_KEY / AZURE_API_KEY / ...
+dcx enrich all contract.yaml --catalog tags_catalog.yaml --output contract.enriched.yaml
+
+# 3. Preview exactly what will run — no connection needed
+dcx apply snowflake contract.enriched.yaml --include-quality --dry-run
+
+# 4. Apply it: creates the table if missing, governs it (comments + tags + DQ) if it exists
+dcx apply snowflake contract.enriched.yaml --include-quality
+```
+
+---
+
+## Commands
+
+Every command is `dcx <command>`, and most are mirrored to a REST endpoint when you run [`dcx api`](#rest-api). Each section below lists the sub-commands, a CLI example, and the matching API call. Run `dcx <command> --help` for the full option list.
+
+### `import` — build a contract from a source
+
+| Sub-command | Source |
+|---|---|
+| `dcx import snowflake` | A live Snowflake schema (columns, primary keys, comments, tags) |
+| `dcx import kafka` | A Kafka topic's value schema (Confluent Schema Registry) |
+| `dcx import <format>` | A file/document — `sql`, `avro`, `dbml`, `glue`, `bigquery`, `unity`, `jsonschema`, `json`, `odcs`, `parquet`, `csv`, `protobuf`, `spark`, `iceberg`, `excel`, `dbt` |
+
+```bash
+dcx import snowflake --database MY_DB --schema LOAD --authenticator externalbrowser --output contract.yaml
+dcx import kafka --schema-registry https://sr:8081 --topic orders --output contract.yaml
+dcx import sql --source schema.sql --dialect snowflake --output contract.yaml
+```
+
+**API**
+- `POST /import/snowflake` — live import, authenticated by the caller's Snowflake OAuth token (`Authorization: Bearer <token>`).
+- `POST /import/{format}` — file-based importers; send the document inline as `source_content`.
+- *(Kafka import is CLI-only.)*
+
|
|
110
|
+
### `enrich` — AI authoring with an LLM
|
|
111
|
+
|
|
112
|
+
| Sub-command | Adds |
|
|
113
|
+
|---|---|
|
|
114
|
+
| `dcx enrich columns` | Business descriptions, `logicalTypeOptions` constraints, `required` / `unique` flags |
|
|
115
|
+
| `dcx enrich tags` | Governance tags, classified against your [tag catalog](#the-tag-catalog) |
|
|
116
|
+
| `dcx enrich quality` | An executable data-quality suite across all ODCS dimensions |
|
|
117
|
+
| `dcx enrich all` | columns → tags → quality, in that order so each stage grounds the next |
|
|
118
|
+
|
|
119
|
+
Each sub-command is independent and idempotent (existing values are preserved unless you pass `--overwrite`). The provider key is read from the environment — there is no `--api-key` flag. Use `--model` for any litellm model and `--base-url` for a proxy / Azure / Ollama endpoint.
|
|
120
|
+
|
|
121
|
+
```bash
|
|
122
|
+
dcx enrich columns contract.yaml --output contract.enriched.yaml
|
|
123
|
+
dcx enrich tags contract.yaml --catalog tags_catalog.yaml --output contract.tagged.yaml
|
|
124
|
+
dcx enrich quality contract.yaml --model gpt-4o --output contract.dq.yaml
|
|
125
|
+
dcx enrich all contract.yaml --catalog tags_catalog.yaml --output contract.full.yaml
|
|
126
|
+
```
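
The idempotency rule can be pictured as a fill-only merge: a generated value is written only where the contract has none, unless overwriting is requested. The sketch below is not dcx's implementation; the function name `merge_enrichment` and the flat dictionary shape are assumptions chosen to illustrate the behaviour.

```python
def merge_enrichment(existing: dict, generated: dict, overwrite: bool = False) -> dict:
    """Return a copy of `existing` with `generated` values filled in.

    Values already present are kept unless `overwrite` is true, which
    mirrors the documented effect of `--overwrite`.
    """
    merged = dict(existing)
    for key, value in generated.items():
        # Treat None, empty strings, and empty lists as "not set yet".
        has_value = merged.get(key) not in (None, "", [])
        if overwrite or not has_value:
            merged[key] = value
    return merged


# Hypothetical column entry and model suggestions.
column = {"name": "email", "description": "Customer contact email", "required": None}
suggested = {"description": "E-mail address", "required": True, "unique": True}

kept = merge_enrichment(column, suggested)                      # fills gaps only
replaced = merge_enrichment(column, suggested, overwrite=True)  # replaces everything suggested
rerun = merge_enrichment(kept, suggested)                       # second pass changes nothing
```

Running the merge a second time with the same suggestions leaves the result unchanged, which is the sense in which the sub-commands are described as idempotent.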

**API** (the LLM key comes from the *server's* environment)
- `POST /enrich/columns` · `POST /enrich/quality`
- `POST /enrich/tags` · `POST /enrich/all` — take the tag catalog inline in the request body.

### `export` — convert a contract to a target format

| Sub-command | Output |
|---|---|
| `dcx export snowflake-full` | A Snowflake setup script: DDL + tags + Data Metric Functions, in one file |
| `dcx export <format>` | Any upstream format — `sql`, `jsonschema`, `html`, `markdown`, `mermaid`, `dbt-*`, `avro`, `protobuf`, `bigquery`, `spark`, `sqlalchemy`, `iceberg`, `sodacl`, `great-expectations`, `dbml`, `pydantic-model`, `odcs`, `rdf`, `go`, `excel`, … |

`snowflake-full` options: `--include-tags`, `--include-quality`, `--create-tags`, `--tag-namespace DB.SCHEMA`, and `--structured-types` (render nested columns as typed `OBJECT(field type, …)` / `ARRAY(type)`).

```bash
dcx export snowflake-full contract.yaml --include-quality --create-tags --output setup.sql
dcx export html contract.yaml --output contract.html
```
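
To make the `--structured-types` rendering concrete, here is a small sketch of how a nested column could be turned into a typed `OBJECT(...)` / `ARRAY(...)` expression. It is not taken from the dcx source: the input dictionary shape (`type`, `fields`, `items`) and the function name `render_type` are hypothetical.

```python
def render_type(column: dict) -> str:
    """Render a (hypothetical) nested column description as a Snowflake type."""
    kind = column["type"].upper()
    if kind == "OBJECT":
        # Each field becomes "name type", recursing into nested fields.
        fields = ", ".join(
            f"{name} {render_type(spec)}" for name, spec in column["fields"].items()
        )
        return f"OBJECT({fields})"
    if kind == "ARRAY":
        return f"ARRAY({render_type(column['items'])})"
    return kind  # scalar types are passed through unchanged


address = {
    "type": "object",
    "fields": {
        "city": {"type": "varchar"},
        "lines": {"type": "array", "items": {"type": "varchar"}},
    },
}
rendered = render_type(address)
print(rendered)  # OBJECT(city VARCHAR, lines ARRAY(VARCHAR))
```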

**API**
- `POST /export/{format}` — including `POST /export/snowflake-full`. The response media type depends on the format (JSON / YAML / text / binary).

### `apply` — push governance to a live platform

| Sub-command | Target |
|---|---|
| `dcx apply snowflake` | A live Snowflake account |

With the default `--ddl-mode auto` you don't need to know whether the table exists: **missing tables are created** (`CREATE TABLE IF NOT EXISTS`) and **existing ones are governed** — column/table comments, tags, and (with `--include-quality`) data-quality metrics. For existing tables, dcx also **compares the live schema to the contract** and reports drift as warnings — or, with `--strict`, an error that aborts before any change (the check uses `DESCRIBE TABLE`, so it needs no active warehouse).

| Option | Effect |
|---|---|
| `--ddl-mode auto\|always\|never` | create-if-missing-then-govern (default) · always `CREATE TABLE` · govern existing only |
| `--strict` | fail instead of warn on schema drift |
| `--structured-types` | typed nested `OBJECT(...)` / `ARRAY(...)` |
| `--include-quality` · `--create-tags` · `--tag-namespace` | data-metric functions · `CREATE TAG IF NOT EXISTS` · qualify tag refs |
| `--dry-run` | print the SQL without connecting |

```bash
dcx apply snowflake contract.yaml --dry-run             # preview
dcx apply snowflake contract.yaml --include-quality     # create-or-govern
```
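
The interaction between `--ddl-mode`, table existence, and `--strict` can be summarised as a small decision function. This is a sketch of the behaviour described above, not the dcx implementation: `plan_apply`, the step names, and the choice to do nothing for a missing table under `never` are assumptions.

```python
def plan_apply(ddl_mode: str, table_exists: bool, drift: list[str], strict: bool = False) -> list[str]:
    """Return the ordered steps for one table (illustrative only)."""
    if ddl_mode not in ("auto", "always", "never"):
        raise ValueError(f"unknown ddl mode: {ddl_mode}")

    # Drift only makes sense when a live table exists to compare against.
    # With strict checking, abort before planning any change.
    if table_exists and drift and strict:
        raise RuntimeError("schema drift: " + "; ".join(drift))

    steps = []
    if table_exists and drift:
        steps.append("warn-drift")
    if ddl_mode == "always" or (ddl_mode == "auto" and not table_exists):
        steps.append("create-table")
    if ddl_mode == "never" and not table_exists:
        return steps  # assumption: nothing exists to govern, so nothing runs
    steps.append("govern")  # comments, tags, optional data-quality metrics
    return steps


print(plan_apply("auto", table_exists=False, drift=[]))
print(plan_apply("auto", table_exists=True, drift=["column AMOUNT missing"]))
```

Checking drift before planning any step reflects the statement that `--strict` aborts before any change is made.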

**API**
- `POST /apply/snowflake` — authenticated by the caller's Snowflake OAuth token. Supports `dry_run`, `ddl_mode`, `strict`, `structured_types`, … and returns the executed SQL plus any drift `warnings`.
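
A dry run through the API might look like the following sketch, which only constructs the request. `dry_run`, `ddl_mode`, and `strict` are the documented options; the `contract` body field, the base URL, and the `SNOWFLAKE_OAUTH_TOKEN` environment variable are placeholders, so consult the OpenAPI spec for the actual request body.

```python
import json
import os
import urllib.request

BASE_URL = "http://127.0.0.1:4242"  # assumed: the `dcx api --port 4242` example


def build_apply_request(contract_yaml: str, token: str, **options) -> urllib.request.Request:
    """Build a POST /apply/snowflake request that acts as the token's owner."""
    # "contract" is a hypothetical field name for the inline contract document.
    body = {"contract": contract_yaml, **options}
    return urllib.request.Request(
        url=f"{BASE_URL}/apply/snowflake",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Hypothetical environment variable; the fallback keeps the sketch runnable offline.
token = os.environ.get("SNOWFLAKE_OAUTH_TOKEN", "example-token")
apply_request = build_apply_request(
    "kind: DataContract\n",  # truncated placeholder contract
    token,
    dry_run=True,
    ddl_mode="auto",
    strict=True,
)
```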

### `target` — bind a contract to a platform

`dcx target <type>` sets the contract's server block and resolves each column's `physicalType` for that platform. ~30 types: `snowflake`, `bigquery`, `databricks`, `postgres`, `redshift`, `mysql`, `sqlserver`, `oracle`, `s3`, `kafka`, `trino`, `athena`, `glue`, `duckdb`, `local`, …

```bash
dcx target snowflake contract.yaml --output contract.snowflake.yaml
```

**API**
- `POST /target/{type}` — one route per supported platform type.

### From datacontract-cli

These commands work unchanged — `dcx <command>` behaves exactly like `datacontract <command>`.

| Command | Sub-commands | Purpose | API |
|---|---|---|---|
| `dcx init` | — | Create an empty data contract | — |
| `dcx lint` | — | Validate a contract against the ODCS schema | `POST /lint` |
| `dcx test` | — | Run schema + data-quality tests against a configured server | `POST /test` |
| `dcx ci` | — | `test` for CI/CD — emits GitHub Actions annotations | — |
| `dcx changelog` | — | Semantic changelog between two contract versions | `POST /changelog` |
| `dcx catalog` | — | Render an HTML catalog of many contracts | — |
| `dcx publish` | — | Publish a contract to Entropy Data | — |
| `dcx dbt` | `sync` | Sync contracts into a dbt project | — |

### `api` / `info`

```bash
dcx api --port 4242        # start the REST server (Swagger UI at /docs)
dcx info                   # show dcx + datacontract-cli versions (API: GET /info)
```

---

## The tag catalog

`dcx enrich tags` does **controlled-vocabulary** tagging: instead of letting the model invent tags, you give it a catalog of allowed names and values, and it classifies each column into that vocabulary. The catalog is a small YAML (or JSON) file — the only extra input auto-tagging needs.

```yaml
# tags_catalog.yaml
tags:
  - name: DATA_CLASSIFICATION  # the tag name (becomes the platform TAG name)
    description: >  # tells the model what this tag is for
      Data sensitivity level. Assign exactly one — the highest level that applies.
    multiple: false  # false = at most one value per column; true = many
    values:
      - value: PUBLIC  # the model may only pick from these values
        description: Non-sensitive data that can be shared freely.
        examples: [country_code, currency, language, product_category]  # guide classification
      - value: INTERNAL
        description: Internal business data, not for public release. The default.
        default: true  # assigned when the model picks nothing else
        examples: [order_id, status, created_at, loyalty_points]
      - value: CONFIDENTIAL
        description: Personal data or sensitive business data; need-to-know access.
        examples: [full_name, email, phone, home_address, date_of_birth]
      - value: RESTRICTED
        description: Highly sensitive data under legal/regulatory controls (financial, health, credentials, IDs).
        examples: [national_id, passport_number, iban, credit_card_number, health_status]

  - name: DATA_DOMAIN  # you can define several tags
    description: The business domain that owns the column.
    multiple: false
    values:
      - value: CUSTOMER
        examples: [customer_id, email, loyalty_points]
      - value: FINANCE
        examples: [amount, currency, invoice_id, iban]
```

| Field | Meaning |
|---|---|
| `name` | Tag name. Required. Becomes the tag key everywhere downstream. |
| `description` | What the tag means — given to the model as classification guidance. |
| `multiple` | `false` (default): at most one value per column. `true`: a column may carry several. |
| `values[].value` | An allowed value. **The model may only assign values listed here** — anything else is dropped. |
| `values[].description` | What the value means — strongly improves accuracy. |
| `values[].examples` | Example column names that fit this value — the model's strongest signal. |
| `values[].default` | If `true`, assigned to columns the model leaves unclassified for this tag. At most one per tag. |

Assigned tags are written on each column as `NAME=VALUE` (e.g. `DATA_CLASSIFICATION=CONFIDENTIAL`) — the convention `export snowflake-full` and `apply snowflake` consume. A worked catalog and example contracts live in [`examples/`](examples/).
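
The catalog rules (unlisted values are dropped, `multiple: false` keeps at most one value, and a `default` fills the gap) can be sketched as a small post-processing step over the model's suggestions. The code below is an illustration rather than dcx's implementation: `resolve_tags`, the in-memory catalog dictionary, and the choice to keep the first valid suggestion when `multiple` is false are assumptions.

```python
def resolve_tags(catalog: dict, suggestions: dict) -> list[str]:
    """Turn raw model suggestions for one column into NAME=VALUE strings."""
    resolved = []
    for tag in catalog["tags"]:
        allowed = [entry["value"] for entry in tag["values"]]
        # Drop anything the catalog does not list for this tag.
        picked = [v for v in suggestions.get(tag["name"], []) if v in allowed]
        if not tag.get("multiple", False):
            picked = picked[:1]  # assumption: keep the first valid suggestion
        if not picked:
            # Fall back to the value flagged as default, if the tag has one.
            picked = [e["value"] for e in tag["values"] if e.get("default")][:1]
        resolved.extend(f"{tag['name']}={value}" for value in picked)
    return resolved


# A trimmed, in-memory version of the catalog shown above.
catalog = {
    "tags": [
        {
            "name": "DATA_CLASSIFICATION",
            "multiple": False,
            "values": [
                {"value": "PUBLIC"},
                {"value": "INTERNAL", "default": True},
                {"value": "CONFIDENTIAL"},
            ],
        },
        {
            "name": "DATA_DOMAIN",
            "multiple": False,
            "values": [{"value": "CUSTOMER"}, {"value": "FINANCE"}],
        },
    ]
}

email_tags = resolve_tags(
    catalog, {"DATA_CLASSIFICATION": ["CONFIDENTIAL"], "DATA_DOMAIN": ["CUSTOMER"]}
)
odd_tags = resolve_tags(
    catalog, {"DATA_CLASSIFICATION": ["TOP_SECRET"], "DATA_DOMAIN": ["MARKETING"]}
)
```

In the second call both suggestions fall outside the catalog, so only the classification default survives and the domain tag is left unassigned.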

## REST API

```bash
dcx api --port 4242        # Swagger UI at http://127.0.0.1:4242/docs
```

Most commands above are mirrored to an endpoint, with request **and** response schemas in the OpenAPI spec. Kafka import and the upstream `init`, `ci`, `catalog`, `publish`, and `dbt` commands are CLI-only. Auth model:

- **Live platform operations** (`/import/snowflake`, `/apply/snowflake`) act *as the caller* — the OAuth bearer token comes from the `Authorization` header, so the server never uses ambient credentials for someone else's data.
- **Enrichment** (`/enrich/*`) uses the **server's** LLM key (from the environment). Put service-level auth/quota in front of it before exposing it publicly.
- **The CLI never takes secrets as flags** — platform secrets come from env vars or the platform's own config; LLM keys from the provider's standard env var.

## How it fits with datacontract-cli

dcx is a **separate package that depends on datacontract-cli as a library** — no fork. It registers new importers (`snowflake`, `kafka`) and the `snowflake-full` exporter into the upstream factories, adds `target` / `enrich` / `apply` sub-apps and live-import commands to the upstream Typer app, and mirrors most commands to FastAPI for `dcx api`. So you keep all of upstream's importers, exporters, `lint`, `test`, and `changelog`, and gain the AI + platform layer on top.

## Development

```bash
pip install -e ".[dev]"
pytest                 # 211 tests
ruff check dcx         # lint
```

Tests never hit live services or real LLMs — platform connections, the Schema Registry, and every LLM call are mocked, so the suite stays fast and offline. See [`RELEASING.md`](RELEASING.md) for the PyPI release process.

## Contributing

Issues and PRs welcome. Please run `pytest` and `ruff check dcx` before opening a PR, and add tests for new behavior.

## License

[MIT](LICENSE) © MickaelBZH.

<p align="center"><sub>Built on <a href="https://github.com/datacontract/datacontract-cli">datacontract-cli</a> · <a href="https://bitol.io/">Open Data Contract Standard</a> · <a href="https://github.com/BerriAI/litellm">litellm</a></sub></p>