structurize 2.16.2__py3-none-any.whl → 2.16.5__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (53)
  1. avrotize/__init__.py +63 -63
  2. avrotize/__main__.py +5 -5
  3. avrotize/_version.py +34 -34
  4. avrotize/asn1toavro.py +160 -160
  5. avrotize/avrotize.py +152 -152
  6. avrotize/avrotocpp.py +483 -483
  7. avrotize/avrotocsharp.py +992 -992
  8. avrotize/avrotocsv.py +121 -121
  9. avrotize/avrotodatapackage.py +173 -173
  10. avrotize/avrotodb.py +1383 -1383
  11. avrotize/avrotogo.py +476 -476
  12. avrotize/avrotographql.py +197 -197
  13. avrotize/avrotoiceberg.py +210 -210
  14. avrotize/avrotojava.py +1023 -1023
  15. avrotize/avrotojs.py +250 -250
  16. avrotize/avrotojsons.py +481 -481
  17. avrotize/avrotojstruct.py +345 -345
  18. avrotize/avrotokusto.py +363 -363
  19. avrotize/avrotomd.py +137 -137
  20. avrotize/avrotools.py +168 -168
  21. avrotize/avrotoparquet.py +208 -208
  22. avrotize/avrotoproto.py +358 -358
  23. avrotize/avrotopython.py +622 -622
  24. avrotize/avrotorust.py +435 -435
  25. avrotize/avrotots.py +598 -598
  26. avrotize/avrotoxsd.py +344 -344
  27. avrotize/commands.json +2493 -2433
  28. avrotize/common.py +828 -828
  29. avrotize/constants.py +4 -4
  30. avrotize/csvtoavro.py +131 -131
  31. avrotize/datapackagetoavro.py +76 -76
  32. avrotize/dependency_resolver.py +348 -348
  33. avrotize/jsonstoavro.py +1698 -1698
  34. avrotize/jsonstostructure.py +2642 -2642
  35. avrotize/jstructtoavro.py +878 -878
  36. avrotize/kstructtoavro.py +93 -93
  37. avrotize/kustotoavro.py +455 -455
  38. avrotize/parquettoavro.py +157 -157
  39. avrotize/proto2parser.py +497 -497
  40. avrotize/proto3parser.py +402 -402
  41. avrotize/prototoavro.py +382 -382
  42. avrotize/structuretocsharp.py +2005 -2005
  43. avrotize/structuretojsons.py +498 -498
  44. avrotize/structuretopython.py +772 -772
  45. avrotize/structuretots.py +653 -0
  46. avrotize/xsdtoavro.py +413 -413
  47. {structurize-2.16.2.dist-info → structurize-2.16.5.dist-info}/METADATA +848 -805
  48. structurize-2.16.5.dist-info/RECORD +52 -0
  49. {structurize-2.16.2.dist-info → structurize-2.16.5.dist-info}/licenses/LICENSE +200 -200
  50. structurize-2.16.2.dist-info/RECORD +0 -51
  51. {structurize-2.16.2.dist-info → structurize-2.16.5.dist-info}/WHEEL +0 -0
  52. {structurize-2.16.2.dist-info → structurize-2.16.5.dist-info}/entry_points.txt +0 -0
  53. {structurize-2.16.2.dist-info → structurize-2.16.5.dist-info}/top_level.txt +0 -0
@@ -1,805 +1,848 @@
1
- Metadata-Version: 2.4
2
- Name: structurize
3
- Version: 2.16.2
4
- Summary: Tools to convert from and to JSON Structure from various other schema languages.
5
- Author-email: Clemens Vasters <clemensv@microsoft.com>
6
- Classifier: Programming Language :: Python :: 3
7
- Classifier: License :: OSI Approved :: MIT License
8
- Classifier: Operating System :: OS Independent
9
- Requires-Python: >=3.10
10
- Description-Content-Type: text/markdown
11
- License-File: LICENSE
12
- Requires-Dist: jsonschema>=4.23.0
13
- Requires-Dist: lark>=1.1.9
14
- Requires-Dist: pyarrow>=22.0.0
15
- Requires-Dist: asn1tools>=0.167.0
16
- Requires-Dist: jsonpointer>=3.0.0
17
- Requires-Dist: jsonpath-ng>=1.6.1
18
- Requires-Dist: jsoncomparison>=1.1.0
19
- Requires-Dist: requests>=2.32.3
20
- Requires-Dist: azure-kusto-data>=5.0.5
21
- Requires-Dist: azure-identity>=1.17.1
22
- Requires-Dist: datapackage>=1.15.4
23
- Requires-Dist: jinja2>=3.1.4
24
- Requires-Dist: pyiceberg>=0.10.0
25
- Requires-Dist: pandas>=2.2.2
26
- Requires-Dist: docker>=7.1.0
27
- Provides-Extra: dev
28
- Requires-Dist: pytest>=8.3.2; extra == "dev"
29
- Requires-Dist: fastavro>=1.9.5; extra == "dev"
30
- Requires-Dist: xmlschema>=3.3.2; extra == "dev"
31
- Requires-Dist: xmlunittest>=1.0.1; extra == "dev"
32
- Requires-Dist: pylint>=3.2.6; extra == "dev"
33
- Requires-Dist: dataclasses_json>=0.6.7; extra == "dev"
34
- Requires-Dist: dataclasses>=0.8; extra == "dev"
35
- Requires-Dist: pydantic>=2.8.2; extra == "dev"
36
- Requires-Dist: avro>=1.12.0; extra == "dev"
37
- Requires-Dist: testcontainers>=4.7.2; extra == "dev"
38
- Requires-Dist: pymysql>=1.1.1; extra == "dev"
39
- Requires-Dist: psycopg2>=2.9.9; extra == "dev"
40
- Requires-Dist: pyodbc>=5.1.0; extra == "dev"
41
- Requires-Dist: pymongo>=4.8.0; extra == "dev"
42
- Requires-Dist: oracledb>=2.3.0; extra == "dev"
43
- Requires-Dist: cassandra-driver>=3.29.1; extra == "dev"
44
- Requires-Dist: sqlalchemy>=2.0.32; extra == "dev"
45
- Dynamic: license-file
46
-
47
- # Avrotize
48
-
49
- Avrotize is a ["Rosetta Stone"](https://en.wikipedia.org/wiki/Rosetta_Stone) for data structure definitions, allowing you to convert between numerous data and database schema formats and to generate code for different programming languages.
50
-
51
- It is, for instance, a well-documented and predictable converter and code generator for data structures originally defined in JSON Schema (of arbitrary complexity).
52
-
53
- The tool leans on the Apache Avro-derived [Avrotize Schema](specs/avrotize-schema.md) as its schema model.
54
-
55
- - Programming languages: Python, C#, Java, TypeScript, JavaScript, Rust, Go, C++
56
- - SQL Databases: MySQL, MariaDB, PostgreSQL, SQL Server, Oracle, SQLite, BigQuery, Snowflake, Redshift, DB2
57
- - Other databases: KQL/Kusto, MongoDB, Cassandra, Redis, Elasticsearch, DynamoDB, CosmosDB
58
- - Data schema formats: Avro, JSON Schema, XML Schema (XSD), Protocol Buffers 2 and 3, ASN.1, Apache Parquet
59
-
60
- ## Installation
61
-
62
- You can install Avrotize from PyPI, provided you have installed [Python 3.10 or later](https://www.python.org/downloads/):
63
-
64
- ```bash
65
- pip install avrotize
66
- ```
67
-
68
- ## Usage
69
-
70
- Avrotize provides several commands for converting schema formats via Avrotize Schema.
71
-
72
- Converting to Avrotize Schema:
73
-
74
- - [`avrotize p2a`](#convert-proto-schema-to-avrotize-schema) - Convert Protobuf (2 or 3) schema to Avrotize Schema.
75
- - [`avrotize j2a`](#convert-json-schema-to-avrotize-schema) - Convert JSON schema to Avrotize Schema.
76
- - [`avrotize x2a`](#convert-xml-schema-xsd-to-avrotize-schema) - Convert XML schema to Avrotize Schema.
77
- - [`avrotize asn2a`](#convert-asn1-schema-to-avrotize-schema) - Convert ASN.1 to Avrotize Schema.
78
- - [`avrotize k2a`](#convert-kusto-table-definition-to-avrotize-schema) - Convert Kusto table definitions to Avrotize Schema.
79
- - [`avrotize pq2a`](#convert-parquet-schema-to-avrotize-schema) - Convert Parquet schema to Avrotize Schema.
80
- - [`avrotize csv2a`](#convert-csv-file-to-avrotize-schema) - Convert CSV file to Avrotize Schema.
81
- - [`avrotize kstruct2a`](#convert-kafka-connect-schema-to-avrotize-schema) - Convert Kafka Connect Schema to Avrotize Schema.
82
-
83
- Converting from Avrotize Schema:
84
-
85
- - [`avrotize a2p`](#convert-avrotize-schema-to-proto-schema) - Convert Avrotize Schema to Protobuf 3 schema.
86
- - [`avrotize a2j`](#convert-avrotize-schema-to-json-schema) - Convert Avrotize Schema to JSON schema.
87
- - [`avrotize a2x`](#convert-avrotize-schema-to-xml-schema) - Convert Avrotize Schema to XML schema.
88
- - [`avrotize a2k`](#convert-avrotize-schema-to-kusto-table-declaration) - Convert Avrotize Schema to Kusto table definition.
89
- - [`avrotize a2sql`](#convert-avrotize-schema-to-sql-table-definition) - Convert Avrotize Schema to SQL table definition.
90
- - [`avrotize a2pq`](#convert-avrotize-schema-to-empty-parquet-file) - Convert Avrotize Schema to Parquet or Iceberg schema.
91
- - [`avrotize a2ib`](#convert-avrotize-schema-to-iceberg-schema) - Convert Avrotize Schema to Iceberg schema.
92
- - [`avrotize a2mongo`](#convert-avrotize-schema-to-mongodb-schema) - Convert Avrotize Schema to MongoDB schema.
93
- - [`avrotize a2cassandra`](#convert-avrotize-schema-to-cassandra-schema) - Convert Avrotize Schema to Cassandra schema.
94
- - [`avrotize a2es`](#convert-avrotize-schema-to-elasticsearch-schema) - Convert Avrotize Schema to Elasticsearch schema.
95
- - [`avrotize a2dynamodb`](#convert-avrotize-schema-to-dynamodb-schema) - Convert Avrotize Schema to DynamoDB schema.
96
- - [`avrotize a2cosmos`](#convert-avrotize-schema-to-cosmosdb-schema) - Convert Avrotize Schema to CosmosDB schema.
97
- - [`avrotize a2couchdb`](#convert-avrotize-schema-to-couchdb-schema) - Convert Avrotize Schema to CouchDB schema.
98
- - [`avrotize a2firebase`](#convert-avrotize-schema-to-firebase-schema) - Convert Avrotize Schema to Firebase schema.
99
- - [`avrotize a2hbase`](#convert-avrotize-schema-to-hbase-schema) - Convert Avrotize Schema to HBase schema.
100
- - [`avrotize a2neo4j`](#convert-avrotize-schema-to-neo4j-schema) - Convert Avrotize Schema to Neo4j schema.
101
- - [`avrotize a2dp`](#convert-avrotize-schema-to-datapackage-schema) - Convert Avrotize Schema to Datapackage schema.
102
- - [`avrotize a2md`](#convert-avrotize-schema-to-markdown-documentation) - Convert Avrotize Schema to Markdown documentation.
103
-
104
- Generate code from Avrotize Schema:
105
-
106
- - [`avrotize a2cs`](#convert-avrotize-schema-to-c-classes) - Generate C# code from Avrotize Schema.
107
- - [`avrotize a2java`](#convert-avrotize-schema-to-java-classes) - Generate Java code from Avrotize Schema.
108
- - [`avrotize a2py`](#convert-avrotize-schema-to-python-classes) - Generate Python code from Avrotize Schema.
109
- - [`avrotize a2ts`](#convert-avrotize-schema-to-typescript-classes) - Generate TypeScript code from Avrotize Schema.
110
- - [`avrotize a2js`](#convert-avrotize-schema-to-javascript-classes) - Generate JavaScript code from Avrotize Schema.
111
- - [`avrotize a2cpp`](#convert-avrotize-schema-to-c-classes) - Generate C++ code from Avrotize Schema.
112
- - [`avrotize a2go`](#convert-avrotize-schema-to-go-classes) - Generate Go code from Avrotize Schema.
113
- - [`avrotize a2rust`](#convert-avrotize-schema-to-rust-classes) - Generate Rust code from Avrotize Schema.
114
-
115
- Other commands:
116
-
117
- - [`avrotize pcf`](#create-the-parsing-canonical-form-pcf-of-an-avrotize-schema) - Create the Parsing Canonical Form (PCF) of an Avrotize Schema.
118
-
119
- ## Overview
120
-
121
- You can use Avrotize to convert between Avro/Avrotize Schema and other schema formats like JSON Schema, XML Schema (XSD), Protocol Buffers (Protobuf), ASN.1, and database schema formats like Kusto Data Table Definition (KQL) and SQL Table Definition. That means you can also convert from JSON Schema to Protobuf going via Avrotize Schema.
122
-
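- To illustrate, here is a hypothetical two-step conversion from JSON Schema to Protobuf via Avrotize Schema (the file names are invented for this sketch):
-
- ```bash
- # JSON Schema -> Avrotize Schema, then Avrotize Schema -> Protobuf 3
- avrotize j2a order.json --out order.avsc
- avrotize a2p order.avsc --out ./proto
- ```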
123
- You can also generate C#, Java, TypeScript, JavaScript, and Python code from Avrotize Schema documents. The difference from the native Avro tools is that Avrotize can emit data classes without Avro library dependencies and, optionally, with annotations for JSON serialization libraries like Jackson or System.Text.Json.
124
-
125
- The tool does not convert data (instances of schemas), only the data structure definitions.
126
-
127
- Mind that the primary objective of the tool is the conversion of schemas that describe data structures used in applications, databases, and messaging systems. While the project's internal tests do cover a lot of ground, it is not a primary goal of the tool to convert every complex document schema, such as those used for DevOps pipelines or system configuration files.
128
-
129
- ## Why?
130
-
131
- Data structure definitions are an essential part of data exchange, serialization, and storage. They define the shape and type of data, and they are foundational for tooling and libraries for working with the data. Nearly all data schema languages are coupled to a specific data exchange or storage format, locking the definitions to that format.
132
-
133
- Avrotize is designed as a tool to "unlock" data definitions from JSON Schema or XML Schema and make them usable in other contexts. The intent is also to lay a foundation for transcoding data from one format to another by translating the schema definitions as accurately as possible into the target format's schema model. The transcoding of the data itself requires separate tools that are beyond the scope of this project.
134
-
135
- The use of the term "data structure definition" and not "data object definition" is quite intentional. The focus of the tool is on data structures that can be used for messaging and eventing payloads, for data serialization, and for database tables, with the goal that those structures can be mapped cleanly from and to common programming language types.
136
-
137
- Therefore, Avrotize intentionally ignores common techniques to model object-oriented inheritance. For instance, when converting from JSON Schema, all content from `allOf` expressions is merged into a single record type rather than trying to model the inheritance tree in Avro.
138
-
139
- ## Avrotize Schema
140
-
141
- Avrotize Schema is a schema model that is a full superset of the popular Apache Avro Schema model. Avrotize Schema is the "pivot point" for this tool. All schemas are converted from and to Avrotize Schema.
142
-
143
- Since Avrotize Schema is a superset of Avro Schema and uses its extensibility features, every Avrotize Schema is also a valid Avro Schema and vice versa.
144
-
145
- Why did we pick Avro Schema as the foundational schema model?
146
-
147
- Avro Schema ...
148
-
149
- - provides a simple, clean, and concise way to define data structures. It is quite easy to understand and use.
150
- - is self-contained by design, neither having nor requiring external references. Avro Schema can express complex data structure hierarchies spanning multiple namespace boundaries all in a single file, which neither JSON Schema nor XML Schema nor Protobuf can do.
151
- - can be resolved by code generators and other tools "top-down" since it enforces dependencies to be ordered such that no forward-referencing occurs.
152
- - emerged out of the Apache Hadoop ecosystem and is widely used for serialization and storage of data and for data exchange between systems.
153
- - supports native and logical types that cover the needs of many business and technical use cases.
154
- - can describe the popular JSON data encoding very well and in a way that always maps cleanly to a wide range of programming languages and systems. In contrast, it's quite easy to inadvertently define a JSON Schema that is very difficult to map to a programming language structure.
155
- - is itself expressed as JSON. That makes it easy to parse and generate, which is not the case for Protobuf or ASN.1, which require bespoke parsers.
156
-
157
- > It needs to be noted here that while Avro Schema is great for defining data structures, and data classes generated from Avro Schema using this tool or other tools can be used with the most popular JSON serialization libraries, the Apache Avro project's own JSON encoding has fairly grave interoperability issues with common usage of JSON. Avrotize defines an alternate JSON encoding in [`avrojson.md`](specs/avrojson.md).
160
-
161
- Avro Schema does not support all the bells and whistles of XML Schema or JSON Schema, but that is a feature, not a bug, as it ensures the portability of the schemas across different systems and infrastructures. Specifically, Avro Schema does not support many of the data validation features found in JSON Schema or XML Schema. There are no `pattern`, `format`, `minimum`, `maximum`, or `required` keywords in Avro Schema, and Avro does not support conditional validation.
162
-
163
- In a system where data originates as XML or JSON described by a validating XML Schema or JSON Schema, the assumption we make here is that data will be validated using its native schema language first, and the Avro Schema will then be used for transformation, transfer, or storage.
164
-
165
- ## Adding CloudEvents columns for database tables
166
-
167
- When converting Avrotize Schema to Kusto Data Table Definition (KQL), SQL Table Definition, or Parquet Schema, the tool can add special columns for [CloudEvents](https://cloudevents.io) attributes. CNCF CloudEvents is a specification for describing event data in a common way.
168
-
169
- The rationale for adding such columns to database tables is that messages and events commonly separate event metadata from the payload data, while that information is merged when events are projected into a database. The metadata often carries important context information about the event that is not contained in the payload itself. Therefore, the tool can add those columns to the database tables for easy alignment of the message context with the payload when building event stores.
170
-
171
- ### Convert Proto schema to Avrotize Schema
172
-
173
- ```bash
174
- avrotize p2a <path_to_proto_file> [--out <path_to_avro_schema_file>]
175
- ```
176
-
177
- Parameters:
178
-
179
- - `<path_to_proto_file>`: The path to the Protobuf schema file to be converted. If omitted, the file is read from stdin.
180
- - `--out`: The path to the Avrotize Schema file to write the conversion result to. If omitted, the output is directed to stdout.
181
-
182
- Conversion notes:
183
-
184
- - Proto 2 and Proto 3 syntax are supported.
185
- - Proto package names are mapped to Avro namespaces. The tool resolves imports and consolidates all imported types into a single Avrotize Schema file.
186
- - The tool embeds all 'well-known' Protobuf 3.0 types in Avro format and injects them as needed when the respective types are imported. Only the `Timestamp` type is mapped to the Avro logical type 'timestamp-millis'. The rest of the well-known Protobuf types are kept as Avro record types with the same field names and types.
187
- - Protobuf allows any scalar type as a `map` key; Avro does not. When converting from Proto to Avro, the type information for map keys is ignored.
188
- - The field numbers in message types are not mapped to the positions of the fields in Avro records. The fields in Avro are ordered as they appear in the Proto schema. Consequently, the Avrotize Schema also ignores the `extensions` and `reserved` keywords in the Proto schema.
189
- - The `optional` keyword results in an Avro field being nullable (union with the `null` type), while the `required` keyword results in a non-nullable field. The `repeated` keyword results in an Avro field being an array of the field type.
190
- - The `oneof` keyword in Proto is mapped to an Avro union type.
191
- - All `options` in the Proto schema are ignored.
192
-
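- As a sketch, converting a hypothetical `service.proto` and writing the result to a file instead of stdout:
-
- ```bash
- # reads the .proto file, resolves imports, writes a single Avrotize Schema
- avrotize p2a service.proto --out service.avsc
- ```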
193
- ### Convert Avrotize Schema to Proto schema
194
-
195
- ```bash
196
- avrotize a2p <path_to_avro_schema_file> [--out <path_to_proto_directory>] [--naming <naming_mode>] [--allow-optional]
197
- ```
198
-
199
- Parameters:
200
-
201
- - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
202
- - `--out`: The path to the Protobuf schema directory to write the conversion result to. If omitted, the output is directed to stdout.
203
- - `--naming`: (optional) Type naming convention. Choices are `snake`, `camel`, `pascal`.
204
- - `--allow-optional`: (optional) Enable support for 'optional' fields.
205
-
206
- Conversion notes:
207
-
208
- - Avro namespaces are resolved into distinct proto package definitions. The tool will create a new `.proto` file with the package definition and an `import` statement for each namespace found in the Avrotize Schema.
209
- - Avro type unions `[]` are converted to `oneof` expressions in Proto. Avro allows for maps and arrays in the type union, whereas Proto only supports scalar types and message type references. The tool will therefore emit message types containing a single array or map field for any such case and add it to the containing type, and will also recursively resolve further unions in the array and map values.
210
- - The sequence of fields in a message follows the sequence of fields in the Avro record. When type unions need to be resolved into `oneof` expressions, the alternative fields need to be assigned field numbers, which will shift the field numbers for any subsequent fields.
211
-
212
- ### Convert JSON schema to Avrotize Schema
213
-
214
- ```bash
215
- avrotize j2a <path_to_json_schema_file> [--out <path_to_avro_schema_file>] [--namespace <avro_schema_namespace>] [--split-top-level-records]
216
- ```
217
-
218
- Parameters:
219
-
220
- - `<path_to_json_schema_file>`: The path to the JSON schema file to be converted. If omitted, the file is read from stdin.
221
- - `--out`: The path to the Avrotize Schema file to write the conversion result to. If omitted, the output is directed to stdout.
222
- - `--namespace`: (optional) The namespace to use in the Avrotize Schema if the JSON schema does not define a namespace.
223
- - `--split-top-level-records`: (optional) Split top-level records into separate files.
224
-
225
- Conversion notes:
226
-
227
- - [JSON Schema Handling in Avrotize](specs/jsonschema.md)
228
-
229
- ### Convert Avrotize Schema to JSON schema
230
-
231
- ```bash
232
- avrotize a2j <path_to_avro_schema_file> [--out <path_to_json_schema_file>] [--naming <naming_mode>]
233
- ```
234
-
235
- Parameters:
236
-
237
- - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
238
- - `--out`: The path to the JSON schema file to write the conversion result to. If omitted, the output is directed to stdout.
239
- - `--naming`: (optional) Type naming convention. Choices are `snake`, `camel`, `pascal`, `default`.
240
-
241
- Conversion notes:
242
-
243
- - [JSON Schema Handling in Avrotize](specs/jsonschema.md)
244
-
245
- ### Convert XML Schema (XSD) to Avrotize Schema
246
-
247
- ```bash
248
- avrotize x2a <path_to_xsd_file> [--out <path_to_avro_schema_file>] [--namespace <avro_schema_namespace>]
249
- ```
250
-
251
- Parameters:
252
-
253
- - `<path_to_xsd_file>`: The path to the XML schema file to be converted. If omitted, the file is read from stdin.
254
- - `--out`: The path to the Avrotize Schema file to write the conversion result to. If omitted, the output is directed to stdout.
255
- - `--namespace`: (optional) The namespace to use in the Avrotize Schema if the XML schema does not define a namespace.
256
-
257
- Conversion notes:
258
-
259
- - All XML Schema constructs are mapped to Avro record types with fields, whereby **both** elements and attributes become fields in the record. XML is therefore flattened into fields, and this aspect of the structure is not preserved.
260
- - Avro does not support `xsd:any`, since Avro does not allow arbitrary typing and must always use a named type. The tool maps `xsd:any` to a field `any` typed as a union that allows scalar values or two levels of array and/or map nesting.
261
- - `simpleType` declarations that define enums are mapped to `enum` types in Avro. All other facets are ignored and simple types are mapped to the corresponding Avro type.
262
- - `complexType` declarations that have simple content, where a base type is augmented with attributes, are mapped to record types in Avro. Any other facets defined on the complex type are ignored.
263
- - If the schema defines a single root element, the tool will emit a single Avro record type. If the schema defines multiple root elements, the tool will emit a union of record types, each corresponding to a root element.
264
- - All fields in the resulting Avrotize Schema are annotated with an `xmlkind` extension attribute that indicates whether the field was an `element` or an `attribute` in the XML schema.
265
-
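- For example, with a hypothetical `catalog.xsd` and a fallback namespace:
-
- ```bash
- # --namespace is only used if the XSD does not define one
- avrotize x2a catalog.xsd --out catalog.avsc --namespace com.example.catalog
- ```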
266
- ### Convert Avrotize Schema to XML schema
267
-
268
- ```bash
269
- avrotize a2x <path_to_avro_schema_file> [--out <path_to_xsd_schema_file>] [--namespace <target_namespace>]
270
- ```
271
-
272
- Parameters:
273
-
274
- - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
275
- - `--out`: The path to the XML schema file to write the conversion result to. If omitted, the output is directed to stdout.
276
- - `--namespace`: (optional) Target namespace for the XSD schema.
277
-
278
- Conversion notes:
279
-
280
- - Avro record types are mapped to XML Schema complex types with elements.
281
- - Avro enum types are mapped to XML Schema simple types with restrictions.
282
- - Avro logical types are mapped to XML Schema simple types with restrictions where required.
283
- - Avro unions are mapped to standalone XSD simple type definitions with a union restriction if all union types are primitives.
284
- - Avro unions with complex types are resolved into distinct types for each option, which are then joined with a choice.
287
-
288
- ### Convert ASN.1 schema to Avrotize Schema
289
-
290
- ```bash
291
- avrotize asn2a <path_to_asn1_schema_file>[,<path_to_asn1_schema_file>,...] [--out <path_to_avro_schema_file>]
292
- ```
293
-
294
- Parameters:
295
-
296
- - `<path_to_asn1_schema_file>`: The path to the ASN.1 schema file to be converted. The tool supports multiple files in a comma-separated list. If omitted, the file is read from stdin.
297
- - `--out`: The path to the Avrotize Schema file to write the conversion result to. If omitted, the output is directed to stdout.
298
-
299
- Conversion notes:
300
-
301
- - All ASN.1 types are mapped to Avro record types, enums, and unions. Avro does not support the same level of type nesting as ASN.1; the tool maps each type to the best fit.
302
- - The tool will map the following ASN.1 types to Avro types:
303
- - `SEQUENCE` and `SET` are mapped to Avro record types.
304
- - `CHOICE` is mapped to an Avro record type with all fields optional. While the `CHOICE` type technically corresponds to an Avro union, the ASN.1 type has different named fields for each option, which is not a feature of Avro unions.
305
- - `OBJECT IDENTIFIER` is mapped to an Avro string type.
306
- - `ENUMERATED` is mapped to an Avro enum type.
307
- - `SEQUENCE OF` and `SET OF` are mapped to Avro array type.
308
- - `BIT STRING` is mapped to Avro bytes type.
309
- - `OCTET STRING` is mapped to Avro bytes type.
310
- - `INTEGER` is mapped to Avro long type.
311
- - `REAL` is mapped to Avro double type.
312
- - `BOOLEAN` is mapped to Avro boolean type.
313
- - `NULL` is mapped to Avro null type.
314
- - `UTF8String`, `PrintableString`, `IA5String`, `BMPString`, `NumericString`, `TeletexString`, `VideotexString`, `GraphicString`, `VisibleString`, `GeneralString`, `UniversalString`, `CharacterString`, `T61String` are all mapped to Avro string type.
315
- - All other ASN.1 types are mapped to Avro string type.
316
- - ASN.1 parsing is based on the Python asn1tools package and is limited to that package's capabilities; the tool may not be able to parse all ASN.1 files.
317
-
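- A sketch with two hypothetical ASN.1 modules passed as a comma-separated list:
-
- ```bash
- # both modules are consolidated into one Avrotize Schema
- avrotize asn2a core.asn,extensions.asn --out modules.avsc
- ```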
318
- ### Convert Kusto table definition to Avrotize Schema
319
-
320
- ```bash
321
- avrotize k2a --kusto-uri <kusto_cluster_uri> --kusto-database <kusto_database> [--out <path_to_avro_schema_file>] [--emit-cloudevents-xregistry]
322
- ```
323
-
324
- Parameters:
325
-
326
- - `--kusto-uri`: The URI of the Kusto cluster to connect to.
327
- - `--kusto-database`: The name of the Kusto database to read the table definitions from.
328
- - `--out`: The path to the Avrotize Schema file to write the conversion result to. If omitted, the output is directed to stdout.
329
- - `--emit-cloudevents-xregistry`: (optional) See discussion below.
330
-
331
- Conversion notes:
332
-
333
- - The tool directly connects to the Kusto cluster and reads the table definitions from the specified database. The tool will convert all tables in the database to Avro record types, returned in a top-level type union.
334
- - Connecting to the Kusto cluster uses the same authentication mechanisms as the Azure CLI; if the Azure CLI is installed and authenticated, the tool reuses its authentication context.
335
- - The tool will map the Kusto column types to Avro types as follows:
336
- - `bool` is mapped to Avro boolean type.
337
- - `datetime` is mapped to Avro long type with logical type `timestamp-millis`.
338
- - `decimal` is mapped to a logical Avro type with the `logicalType` set to `decimal` and the `precision` and `scale` set to the values of the `decimal` type in Kusto.
339
- - `guid` is mapped to Avro string type.
340
- - `int` is mapped to Avro int type.
341
- - `long` is mapped to Avro long type.
342
- - `real` is mapped to Avro double type.
343
- - `string` is mapped to Avro string type.
344
- - `timespan` is mapped to a logical Avro type with the `logicalType` set to `duration`.
345
- - For `dynamic` columns, the tool will sample the data in the table to determine the structure of the dynamic column. The tool will map the dynamic column to an Avro record type with fields that correspond to the fields found in the dynamic column. If the dynamic column contains nested dynamic columns, the tool will recursively map those to Avro record types. If records with conflicting structures are found in the dynamic column, the tool will emit a union of record types for the dynamic column.
346
- - If the `--emit-cloudevents-xregistry` option is set, the tool will emit an [xRegistry](http://xregistry.io) registry manifest file with a CloudEvent message definition for each table in the Kusto database and a separate Avro Schema for each table in the embedded schema registry. If one or more tables are found to contain CloudEvent data (as indicated by the presence of the CloudEvents attribute columns), the tool will inspect the content of the `type` (or `__type` or `___type`) columns to determine which CloudEvent types have been stored in the table and will emit a CloudEvent definition and schema for each unique type.
347
-
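- A sketch against a hypothetical cluster and database (authentication follows the Azure CLI context):
-
- ```bash
- avrotize k2a --kusto-uri https://mycluster.westeurope.kusto.windows.net \
-              --kusto-database Telemetry --out telemetry.avsc
- ```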
348
- ### Convert Avrotize Schema to Kusto table declaration
349
-
350
- ```bash
351
- avrotize a2k <path_to_avro_schema_file> [--out <path_to_kusto_kql_file>] [--record-type <record_type>] [--emit-cloudevents-columns] [--emit-cloudevents-dispatch]
352
- ```
353
-
354
- Parameters:
355
-
356
- - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
357
- - `--out`: The path to the Kusto KQL file to write the conversion result to. If omitted, the output is directed to stdout.
358
- - `--record-type`: (optional) The name of the Avro record type to convert to a Kusto table.
359
- - `--emit-cloudevents-columns`: (optional) If set, the tool will add [CloudEvents](https://cloudevents.io) attribute columns to the table: `___id`, `___source`, `___subject`, `___type`, and `___time`.
360
- - `--emit-cloudevents-dispatch`: (optional) If set, the tool will add a table named `_cloudevents_dispatch` to the script or database, which serves as an ingestion and dispatch table for CloudEvents. The table has columns for the core CloudEvents attributes and a `data` column that holds the CloudEvents data. For each table in the Avrotize Schema, the tool will create an update policy that maps events whose `type` attribute matches the Avro type name to the respective table.
361
-
362
- Conversion notes:
363
-
364
- - Only the Avro `record` type can be mapped to a Kusto table. If the Avrotize Schema contains other types (like `enum` or `array`), the tool will ignore them.
365
- - Only the first `record` type in the Avrotize Schema is converted to a Kusto table. If the Avrotize Schema contains other `record` types, they will be ignored. The `--record-type` option can be used to specify which `record` type to convert.
366
- - The fields of the record are mapped to columns in the Kusto table. Fields that are records or arrays or maps are mapped to columns of type `dynamic` in the Kusto table.
367
-
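- For instance, emitting a table script with CloudEvents columns for a hypothetical `telemetry.avsc`:
-
- ```bash
- # adds the ___id, ___source, ___subject, ___type, and ___time columns
- avrotize a2k telemetry.avsc --out telemetry.kql --emit-cloudevents-columns
- ```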
368
- ### Convert Avrotize Schema to SQL Schema
369
-
370
- ```bash
371
- avrotize a2sql [input] --out <path_to_sql_script> --dialect <dialect>
372
- ```
373
-
374
- Parameters:
375
-
376
- - `input`: The path to the Avrotize schema file to be converted (or read from stdin if omitted).
377
- - `--out`: The path to the SQL script file to write the conversion result to.
378
- - `--dialect`: The SQL dialect (database type) to target. Supported dialects include:
379
- - `mysql`, `mariadb`, `postgres`, `sqlserver`, `oracle`, `sqlite`, `bigquery`, `snowflake`, `redshift`, `db2`
380
- - `--emit-cloudevents-columns`: (Optional) Add CloudEvents columns to the SQL table.
381
-
382
- For detailed conversion rules and type mappings for each SQL dialect, refer to the [SQL Conversion Notes](sqlcodegen.md) document.
383
-
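- A sketch targeting PostgreSQL, with hypothetical file names:
-
- ```bash
- avrotize a2sql telemetry.avsc --out telemetry.sql --dialect postgres
- ```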
384
- ### Convert Avrotize Schema to MongoDB schema
385
-
386
- ```bash
387
- avrotize a2mongo <path_to_avro_schema_file> [--out <path_to_mongodb_schema>] [--emit-cloudevents-columns]
388
- ```
389
-
390
- Parameters:
391
-
392
- - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
393
- - `--out`: The path to the MongoDB schema file to write the conversion result to.
394
- - `--emit-cloudevents-columns`: (optional) If set, the tool will add [CloudEvents](https://cloudevents.io) attribute columns to the MongoDB schema.
395
-
396
- Conversion notes:
397
-
398
- - The fields of the Avro record type are mapped to fields in the MongoDB schema. Fields that are records or arrays or maps are mapped to fields of type `object`.
399
- - The emitted MongoDB schema file is a JSON file that can be used with MongoDB's `mongoimport` tool to create a collection with the specified schema.
400
-
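- A minimal sketch, again with hypothetical file names:
-
- ```bash
- avrotize a2mongo telemetry.avsc --out telemetry.json
- ```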
401
- The following commands convert Avrotize Schema to various NoSQL database schemas:
402
-
403
- ### Convert Avrotize schema to Cassandra schema
404
-
405
- ```bash
406
- avrotize a2cassandra [input] --out <output_directory> [--emit-cloudevents-columns]
407
- ```
408
-
409
- - `input`: Path to the Avrotize schema file (or read from stdin if omitted).
410
- - `--out`: Output path for the Cassandra schema (required).
411
- - `--emit-cloudevents-columns`: Add CloudEvents columns to the Cassandra schema (optional, default: false).
412
-
413
- Refer to the detailed conversion notes for Cassandra in the [NoSQL Conversion Notes](nosqlcodegen.md).
414
-
415
- ### Convert Avrotize schema to DynamoDB schema
416
-
417
- ```bash
418
- avrotize a2dynamodb [input] --out <output_directory> [--emit-cloudevents-columns]
419
- ```
420
-
421
- - `input`: Path to the Avrotize schema file (or read from stdin if omitted).
422
- - `--out`: Output path for the DynamoDB schema (required).
423
- - `--emit-cloudevents-columns`: Add CloudEvents columns to the DynamoDB schema (optional, default: false).
424
-
425
- Refer to the detailed conversion notes for DynamoDB in the [NoSQL Conversion Notes](nosqlcodegen.md).
426
-
427
- ### Convert Avrotize schema to Elasticsearch schema
428
-
429
- ```bash
430
- avrotize a2es [input] --out <output_directory> [--emit-cloudevents-columns]
431
- ```
432
-
433
- - `input`: Path to the Avrotize schema file (or read from stdin if omitted).
434
- - `--out`: Output path for the Elasticsearch schema (required).
435
- - `--emit-cloudevents-columns`: Add CloudEvents columns to the Elasticsearch schema (optional, default: false).
436
-
437
- Refer to the detailed conversion notes for Elasticsearch in the [NoSQL Conversion Notes](nosqlcodegen.md).
438
-
439
- ### Convert Avrotize schema to CouchDB schema
440
-
441
- ```bash
442
- avrotize a2couchdb [input] --out <output_directory> [--emit-cloudevents-columns]
443
- ```
444
-
445
- - `input`: Path to the Avrotize schema file (or read from stdin if omitted).
446
- - `--out`: Output path for the CouchDB schema (required).
447
- - `--emit-cloudevents-columns`: Add CloudEvents columns to the CouchDB schema (optional, default: false).
448
-
449
- Refer to the detailed conversion notes for CouchDB in the [NoSQL Conversion Notes](nosqlcodegen.md).
450
-
451
- ### Convert Avrotize schema to Neo4j schema
452
-
453
- ```bash
454
- avrotize a2neo4j [input] --out <output_directory> [--emit-cloudevents-columns]
455
- ```
456
-
457
- - `input`: Path to the Avrotize schema file (or read from stdin if omitted).
458
- - `--out`: Output path for the Neo4j schema (required).
459
- - `--emit-cloudevents-columns`: Add CloudEvents columns to the Neo4j schema (optional, default: false).
460
-
461
- Refer to the detailed conversion notes for Neo4j in the [NoSQL Conversion Notes](nosqlcodegen.md).
462
-
463
- ### Convert Avrotize schema to Firebase schema
464
-
465
- ```bash
466
- avrotize a2firebase [input] --out <output_directory> [--emit-cloudevents-columns]
467
- ```
468
-
469
- - `input`: Path to the Avrotize schema file (or read from stdin if omitted).
470
- - `--out`: Output path for the Firebase schema (required).
471
- - `--emit-cloudevents-columns`: Add CloudEvents columns to the Firebase schema (optional, default: false).
472
-
473
- Refer to the detailed conversion notes for Firebase in the [NoSQL Conversion Notes](nosqlcodegen.md).
474
-
475
- ### Convert Avrotize schema to CosmosDB schema
476
-
477
- ```bash
478
- avrotize a2cosmos [input] --out <output_directory> [--emit-cloudevents-columns]
479
- ```
480
-
481
- - `input`: Path to the Avrotize schema file (or read from stdin if omitted).
482
- - `--out`: Output path for the CosmosDB schema (required).
483
- - `--emit-cloudevents-columns`: Add CloudEvents columns to the CosmosDB schema (optional, default: false).
484
-
485
- Refer to the detailed conversion notes for CosmosDB in the [NoSQL Conversion Notes](nosqlcodegen.md).
486
-
487
- ### Convert Avrotize schema to HBase schema
488
-
489
- ```bash
490
- avrotize a2hbase [input] --out <output_directory> [--emit-cloudevents-columns]
491
- ```
492
-
493
- - `input`: Path to the Avrotize schema file (or read from stdin if omitted).
494
- - `--out`: Output path for the HBase schema (required).
495
- - `--emit-cloudevents-columns`: Add CloudEvents columns to the HBase schema (optional, default: false).
496
-
497
- Refer to the detailed conversion notes for HBase in the [NoSQL Conversion Notes](nosqlcodegen.md).
498
-
499
- ### Convert Avrotize Schema to empty Parquet file
500
-
501
- ```bash
502
- avrotize a2pq <path_to_avro_schema_file> [--out <path_to_parquet_schema_file>] [--record-type <record-type-from-avro>] [--emit-cloudevents-columns]
503
- ```
504
-
505
- Parameters:
506
-
507
- - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
508
- - `--out`: The path to the Parquet schema file to write the conversion result to. If omitted, the output is directed to stdout.
509
- - `--record-type`: (optional) The name of the Avro record type to convert to a Parquet schema.
510
- - `--emit-cloudevents-columns`: (optional) If set, the tool will add [CloudEvents](https://cloudevents.io) attribute columns to the Parquet schema: `__id`, `__source`, `__subject`, `__type`, and `__time`.
511
-
512
- Conversion notes:
513
-
514
- - The emitted Parquet file contains only the schema, no data rows.
515
- - The tool only supports writing Parquet files for Avrotize Schema that describe a single `record` type. If the Avrotize Schema contains a top-level union, the `--record-type` option must be used to specify which record type to emit.
516
- - The fields of the record are mapped to columns in the Parquet file. Array and record fields are mapped to Parquet nested types. Avro type unions are mapped to structures, not to Parquet unions since those are not supported by the PyArrow library used here.
517
-
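- For example, selecting one record type from a schema with a top-level union (names are hypothetical):
-
- ```bash
- # --record-type is required when the schema contains a top-level union
- avrotize a2pq telemetry.avsc --out telemetry.parquet --record-type Telemetry
- ```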
518
- ### Convert Avrotize Schema to Iceberg schema
519
-
520
- ```bash
521
- avrotize a2ib <path_to_avro_schema_file> [--out <path_to_iceberg_schema_file>] [--record-type <record-type-from-avro>] [--emit-cloudevents-columns]
522
- ```
523
-
524
- Parameters:
525
-
526
- - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
527
- - `--out`: The path to the Iceberg schema file to write the conversion result to. If omitted, the output is directed to stdout.
528
- - `--record-type`: (optional) The name of the Avro record type to convert to an Iceberg schema.
529
- - `--emit-cloudevents-columns`: (optional) If set, the tool will add [CloudEvents](https://cloudevents.io) attribute columns to the Iceberg schema: `__id`, `__source`, `__subject`, `__type`, and `__time`.
530
-
531
- Conversion notes:
532
-
533
- - The emitted Iceberg file contains only the schema, no data rows.
534
- - The tool only supports writing Iceberg files for Avrotize Schema that describe a single `record` type. If the Avrotize Schema contains a top-level union, the `--record-type` option must be used to specify which record type to emit.
535
- - The fields of the record are mapped to columns in the Iceberg file. Array and record fields are mapped to Iceberg nested types. Avro type unions are mapped to structures, not to Iceberg unions since those are not supported by the PyArrow library used here.
536
-
537
- ### Convert Parquet schema to Avrotize Schema
538
-
539
- ```bash
540
- avrotize pq2a <path_to_parquet_file> [--out <path_to_avro_schema_file>] [--namespace <avro_schema_namespace>]
541
- ```
542
-
543
- Parameters:
544
-
545
- - `<path_to_parquet_file>`: The path to the Parquet file to be converted. If omitted, the file is read from stdin.
546
- - `--out`: The path to the Avrotize Schema file to write the conversion result to. If omitted, the output is directed to stdout.
547
- - `--namespace`: (optional) The namespace to use in the Avrotize Schema if the Parquet file does not define a namespace.
548
-
549
- Conversion notes:
550
-
551
- - The tool reads the schema from the Parquet file and converts it to Avrotize Schema. The data in the Parquet file is not read or converted.
552
- - The fields of the Parquet schema are mapped to fields in the Avrotize Schema. Nested fields are mapped to nested records in the Avrotize Schema.
553
-
554
- ### Convert CSV file to Avrotize Schema
555
-
556
- ```bash
557
- avrotize csv2a <path_to_csv_file> [--out <path_to_avro_schema_file>] [--namespace <avro_schema_namespace>]
558
- ```
559
-
560
- Parameters:
561
-
562
- - `<path_to_csv_file>`: The path to the CSV file to be converted. If omitted, the file is read from stdin.
563
- - `--out`: The path to the Avrotize Schema file to write the conversion result to. If omitted, the output is directed to stdout.
564
- - `--namespace`: (optional) The namespace to use in the Avrotize Schema if the CSV file does not define a namespace.
565
-
566
- Conversion notes:
567
-
568
- - The tool reads the CSV file and converts it to Avrotize Schema. The first row of the CSV file is assumed to be the header row, containing the field names.
569
- - The fields of the CSV file are mapped to fields in the Avrotize Schema. The tool infers the types of the fields from the data in the CSV file.
570
-
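- A sketch with a hypothetical CSV file whose first row holds the field names:
-
- ```bash
- avrotize csv2a measurements.csv --out measurements.avsc --namespace com.example.data
- ```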
571
- ### Convert Kafka Connect Schema to Avrotize Schema
572
-
573
- ```bash
574
- avrotize kstruct2a [input] --out <path_to_avro_schema_file>
575
- ```
576
-
577
- Parameters:
578
-
579
- - `input`: The path to the Kafka Struct file to be converted (or read from stdin if omitted).
580
- - `--out`: The path to the Avrotize Schema file to write the conversion result to.
581
- - `--kstruct`: Deprecated: The path to the Kafka Struct file (for backward compatibility).
582
-
583
- Conversion notes:
584
-
585
- - The tool converts the Kafka Struct definition to an Avrotize Schema, mapping Kafka data types to their Avro equivalents.
586
- - Kafka Structs are typically used to define data structures for Kafka Connect and other Kafka-based applications. This command facilitates interoperability by enabling the conversion of these definitions into Avro, which can be further used with various serialization and schema registry tools.
587
-
588
- ### Convert Avrotize Schema to C# classes
589
-
590
- ```bash
591
- avrotize a2cs <path_to_avro_schema_file> [--out <path_to_csharp_dir>] [--namespace <csharp_namespace>] [--avro-annotation] [--system_text_json_annotation] [--newtonsoft-json-annotation] [--pascal-properties]
592
- ```
593
-
594
- Parameters:
595
-
596
- - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
597
- - `--out`: The path to the directory to write the C# classes to. Required.
598
- - `--namespace`: (optional) The namespace to use in the C# classes.
599
- - `--avro-annotation`: (optional) Use Avro annotations.
600
- - `--system_text_json_annotation`: (optional) Use System.Text.Json annotations.
601
- - `--newtonsoft-json-annotation`: (optional) Use Newtonsoft.Json annotations.
602
- - `--pascal-properties`: (optional) Use PascalCase properties.
603
-
604
- Conversion notes:
605
-
606
- - The tool generates C# classes from the Avrotize Schema. Each record type in the Avrotize Schema is converted to a C# class.
607
- - The fields of the record are mapped to properties in the C# class. Nested records are mapped to nested classes in the C# class.
608
- - The tool supports adding annotations to the properties in the C# class. The `--avro-annotation` option adds Avro annotations, the `--system_text_json_annotation` option adds System.Text.Json annotations, and the `--newtonsoft-json-annotation` option adds Newtonsoft.Json annotations.
609
- - The `--pascal-properties` option changes the naming convention of the properties to PascalCase.
610
-
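- For example, generating System.Text.Json-annotated classes with PascalCase properties (paths and namespace are hypothetical):
-
- ```bash
- avrotize a2cs telemetry.avsc --out ./csharp --namespace Example.Telemetry \
-              --system_text_json_annotation --pascal-properties
- ```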
611
- ### Convert Avrotize Schema to Java classes
612
-
613
- ```bash
614
- avrotize a2java <path_to_avro_schema_file> [--out <path_to_java_dir>] [--package <java_package>] [--avro-annotation] [--jackson-annotation] [--pascal-properties]
615
- ```
616
-
617
- Parameters:
618
-
619
- - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
620
- - `--out`: The path to the directory to write the Java classes to. Required.
621
- - `--package`: (optional) The package to use in the Java classes.
622
- - `--avro-annotation`: (optional) Use Avro annotations.
623
- - `--jackson-annotation`: (optional) Use Jackson annotations.
624
- - `--pascal-properties`: (optional) Use PascalCase properties.
625
-
626
- Conversion notes:
627
-
628
- - The tool generates Java classes from the Avrotize Schema. Each record type in the Avrotize Schema is converted to a Java class.
629
- - The fields of the record are mapped to properties in the Java class. Nested records are mapped to nested classes in the Java class.
630
- - The tool supports adding annotations to the properties in the Java class. The `--avro-annotation` option adds Avro annotations, and the `--jackson-annotation` option adds Jackson annotations.
631
- - The `--pascal-properties` option changes the naming convention of the properties to PascalCase.
632
-
633
- ### Convert Avrotize Schema to Python classes
634
-
635
- ```bash
636
- avrotize a2py <path_to_avro_schema_file> [--out <path_to_python_dir>] [--package <python_package>] [--dataclasses-json-annotation] [--avro-annotation]
637
- ```
638
-
639
- Parameters:
640
-
641
- - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
642
- - `--out`: The path to the directory to write the Python classes to. Required.
643
- - `--package`: (optional) The package to use in the Python classes.
644
- - `--dataclasses-json-annotation`: (optional) Use dataclasses-json annotations.
645
- - `--avro-annotation`: (optional) Use Avro annotations.
646
-
647
- Conversion notes:
648
-
649
- - The tool generates Python classes from the Avrotize Schema. Each record type in the Avrotize Schema is converted to a Python class.
650
- - The fields of the record are mapped to properties in the Python class. Nested records are mapped to nested classes in the Python class.
651
- - The tool supports adding annotations to the properties in the Python class. The `--dataclasses-json-annotation` option adds dataclasses-json annotations, and the `--avro-annotation` option adds Avro annotations.
652
-
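- A sketch generating dataclasses-json-annotated Python classes (paths and package name are hypothetical):
-
- ```bash
- avrotize a2py telemetry.avsc --out ./python --package telemetry --dataclasses-json-annotation
- ```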
653
- ### Convert Avrotize Schema to TypeScript classes
654
-
655
- ```bash
656
- avrotize a2ts <path_to_avro_schema_file> [--out <path_to_typescript_dir>] [--package <typescript_package>] [--avro-annotation] [--typedjson-annotation]
657
- ```
658
-
659
- Parameters:
660
-
661
- - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
662
- - `--out`: The path to the directory to write the TypeScript classes to. Required.
663
- - `--package`: (optional) The package to use in the TypeScript classes.
664
- - `--avro-annotation`: (optional) Use Avro annotations.
665
- - `--typedjson-annotation`: (optional) Use TypedJSON annotations.
666
-
667
- Conversion notes:
668
-
669
- - The tool generates TypeScript classes from the Avrotize Schema. Each record type in the Avrotize Schema is converted to a TypeScript class.
670
- - The fields of the record are mapped to properties in the TypeScript class. Nested records are mapped to nested classes in the TypeScript class.
671
- - The tool supports adding annotations to the properties in the TypeScript class. The `--avro-annotation` option adds Avro annotations, and the `--typedjson-annotation` option adds TypedJSON annotations.
672
-
673
- ### Convert Avrotize Schema to JavaScript classes
674
-
675
- ```bash
676
- avrotize a2js <path_to_avro_schema_file> [--out <path_to_javascript_dir>] [--package <javascript_package>] [--avro-annotation]
677
- ```
678
-
679
- Parameters:
680
-
681
- - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
682
- - `--out`: The path to the directory to write the JavaScript classes to. Required.
683
- - `--package`: (optional) The package to use in the JavaScript classes.
684
- - `--avro-annotation`: (optional) Use Avro annotations.
685
-
686
- Conversion notes:
687
-
688
- - The tool generates JavaScript classes from the Avrotize Schema. Each record type in the Avrotize Schema is converted to a JavaScript class.
689
- - The fields of the record are mapped to properties in the JavaScript class. Nested records are mapped to nested classes in the JavaScript class.
690
- - The tool supports adding annotations to the properties in the JavaScript class. The `--avro-annotation` option adds Avro annotations.
691
-
692
- ### Convert Avrotize Schema to C++ classes
693
-
694
- ```bash
695
- avrotize a2cpp <path_to_avro_schema_file> [--out <path_to_cpp_dir>] [--namespace <cpp_namespace>] [--avro-annotation] [--json-annotation]
696
- ```
697
-
698
- Parameters:
699
-
700
- - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
701
- - `--out`: The path to the directory to write the C++ classes to. Required.
702
- - `--namespace`: (optional) The namespace to use in the C++ classes.
703
- - `--avro-annotation`: (optional) Use Avro annotations.
704
- - `--json-annotation`: (optional) Use JSON annotations.
705
-
706
- Conversion notes:
707
-
708
- - The tool generates C++ classes from the Avrotize Schema. Each record type in the Avrotize Schema is converted to a C++ class.
711
-
712
- - The fields of the record are mapped to properties in the C++ class. Nested records are mapped to nested classes in the C++ class.
713
- - The tool supports adding annotations to the properties in the C++ class. The `--avro-annotation` option adds Avro annotations, and the `--json-annotation` option adds JSON annotations.
714
-
715
- ### Convert Avrotize Schema to Go classes
716
-
717
- ```bash
718
- avrotize a2go <path_to_avro_schema_file> [--out <path_to_go_dir>] [--package <go_package>] [--avro-annotation] [--json-annotation] [--package-site <go_package_site>] [--package-username <go_package_username>]
719
- ```
720
-
721
- Parameters:
722
-
723
- - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
724
- - `--out`: The path to the directory to write the Go classes to. Required.
725
- - `--package`: (optional) The package to use in the Go classes.
726
- - `--package-site`: (optional) The package site to use in the Go classes.
727
- - `--package-username`: (optional) The package username to use in the Go classes.
728
- - `--avro-annotation`: (optional) Use Avro annotations.
729
- - `--json-annotation`: (optional) Use JSON annotations.
730
-
731
- Conversion notes:
732
-
733
- - The tool generates Go classes from the Avrotize Schema. Each record type in the Avrotize Schema is converted to a Go class.
734
- - The fields of the record are mapped to properties in the Go class. Nested records are mapped to nested classes in the Go class.
735
- - The tool supports adding annotations to the properties in the Go class. The `--avro-annotation` option adds Avro annotations, and the `--json-annotation` option adds JSON annotations.
736
-
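- A sketch emitting JSON-annotated Go types (paths and package name are hypothetical):
-
- ```bash
- avrotize a2go telemetry.avsc --out ./go --package telemetry --json-annotation
- ```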
737
- ### Convert Avrotize Schema to Rust classes
738
-
739
- ```bash
740
- avrotize a2rust <path_to_avro_schema_file> [--out <path_to_rust_dir>] [--package <rust_package>] [--avro-annotation] [--serde-annotation]
741
- ```
742
-
743
- Parameters:
744
-
745
- - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
746
- - `--out`: The path to the directory to write the Rust classes to. Required.
747
- - `--package`: (optional) The package to use in the Rust classes.
748
- - `--avro-annotation`: (optional) Use Avro annotations.
749
- - `--serde-annotation`: (optional) Use Serde annotations.
750
-
751
- Conversion notes:
752
-
753
- - The tool generates Rust classes from the Avrotize Schema. Each record type in the Avrotize Schema is converted to a Rust class.
754
- - The fields of the record are mapped to properties in the Rust class. Nested records are mapped to nested classes in the Rust class.
755
- - The tool supports adding annotations to the properties in the Rust class. The `--avro-annotation` option adds Avro annotations, and the `--serde-annotation` option adds Serde annotations.
756
-
757
- ### Convert Avrotize Schema to Datapackage schema
758
-
759
- ```bash
760
- avrotize a2dp <path_to_avro_schema_file> [--out <path_to_datapackage_file>] [--record-type <record-type-from-avro>]
761
- ```
762
-
763
- Parameters:
764
-
765
- - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
766
- - `--out`: The path to the Datapackage schema file to write the conversion result to. If omitted, the output is directed to stdout.
767
- - `--record-type`: (optional) The name of the Avro record type to convert to a Datapackage schema.
768
-
769
- Conversion notes:
770
-
771
- - The tool generates a Datapackage schema from the Avrotize Schema. Each record type in the Avrotize Schema is converted to a Datapackage resource.
772
- - The fields of the record are mapped to fields in the Datapackage resource. Nested records are mapped to nested resources in the Datapackage.
773
-
774
- ### Convert Avrotize Schema to Markdown documentation
775
-
776
- ```bash
777
- avrotize a2md <path_to_avro_schema_file> [--out <path_to_markdown_file>]
778
- ```
779
-
780
- Parameters:
781
-
782
- - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
783
- - `--out`: The path to the Markdown file to write the conversion result to. If omitted, the output is directed to stdout.
784
-
785
- Conversion notes:
786
-
787
- - The tool generates Markdown documentation from the Avrotize Schema. Each record type in the Avrotize Schema is converted to a Markdown section.
788
- - The fields of the record are documented in a table in the Markdown section. Nested records are documented in nested sections in the Markdown file.
789
-
790
- ### Create the Parsing Canonical Form (PCF) of an Avrotize schema
791
-
792
- ```bash
793
- avrotize pcf <path_to_avro_schema_file>
794
- ```
795
-
796
- Parameters:
797
-
798
- - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
799
-
800
- Conversion notes:
801
-
802
- - The tool generates the Parsing Canonical Form (PCF) of the Avrotize Schema. The PCF is a normalized form of the schema that is used for schema comparison and compatibility checking.
803
- - The PCF is a JSON object that is written to stdout.
804
-
805
- This document provides an overview of the usage and functionality of Avrotize. For more detailed information, please refer to the [Avrotize Schema documentation](specs/avrotize-schema.md) and the individual command help messages.
1
+ Metadata-Version: 2.4
+ Name: structurize
+ Version: 2.16.5
+ Summary: Tools to convert from and to JSON Structure from various other schema languages.
+ Author-email: Clemens Vasters <clemensv@microsoft.com>
+ Classifier: Programming Language :: Python :: 3
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: jsonschema>=4.23.0
+ Requires-Dist: lark>=1.1.9
+ Requires-Dist: pyarrow>=22.0.0
+ Requires-Dist: asn1tools>=0.167.0
+ Requires-Dist: jsonpointer>=3.0.0
+ Requires-Dist: jsonpath-ng>=1.6.1
+ Requires-Dist: jsoncomparison>=1.1.0
+ Requires-Dist: requests>=2.32.3
+ Requires-Dist: azure-kusto-data>=5.0.5
+ Requires-Dist: azure-identity>=1.17.1
+ Requires-Dist: datapackage>=1.15.4
+ Requires-Dist: jinja2>=3.1.4
+ Requires-Dist: pyiceberg>=0.10.0
+ Requires-Dist: pandas>=2.2.2
+ Requires-Dist: docker>=7.1.0
+ Provides-Extra: dev
+ Requires-Dist: pytest>=8.3.2; extra == "dev"
+ Requires-Dist: fastavro>=1.9.5; extra == "dev"
+ Requires-Dist: xmlschema>=3.3.2; extra == "dev"
+ Requires-Dist: xmlunittest>=1.0.1; extra == "dev"
+ Requires-Dist: pylint>=3.2.6; extra == "dev"
+ Requires-Dist: dataclasses_json>=0.6.7; extra == "dev"
+ Requires-Dist: dataclasses>=0.8; extra == "dev"
+ Requires-Dist: pydantic>=2.8.2; extra == "dev"
+ Requires-Dist: avro>=1.12.0; extra == "dev"
+ Requires-Dist: testcontainers>=4.7.2; extra == "dev"
+ Requires-Dist: pymysql>=1.1.1; extra == "dev"
+ Requires-Dist: psycopg2>=2.9.9; extra == "dev"
+ Requires-Dist: pyodbc>=5.1.0; extra == "dev"
+ Requires-Dist: pymongo>=4.8.0; extra == "dev"
+ Requires-Dist: oracledb>=2.3.0; extra == "dev"
+ Requires-Dist: cassandra-driver>=3.29.1; extra == "dev"
+ Requires-Dist: sqlalchemy>=2.0.32; extra == "dev"
+ Dynamic: license-file
+
+ # Avrotize & Structurize
+
+ Avrotize is a ["Rosetta Stone"](https://en.wikipedia.org/wiki/Rosetta_Stone) for data structure definitions, allowing you to convert between numerous data and database schema formats and to generate code for different programming languages.
+
+ It is, for instance, a well-documented and predictable converter and code generator for data structures originally defined in JSON Schema (of arbitrary complexity).
+
+ The tool leans on the Apache Avro-derived [Avrotize Schema](specs/avrotize-schema.md) as its schema model and covers:
+
+ - Programming languages: Python, C#, Java, TypeScript, JavaScript, Rust, Go, C++
+ - SQL databases: MySQL, MariaDB, PostgreSQL, SQL Server, Oracle, SQLite, BigQuery, Snowflake, Redshift, DB2
+ - Other databases: KQL/Kusto, MongoDB, Cassandra, Redis, Elasticsearch, DynamoDB, CosmosDB
+ - Data schema formats: Avro, JSON Schema, XML Schema (XSD), Protocol Buffers 2 and 3, ASN.1, Apache Parquet
+
+ ## Installation
+
+ You can install Avrotize from PyPI, [having installed Python 3.10 or later](https://www.python.org/downloads/):
+
+ ```bash
+ pip install avrotize
+ ```
+
+ ## Usage
+
+ Avrotize provides several commands for converting schema formats via Avrotize Schema.
+
+ Converting to Avrotize Schema:
+
+ - [`avrotize p2a`](#convert-proto-schema-to-avrotize-schema) - Convert Protobuf (2 or 3) schema to Avrotize Schema.
+ - [`avrotize j2a`](#convert-json-schema-to-avrotize-schema) - Convert JSON schema to Avrotize Schema.
+ - [`avrotize x2a`](#convert-xml-schema-xsd-to-avrotize-schema) - Convert XML schema to Avrotize Schema.
+ - [`avrotize asn2a`](#convert-asn1-schema-to-avrotize-schema) - Convert ASN.1 to Avrotize Schema.
+ - [`avrotize k2a`](#convert-kusto-table-definition-to-avrotize-schema) - Convert Kusto table definitions to Avrotize Schema.
+ - [`avrotize pq2a`](#convert-parquet-schema-to-avrotize-schema) - Convert Parquet schema to Avrotize Schema.
+ - [`avrotize csv2a`](#convert-csv-file-to-avrotize-schema) - Convert CSV file to Avrotize Schema.
+ - [`avrotize kstruct2a`](#convert-kafka-connect-schema-to-avrotize-schema) - Convert Kafka Connect Schema to Avrotize Schema.
+
+ Converting from Avrotize Schema:
+
+ - [`avrotize a2p`](#convert-avrotize-schema-to-proto-schema) - Convert Avrotize Schema to Protobuf 3 schema.
+ - [`avrotize a2j`](#convert-avrotize-schema-to-json-schema) - Convert Avrotize Schema to JSON schema.
+ - [`avrotize a2x`](#convert-avrotize-schema-to-xml-schema) - Convert Avrotize Schema to XML schema.
+ - [`avrotize a2k`](#convert-avrotize-schema-to-kusto-table-declaration) - Convert Avrotize Schema to Kusto table definition.
+ - [`avrotize a2sql`](#convert-avrotize-schema-to-sql-table-definition) - Convert Avrotize Schema to SQL table definition.
+ - [`avrotize a2pq`](#convert-avrotize-schema-to-empty-parquet-file) - Convert Avrotize Schema to Parquet or Iceberg schema.
+ - [`avrotize a2ib`](#convert-avrotize-schema-to-iceberg-schema) - Convert Avrotize Schema to Iceberg schema.
+ - [`avrotize a2mongo`](#convert-avrotize-schema-to-mongodb-schema) - Convert Avrotize Schema to MongoDB schema.
+ - [`avrotize a2cassandra`](#convert-avrotize-schema-to-cassandra-schema) - Convert Avrotize Schema to Cassandra schema.
+ - [`avrotize a2es`](#convert-avrotize-schema-to-elasticsearch-schema) - Convert Avrotize Schema to Elasticsearch schema.
+ - [`avrotize a2dynamodb`](#convert-avrotize-schema-to-dynamodb-schema) - Convert Avrotize Schema to DynamoDB schema.
+ - [`avrotize a2cosmos`](#convert-avrotize-schema-to-cosmosdb-schema) - Convert Avrotize Schema to CosmosDB schema.
+ - [`avrotize a2couchdb`](#convert-avrotize-schema-to-couchdb-schema) - Convert Avrotize Schema to CouchDB schema.
+ - [`avrotize a2firebase`](#convert-avrotize-schema-to-firebase-schema) - Convert Avrotize Schema to Firebase schema.
+ - [`avrotize a2hbase`](#convert-avrotize-schema-to-hbase-schema) - Convert Avrotize Schema to HBase schema.
+ - [`avrotize a2neo4j`](#convert-avrotize-schema-to-neo4j-schema) - Convert Avrotize Schema to Neo4j schema.
+ - [`avrotize a2dp`](#convert-avrotize-schema-to-datapackage-schema) - Convert Avrotize Schema to Datapackage schema.
+ - [`avrotize a2md`](#convert-avrotize-schema-to-markdown-documentation) - Convert Avrotize Schema to Markdown documentation.
+
+ Generate code from Avrotize Schema:
+
+ - [`avrotize a2cs`](#convert-avrotize-schema-to-c-classes) - Generate C# code from Avrotize Schema.
+ - [`avrotize a2java`](#convert-avrotize-schema-to-java-classes) - Generate Java code from Avrotize Schema.
+ - [`avrotize a2py`](#convert-avrotize-schema-to-python-classes) - Generate Python code from Avrotize Schema.
+ - [`avrotize a2ts`](#convert-avrotize-schema-to-typescript-classes) - Generate TypeScript code from Avrotize Schema.
+ - [`avrotize a2js`](#convert-avrotize-schema-to-javascript-classes) - Generate JavaScript code from Avrotize Schema.
+ - [`avrotize a2cpp`](#convert-avrotize-schema-to-c-classes) - Generate C++ code from Avrotize Schema.
+ - [`avrotize a2go`](#convert-avrotize-schema-to-go-classes) - Generate Go code from Avrotize Schema.
+ - [`avrotize a2rust`](#convert-avrotize-schema-to-rust-classes) - Generate Rust code from Avrotize Schema.
+
+ Generate code from JSON Structure:
+
+ - [`avrotize s2cs`](#convert-json-structure-to-c-classes) - Generate C# code from JSON Structure schema.
+ - [`avrotize s2py`](#convert-json-structure-to-python-classes) - Generate Python code from JSON Structure schema.
+ - [`avrotize s2ts`](#convert-json-structure-to-typescript-classes) - Generate TypeScript code from JSON Structure schema.
+
+ Other commands:
+
+ - [`avrotize pcf`](#create-the-parsing-canonical-form-pcf-of-an-avrotize-schema) - Create the Parsing Canonical Form (PCF) of an Avrotize Schema.
+
+ ## Overview
+
+ You can use Avrotize to convert between Avro/Avrotize Schema and other schema formats like JSON Schema, XML Schema (XSD), Protocol Buffers (Protobuf), ASN.1, and database schema formats like Kusto Data Table Definition (KQL) and SQL Table Definition. That means you can also convert from JSON Schema to Protobuf going via Avrotize Schema.
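+
+ Because the conversion commands read from stdin when the input file is omitted and write to stdout when `--out` is omitted, such conversions can be chained in a pipeline. A minimal sketch (the file names are hypothetical):
+
+ ```bash
+ # JSON Schema -> Avrotize Schema -> Protobuf 3, without an intermediate file.
+ avrotize j2a address.schema.json | avrotize a2p --out ./proto
+ ```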
+
+ You can also generate C#, Java, TypeScript, JavaScript, and Python code from Avrotize Schema documents. The difference from the native Avro tools is that Avrotize can emit data classes without Avro library dependencies and, optionally, with annotations for JSON serialization libraries like Jackson or System.Text.Json.
+
+ The tool does not convert data (instances of schemas), only the data structure definitions.
+
+ Mind that the primary objective of the tool is the conversion of schemas that describe data structures used in applications, databases, and messaging systems. While the project's internal tests cover a lot of ground, it is not a primary goal of the tool to convert every complex document schema, such as those used for DevOps pipelines or system configuration files.
+
+ ## Why?
+
+ Data structure definitions are an essential part of data exchange, serialization, and storage. They define the shape and type of data, and they are foundational for tooling and libraries for working with the data. Nearly all data schema languages are coupled to a specific data exchange or storage format, locking the definitions to that format.
+
+ Avrotize is designed as a tool to "unlock" data definitions from JSON Schema or XML Schema and make them usable in other contexts. The intent is also to lay a foundation for transcoding data from one format to another, by translating the schema definitions as accurately as possible into the target format's schema model. The transcoding of the data itself requires separate tools that are beyond the scope of this project.
+
+ The use of the term "data structure definition" and not "data object definition" is quite intentional. The focus of the tool is on data structures that can be used for messaging and eventing payloads, for data serialization, and for database tables, with the goal that those structures can be mapped cleanly from and to common programming language types.
+
+ Therefore, Avrotize intentionally ignores common techniques to model object-oriented inheritance. For instance, when converting from JSON Schema, all content from `allOf` expressions is merged into a single record type rather than trying to model the inheritance tree in Avro.
+
+ ## Avrotize Schema
+
+ Avrotize Schema is a schema model that is a full superset of the popular Apache Avro Schema model. Avrotize Schema is the "pivot point" for this tool. All schemas are converted from and to Avrotize Schema.
+
+ Since Avrotize Schema is a superset of Avro Schema and uses its extensibility features, every Avrotize Schema is also a valid Avro Schema and vice versa.
+
+ Why did we pick Avro Schema as the foundational schema model?
+
+ Avro Schema ...
+
+ - provides a simple, clean, and concise way to define data structures. It is quite easy to understand and use.
+ - is self-contained by design, without requiring external references. Avro Schema can express complex data structure hierarchies spanning multiple namespace boundaries all in a single file, which neither JSON Schema nor XML Schema nor Protobuf can do.
+ - can be resolved by code generators and other tools "top-down" since it enforces dependencies to be ordered such that no forward-referencing occurs.
+ - emerged out of the Apache Hadoop ecosystem and is widely used for serialization and storage of data and for data exchange between systems.
+ - supports native and logical types that cover the needs of many business and technical use cases.
+ - can describe the popular JSON data encoding very well and in a way that always maps cleanly to a wide range of programming languages and systems. In contrast, it's quite easy to inadvertently define a JSON Schema that is very difficult to map to a programming language structure.
+ - is itself expressed as JSON. That makes it easy to parse and generate, which is not the case for Protobuf or ASN.1, which require bespoke parsers.
+
+ > It needs to be noted here that while Avro Schema is great for defining data structures, and data classes generated from Avro Schema using this tool or other tools can be used with the most popular JSON serialization libraries, the Apache Avro project's own JSON encoding has fairly grave interoperability issues with common usage of JSON. Avrotize defines an alternate JSON encoding in [`avrojson.md`](specs/avrojson.md).
+
+ Avro Schema does not support all the bells and whistles of XML Schema or JSON Schema, but that is a feature, not a bug, as it ensures the portability of the schemas across different systems and infrastructures. Specifically, Avro Schema does not support many of the data validation features found in JSON Schema or XML Schema. There are no `pattern`, `format`, `minimum`, `maximum`, or `required` keywords in Avro Schema, and Avro does not support conditional validation.
+
+ In a system where data originates as XML or JSON described by a validating XML Schema or JSON Schema, the assumption we make here is that data will be validated against its native schema language first; the Avro Schema is then used for transformation, transfer, or storage.
+
+ ## Adding CloudEvents columns for database tables
+
+ When converting Avrotize Schema to Kusto Data Table Definition (KQL), SQL Table Definition, or Parquet Schema, the tool can add special columns for [CloudEvents](https://cloudevents.io) attributes. CNCF CloudEvents is a specification for describing event data in a common way.
+
+ The rationale for adding such columns to database tables is that messages and events commonly separate event metadata from the payload data, while that information is merged when events are projected into a database. The metadata often carries important context information about the event that is not contained in the payload itself. Therefore, the tool can add those columns to the database tables for easy alignment of the message context with the payload when building event stores.
+
+ ### Convert Proto schema to Avrotize Schema
+
+ ```bash
+ avrotize p2a <path_to_proto_file> [--out <path_to_avro_schema_file>]
+ ```
+
+ Parameters:
+
+ - `<path_to_proto_file>`: The path to the Protobuf schema file to be converted. If omitted, the file is read from stdin.
+ - `--out`: The path to the Avrotize Schema file to write the conversion result to. If omitted, the output is directed to stdout.
+
+ Conversion notes:
+
+ - Proto 2 and Proto 3 syntax are supported.
+ - Proto package names are mapped to Avro namespaces. The tool does resolve imports and consolidates all imported types into a single Avrotize Schema file.
+ - The tool embeds all 'well-known' Protobuf 3.0 types in Avro format and injects them as needed when the respective types are imported. Only the `Timestamp` type is mapped to the Avro logical type 'timestamp-millis'. The rest of the well-known Protobuf types are kept as Avro record types with the same field names and types.
+ - Protobuf allows any scalar type as the key of a `map`; Avro does not. When converting from Proto to Avro, the type information for the map keys is ignored.
+ - The field numbers in message types are not mapped to the positions of the fields in Avro records. The fields in Avro are ordered as they appear in the Proto schema. Consequently, the Avrotize Schema also ignores the `extensions` and `reserved` keywords in the Proto schema.
+ - The `optional` keyword results in an Avro field being nullable (union with the `null` type), while the `required` keyword results in a non-nullable field. The `repeated` keyword results in an Avro field being an array of the field type.
+ - The `oneof` keyword in Proto is mapped to an Avro union type.
+ - All `options` in the Proto schema are ignored.
+
+ ### Convert Avrotize Schema to Proto schema
+
+ ```bash
+ avrotize a2p <path_to_avro_schema_file> [--out <path_to_proto_directory>] [--naming <naming_mode>] [--allow-optional]
+ ```
+
+ Parameters:
+
+ - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
+ - `--out`: The path to the Protobuf schema directory to write the conversion result to. If omitted, the output is directed to stdout.
+ - `--naming`: (optional) Type naming convention. Choices are `snake`, `camel`, `pascal`.
+ - `--allow-optional`: (optional) Enable support for 'optional' fields.
+
+ Conversion notes:
+
+ - Avro namespaces are resolved into distinct proto package definitions. The tool will create a new `.proto` file with the package definition and an `import` statement for each namespace found in the Avrotize Schema.
+ - Avro type unions `[]` are converted to `oneof` expressions in Proto. Avro allows for maps and arrays in the type union, whereas Proto only supports scalar types and message type references. The tool will therefore emit message types containing a single array or map field for any such case and add it to the containing type, and will also recursively resolve further unions in the array and map values.
+ - The sequence of fields in a message follows the sequence of fields in the Avro record. When type unions need to be resolved into `oneof` expressions, the alternative fields need to be assigned field numbers, which will shift the field numbers for any subsequent fields.
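+
+ For instance, a schema could be converted like this (the file and directory names are hypothetical):
+
+ ```bash
+ # One .proto file per Avro namespace is written to ./proto, with
+ # PascalCase type names and 'optional' field support enabled.
+ avrotize a2p orders.avsc --out ./proto --naming pascal --allow-optional
+ ```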
+
+ ### Convert JSON schema to Avrotize Schema
+
+ ```bash
+ avrotize j2a <path_to_json_schema_file> [--out <path_to_avro_schema_file>] [--namespace <avro_schema_namespace>] [--split-top-level-records]
+ ```
+
+ Parameters:
+
+ - `<path_to_json_schema_file>`: The path to the JSON schema file to be converted. If omitted, the file is read from stdin.
+ - `--out`: The path to the Avrotize Schema file to write the conversion result to. If omitted, the output is directed to stdout.
+ - `--namespace`: (optional) The namespace to use in the Avrotize Schema if the JSON schema does not define a namespace.
+ - `--split-top-level-records`: (optional) Split top-level records into separate files.
+
+ Conversion notes:
+
+ - [JSON Schema Handling in Avrotize](specs/jsonschema.md)
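+
+ A typical invocation, providing a fallback namespace for schemas that do not define one (the file names are hypothetical):
+
+ ```bash
+ # Convert a JSON Schema document; the namespace only applies
+ # where the JSON schema itself does not define one.
+ avrotize j2a customer.schema.json --out customer.avsc --namespace com.example.crm
+ ```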
+
+ ### Convert Avrotize Schema to JSON schema
+
+ ```bash
+ avrotize a2j <path_to_avro_schema_file> [--out <path_to_json_schema_file>] [--naming <naming_mode>]
+ ```
+
+ Parameters:
+
+ - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
+ - `--out`: The path to the JSON schema file to write the conversion result to. If omitted, the output is directed to stdout.
+ - `--naming`: (optional) Type naming convention. Choices are `snake`, `camel`, `pascal`, `default`.
+
+ Conversion notes:
+
+ - [JSON Schema Handling in Avrotize](specs/jsonschema.md)
+
+ ### Convert XML Schema (XSD) to Avrotize Schema
+
+ ```bash
+ avrotize x2a <path_to_xsd_file> [--out <path_to_avro_schema_file>] [--namespace <avro_schema_namespace>]
+ ```
+
+ Parameters:
+
+ - `<path_to_xsd_file>`: The path to the XML schema file to be converted. If omitted, the file is read from stdin.
+ - `--out`: The path to the Avrotize Schema file to write the conversion result to. If omitted, the output is directed to stdout.
+ - `--namespace`: (optional) The namespace to use in the Avrotize Schema if the XML schema does not define a namespace.
+
+ Conversion notes:
+
+ - All XML Schema constructs are mapped to Avro record types with fields, whereby both elements and attributes become fields in the record. XML is therefore flattened into fields, and this aspect of the structure is not preserved.
+ - Avro has no equivalent of `xsd:any`, since Avro does not support arbitrary typing and must always use a named type. The tool will map `xsd:any` to a field `any` typed as a union that allows scalar values or two levels of array and/or map nesting.
+ - `simpleType` declarations that define enums are mapped to `enum` types in Avro. All other facets are ignored, and simple types are mapped to the corresponding Avro type.
+ - `complexType` declarations with simple content, where a base type is augmented with attributes, are mapped to a record type in Avro. Any other facets defined on the complex type are ignored.
+ - If the schema defines a single root element, the tool will emit a single Avro record type. If the schema defines multiple root elements, the tool will emit a union of record types, each corresponding to a root element.
+ - All fields in the resulting Avrotize Schema are annotated with an `xmlkind` extension attribute that indicates whether the field was an `element` or an `attribute` in the XML schema.
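+
+ A minimal example invocation (the file names are hypothetical):
+
+ ```bash
+ # Convert an XSD; the namespace is only used if the XSD does not define one.
+ avrotize x2a invoice.xsd --out invoice.avsc --namespace com.example.billing
+ ```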
+
+ ### Convert Avrotize Schema to XML schema
+
+ ```bash
+ avrotize a2x <path_to_avro_schema_file> [--out <path_to_xsd_schema_file>] [--namespace <target_namespace>]
+ ```
+
+ Parameters:
+
+ - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
+ - `--out`: The path to the XML schema file to write the conversion result to. If omitted, the output is directed to stdout.
+ - `--namespace`: (optional) Target namespace for the XSD schema.
+
+ Conversion notes:
+
+ - Avro record types are mapped to XML Schema complex types with elements.
+ - Avro enum types are mapped to XML Schema simple types with restrictions.
+ - Avro logical types are mapped to XML Schema simple types with restrictions where required.
+ - Avro unions are mapped to standalone XSD simple type definitions with a union restriction if all union types are primitives.
+ - Avro unions with complex types are resolved into distinct types for each option, which are then joined with a choice.
+
+ ### Convert ASN.1 schema to Avrotize Schema
+
+ ```bash
+ avrotize asn2a <path_to_asn1_schema_file>[,<path_to_asn1_schema_file>,...] [--out <path_to_avro_schema_file>]
+ ```
+
+ Parameters:
+
+ - `<path_to_asn1_schema_file>`: The path to the ASN.1 schema file to be converted. The tool supports multiple files in a comma-separated list. If omitted, the file is read from stdin.
+ - `--out`: The path to the Avrotize Schema file to write the conversion result to. If omitted, the output is directed to stdout.
+
+ Conversion notes:
+
+ - All ASN.1 types are mapped to Avro record types, enums, and unions. Since Avro does not support the same level of type nesting as ASN.1, the tool maps each type to the best fit.
+ - The tool will map the following ASN.1 types to Avro types:
+   - `SEQUENCE` and `SET` are mapped to Avro record types.
+   - `CHOICE` is mapped to an Avro record type with all fields being optional. While the `CHOICE` type technically corresponds to an Avro union, the ASN.1 type has different named fields for each option, which is not a feature of Avro unions.
+   - `OBJECT IDENTIFIER` is mapped to an Avro string type.
+   - `ENUMERATED` is mapped to an Avro enum type.
+   - `SEQUENCE OF` and `SET OF` are mapped to Avro array type.
+   - `BIT STRING` is mapped to Avro bytes type.
+   - `OCTET STRING` is mapped to Avro bytes type.
+   - `INTEGER` is mapped to Avro long type.
+   - `REAL` is mapped to Avro double type.
+   - `BOOLEAN` is mapped to Avro boolean type.
+   - `NULL` is mapped to Avro null type.
+   - `UTF8String`, `PrintableString`, `IA5String`, `BMPString`, `NumericString`, `TeletexString`, `VideotexString`, `GraphicString`, `VisibleString`, `GeneralString`, `UniversalString`, `CharacterString`, `T61String` are all mapped to Avro string type.
+   - All other ASN.1 types are mapped to Avro string type.
+ - The ability to parse ASN.1 schema files is limited and the tool may not be able to parse all ASN.1 files. The tool is based on the Python asn1tools package and is limited to that package's capabilities.
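+
+ Since ASN.1 modules frequently import from one another, related files can be passed as one comma-separated list (the module file names are hypothetical):
+
+ ```bash
+ # Consolidate two related ASN.1 modules into a single Avrotize Schema.
+ avrotize asn2a pkix-core.asn,pkix-ext.asn --out pkix.avsc
+ ```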
+
+ ### Convert Kusto table definition to Avrotize Schema
+
+ ```bash
+ avrotize k2a --kusto-uri <kusto_cluster_uri> --kusto-database <kusto_database> [--out <path_to_avro_schema_file>] [--emit-cloudevents-xregistry]
+ ```
+
+ Parameters:
+
+ - `--kusto-uri`: The URI of the Kusto cluster to connect to.
+ - `--kusto-database`: The name of the Kusto database to read the table definitions from.
+ - `--out`: The path to the Avrotize Schema file to write the conversion result to. If omitted, the output is directed to stdout.
+ - `--emit-cloudevents-xregistry`: (optional) See discussion below.
+
+ Conversion notes:
+
+ - The tool directly connects to the Kusto cluster and reads the table definitions from the specified database. The tool will convert all tables in the database to Avro record types, returned in a top-level type union.
+ - Connecting to the Kusto cluster leans on the same authentication mechanisms as the Azure CLI; if the Azure CLI is installed and authenticated, the tool will use its authentication context.
+ - The tool will map the Kusto column types to Avro types as follows:
+   - `bool` is mapped to Avro boolean type.
+   - `datetime` is mapped to Avro long type with logical type `timestamp-millis`.
+   - `decimal` is mapped to a logical Avro type with the `logicalType` set to `decimal` and the `precision` and `scale` set to the values of the `decimal` type in Kusto.
+   - `guid` is mapped to Avro string type.
+   - `int` is mapped to Avro int type.
+   - `long` is mapped to Avro long type.
+   - `real` is mapped to Avro double type.
+   - `string` is mapped to Avro string type.
+   - `timespan` is mapped to a logical Avro type with the `logicalType` set to `duration`.
+ - For `dynamic` columns, the tool will sample the data in the table to determine the structure of the dynamic column. The tool will map the dynamic column to an Avro record type with fields that correspond to the fields found in the dynamic column. If the dynamic column contains nested dynamic columns, the tool will recursively map those to Avro record types. If records with conflicting structures are found in the dynamic column, the tool will emit a union of record types for the dynamic column.
+ - If the `--emit-cloudevents-xregistry` option is set, the tool will emit an [xRegistry](http://xregistry.io) registry manifest file with a CloudEvent message definition for each table in the Kusto database and a separate Avro Schema for each table in the embedded schema registry. If one or more tables are found to contain CloudEvent data (as indicated by the presence of the CloudEvents attribute columns), the tool will inspect the content of the `type` (or `__type` or `___type`) columns to determine which CloudEvent types have been stored in the table and will emit a CloudEvent definition and schema for each unique type.
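+
+ A sketch of a typical invocation, assuming an authenticated Azure CLI context (the cluster and database names are hypothetical):
+
+ ```bash
+ # Read all table definitions from the 'telemetry' database and
+ # write them as a top-level union of Avro record types.
+ avrotize k2a --kusto-uri https://mycluster.westeurope.kusto.windows.net \
+              --kusto-database telemetry --out telemetry.avsc
+ ```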
+
+ ### Convert Avrotize Schema to Kusto table declaration
+
+ ```bash
+ avrotize a2k <path_to_avro_schema_file> [--out <path_to_kusto_kql_file>] [--record-type <record_type>] [--emit-cloudevents-columns] [--emit-cloudevents-dispatch]
+ ```
+
+ Parameters:
+
+ - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
+ - `--out`: The path to the Kusto KQL file to write the conversion result to. If omitted, the output is directed to stdout.
+ - `--record-type`: (optional) The name of the Avro record type to convert to a Kusto table.
+ - `--emit-cloudevents-columns`: (optional) If set, the tool will add [CloudEvents](https://cloudevents.io) attribute columns to the table: `___id`, `___source`, `___subject`, `___type`, and `___time`.
+ - `--emit-cloudevents-dispatch`: (optional) If set, the tool will add a table named `_cloudevents_dispatch` to the script or database, which serves as an ingestion and dispatch table for CloudEvents. The table has columns for the core CloudEvents attributes and a `data` column that holds the CloudEvents data. For each table in the Avrotize Schema, the tool will create an update policy that maps events whose `type` attribute matches the Avro type name to the respective table.
+
+ Conversion notes:
+
+ - Only the Avro `record` type can be mapped to a Kusto table. If the Avrotize Schema contains other types (like `enum` or `array`), the tool will ignore them.
+ - Only the first `record` type in the Avrotize Schema is converted to a Kusto table. If the Avrotize Schema contains other `record` types, they will be ignored. The `--record-type` option can be used to specify which `record` type to convert.
+ - The fields of the record are mapped to columns in the Kusto table. Fields that are records or arrays or maps are mapped to columns of type `dynamic` in the Kusto table.
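+
+ For example, to generate a KQL script that carries both the CloudEvents attribute columns and the `_cloudevents_dispatch` ingestion table (the file names are hypothetical):
+
+ ```bash
+ avrotize a2k orders.avsc --out orders.kql --emit-cloudevents-columns --emit-cloudevents-dispatch
+ ```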
+
+ ### Convert Avrotize Schema to SQL table definition
+
+ ```bash
+ avrotize a2sql [input] --out <path_to_sql_script> --dialect <dialect>
+ ```
+
+ Parameters:
+
+ - `input`: The path to the Avrotize schema file to be converted (or read from stdin if omitted).
+ - `--out`: The path to the SQL script file to write the conversion result to.
+ - `--dialect`: The SQL dialect (database type) to target. Supported dialects include:
+   - `mysql`, `mariadb`, `postgres`, `sqlserver`, `oracle`, `sqlite`, `bigquery`, `snowflake`, `redshift`, `db2`
+ - `--emit-cloudevents-columns`: (optional) Add CloudEvents columns to the SQL table.
+
+ For detailed conversion rules and type mappings for each SQL dialect, refer to the [SQL Conversion Notes](sqlcodegen.md) document.
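+
+ For example, to emit a PostgreSQL script with the CloudEvents columns added (the file names are hypothetical):
+
+ ```bash
+ avrotize a2sql orders.avsc --out orders.sql --dialect postgres --emit-cloudevents-columns
+ ```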
+
+ ### Convert Avrotize Schema to MongoDB schema
+
+ ```bash
+ avrotize a2mongo <path_to_avro_schema_file> [--out <path_to_mongodb_schema>] [--emit-cloudevents-columns]
+ ```
+
+ Parameters:
+
+ - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
+ - `--out`: The path to the MongoDB schema file to write the conversion result to.
+ - `--emit-cloudevents-columns`: (optional) If set, the tool will add [CloudEvents](https://cloudevents.io) attribute columns to the MongoDB schema.
+
+ Conversion notes:
+
+ - The fields of the Avro record type are mapped to fields in the MongoDB schema. Fields that are records or arrays or maps are mapped to fields of type `object`.
+ - The emitted MongoDB schema file is a JSON file that can be used with MongoDB's `mongoimport` tool to create a collection with the specified schema.
+
+ ### Convert Avrotize schema to Cassandra schema
+
+ ```bash
+ avrotize a2cassandra [input] --out <output_directory> [--emit-cloudevents-columns]
+ ```
+
+ - `input`: Path to the Avrotize schema file (or read from stdin if omitted).
+ - `--out`: Output path for the Cassandra schema (required).
+ - `--emit-cloudevents-columns`: Add CloudEvents columns to the Cassandra schema (optional, default: false).
+
+ Refer to the detailed conversion notes for Cassandra in the [NoSQL Conversion Notes](nosqlcodegen.md).
+
+ ### Convert Avrotize schema to DynamoDB schema
+
+ ```bash
+ avrotize a2dynamodb [input] --out <output_directory> [--emit-cloudevents-columns]
+ ```
+
+ - `input`: Path to the Avrotize schema file (or read from stdin if omitted).
+ - `--out`: Output path for the DynamoDB schema (required).
+ - `--emit-cloudevents-columns`: Add CloudEvents columns to the DynamoDB schema (optional, default: false).
+
+ Refer to the detailed conversion notes for DynamoDB in the [NoSQL Conversion Notes](nosqlcodegen.md).
+
+ ### Convert Avrotize schema to Elasticsearch schema
+
+ ```bash
+ avrotize a2es [input] --out <output_directory> [--emit-cloudevents-columns]
+ ```
+
+ - `input`: Path to the Avrotize schema file (or read from stdin if omitted).
+ - `--out`: Output path for the Elasticsearch schema (required).
+ - `--emit-cloudevents-columns`: Add CloudEvents columns to the Elasticsearch schema (optional, default: false).
+
+ Refer to the detailed conversion notes for Elasticsearch in the [NoSQL Conversion Notes](nosqlcodegen.md).
+
+ ### Convert Avrotize schema to CouchDB schema
+
+ ```bash
+ avrotize a2couchdb [input] --out <output_directory> [--emit-cloudevents-columns]
+ ```
+
+ - `input`: Path to the Avrotize schema file (or read from stdin if omitted).
+ - `--out`: Output path for the CouchDB schema (required).
+ - `--emit-cloudevents-columns`: Add CloudEvents columns to the CouchDB schema (optional, default: false).
+
+ Refer to the detailed conversion notes for CouchDB in the [NoSQL Conversion Notes](nosqlcodegen.md).
+
+ ### Convert Avrotize schema to Neo4j schema
+
+ ```bash
+ avrotize a2neo4j [input] --out <output_directory> [--emit-cloudevents-columns]
+ ```
+
+ - `input`: Path to the Avrotize schema file (or read from stdin if omitted).
+ - `--out`: Output path for the Neo4j schema (required).
+ - `--emit-cloudevents-columns`: Add CloudEvents columns to the Neo4j schema (optional, default: false).
+
+ Refer to the detailed conversion notes for Neo4j in the [NoSQL Conversion Notes](nosqlcodegen.md).
+
+ ### Convert Avrotize schema to Firebase schema
+
+ ```bash
+ avrotize a2firebase [input] --out <output_directory> [--emit-cloudevents-columns]
+ ```
+
+ - `input`: Path to the Avrotize schema file (or read from stdin if omitted).
+ - `--out`: Output path for the Firebase schema (required).
+ - `--emit-cloudevents-columns`: Add CloudEvents columns to the Firebase schema (optional, default: false).
+
+ Refer to the detailed conversion notes for Firebase in the [NoSQL Conversion Notes](nosqlcodegen.md).
+
+ ### Convert Avrotize schema to CosmosDB schema
+
+ ```bash
+ avrotize a2cosmos [input] --out <output_directory> [--emit-cloudevents-columns]
+ ```
+
+ - `input`: Path to the Avrotize schema file (or read from stdin if omitted).
+ - `--out`: Output path for the CosmosDB schema (required).
+ - `--emit-cloudevents-columns`: Add CloudEvents columns to the CosmosDB schema (optional, default: false).
+
+ Refer to the detailed conversion notes for CosmosDB in the [NoSQL Conversion Notes](nosqlcodegen.md).
+
+ ### Convert Avrotize schema to HBase schema
+
+ ```bash
+ avrotize a2hbase [input] --out <output_directory> [--emit-cloudevents-columns]
+ ```
+
+ - `input`: Path to the Avrotize schema file (or read from stdin if omitted).
+ - `--out`: Output path for the HBase schema (required).
+ - `--emit-cloudevents-columns`: Add CloudEvents columns to the HBase schema (optional, default: false).
+
+ Refer to the detailed conversion notes for HBase in the [NoSQL Conversion Notes](nosqlcodegen.md).
+
+ ### Convert Avrotize Schema to empty Parquet file
+
+ ```bash
+ avrotize a2pq <path_to_avro_schema_file> [--out <path_to_parquet_schema_file>] [--record-type <record-type-from-avro>] [--emit-cloudevents-columns]
+ ```
+
+ Parameters:
+
+ - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
+ - `--out`: The path to the Parquet schema file to write the conversion result to. If omitted, the output is directed to stdout.
+ - `--record-type`: (optional) The name of the Avro record type to convert to a Parquet schema.
+ - `--emit-cloudevents-columns`: (optional) If set, the tool will add [CloudEvents](https://cloudevents.io) attribute columns to the Parquet schema: `__id`, `__source`, `__subject`, `__type`, and `__time`.
+
+ Conversion notes:
+
+ - The emitted Parquet file contains only the schema, no data rows.
+ - The tool only supports writing Parquet files for Avrotize Schema that describe a single `record` type. If the Avrotize Schema contains a top-level union, the `--record-type` option must be used to specify which record type to emit.
+ - The fields of the record are mapped to columns in the Parquet file. Array and record fields are mapped to Parquet nested types. Avro type unions are mapped to structures, not to Parquet unions since those are not supported by the PyArrow library used here.
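+
+ For example, picking one record type out of a schema with a top-level union (the file and type names are hypothetical):
+
+ ```bash
+ # Write an empty Parquet file whose schema corresponds to the 'Order' record.
+ avrotize a2pq orders.avsc --out orders.parquet --record-type Order
+ ```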
+
+ ### Convert Avrotize Schema to Iceberg schema
+
+ ```bash
+ avrotize a2ib <path_to_avro_schema_file> [--out <path_to_iceberg_schema_file>] [--record-type <record-type-from-avro>] [--emit-cloudevents-columns]
+ ```
+
+ Parameters:
+
+ - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
+ - `--out`: The path to the Iceberg schema file to write the conversion result to. If omitted, the output is directed to stdout.
+ - `--record-type`: (optional) The name of the Avro record type to convert to an Iceberg schema.
+ - `--emit-cloudevents-columns`: (optional) If set, the tool will add [CloudEvents](https://cloudevents.io) attribute columns to the Iceberg schema: `__id`, `__source`, `__subject`, `__type`, and `__time`.
+
+ Conversion notes:
+
+ - The emitted Iceberg file contains only the schema, no data rows.
+ - The tool only supports writing Iceberg files for Avrotize Schema that describe a single `record` type. If the Avrotize Schema contains a top-level union, the `--record-type` option must be used to specify which record type to emit.
+ - The fields of the record are mapped to columns in the Iceberg file. Array and record fields are mapped to Iceberg nested types. Avro type unions are mapped to structures, not to Iceberg unions since those are not supported by the PyArrow library used here.
+
+ ### Convert Parquet schema to Avrotize Schema
+
+ ```bash
+ avrotize pq2a <path_to_parquet_file> [--out <path_to_avro_schema_file>] [--namespace <avro_schema_namespace>]
+ ```
+
+ Parameters:
+
+ - `<path_to_parquet_file>`: The path to the Parquet file to be converted. If omitted, the file is read from stdin.
+ - `--out`: The path to the Avrotize Schema file to write the conversion result to. If omitted, the output is directed to stdout.
+ - `--namespace`: (optional) The namespace to use in the Avrotize Schema if the Parquet file does not define a namespace.
+
+ Conversion notes:
+
+ - The tool reads the schema from the Parquet file and converts it to Avrotize Schema. The data in the Parquet file is not read or converted.
+ - The fields of the Parquet schema are mapped to fields in the Avrotize Schema. Nested fields are mapped to nested records in the Avrotize Schema.
+
+ ### Convert CSV file to Avrotize Schema
+
+ ```bash
+ avrotize csv2a <path_to_csv_file> [--out <path_to_avro_schema_file>] [--namespace <avro_schema_namespace>]
+ ```
+
+ Parameters:
+
+ - `<path_to_csv_file>`: The path to the CSV file to be converted. If omitted, the file is read from stdin.
+ - `--out`: The path to the Avrotize Schema file to write the conversion result to. If omitted, the output is directed to stdout.
+ - `--namespace`: (optional) The namespace to use in the Avrotize Schema if the CSV file does not define a namespace.
+
+ Conversion notes:
+
+ - The tool reads the CSV file and converts it to Avrotize Schema. The first row of the CSV file is assumed to be the header row, containing the field names.
+ - The fields of the CSV file are mapped to fields in the Avrotize Schema. The tool infers the types of the fields from the data in the CSV file.
+
+ ### Convert Kafka Connect Schema to Avrotize Schema
+
+ ```bash
+ avrotize kstruct2a [input] --out <path_to_avro_schema_file>
+ ```
+
+ Parameters:
+
+ - `input`: The path to the Kafka Struct file to be converted (or read from stdin if omitted).
+ - `--out`: The path to the Avrotize Schema file to write the conversion result to.
+ - `--kstruct`: (deprecated) The path to the Kafka Struct file, retained for backward compatibility.
+
+ Conversion notes:
+
+ - The tool converts the Kafka Struct definition to an Avrotize Schema, mapping Kafka data types to their Avro equivalents.
+ - Kafka Structs are typically used to define data structures for Kafka Connect and other Kafka-based applications. This command facilitates interoperability by enabling the conversion of these definitions into Avro, which can be further used with various serialization and schema registry tools.
+
+ ### Convert Avrotize Schema to C# classes
+
+ ```bash
+ avrotize a2cs <path_to_avro_schema_file> [--out <path_to_csharp_dir>] [--namespace <csharp_namespace>] [--avro-annotation] [--system_text_json_annotation] [--newtonsoft-json-annotation] [--pascal-properties]
+ ```
+
+ Parameters:
+
+ - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
+ - `--out`: The path to the directory to write the C# classes to. Required.
+ - `--namespace`: (optional) The namespace to use in the C# classes.
+ - `--avro-annotation`: (optional) Use Avro annotations.
+ - `--system_text_json_annotation`: (optional) Use System.Text.Json annotations.
+ - `--newtonsoft-json-annotation`: (optional) Use Newtonsoft.Json annotations.
+ - `--pascal-properties`: (optional) Use PascalCase properties.
+
+ Conversion notes:
+
+ - The tool generates C# classes from the Avrotize Schema. Each record type in the Avrotize Schema is converted to a C# class.
+ - The fields of the record are mapped to properties in the C# class. Nested records are mapped to nested classes in the C# class.
+ - The tool supports adding annotations to the properties in the C# class. The `--avro-annotation` option adds Avro annotations, the `--system_text_json_annotation` option adds System.Text.Json annotations, and the `--newtonsoft-json-annotation` option adds Newtonsoft.Json annotations.
+ - The `--pascal-properties` option changes the naming convention of the properties to PascalCase.
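+
+ For example, to generate classes with PascalCase properties and System.Text.Json annotations (the names are hypothetical):
+
+ ```bash
+ avrotize a2cs orders.avsc --out ./csharp --namespace Example.Orders --system_text_json_annotation --pascal-properties
+ ```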
+
+ ### Convert Avrotize Schema to Java classes
+
+ ```bash
+ avrotize a2java <path_to_avro_schema_file> [--out <path_to_java_dir>] [--package <java_package>] [--avro-annotation] [--jackson-annotation] [--pascal-properties]
+ ```
+
+ Parameters:
+
+ - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
+ - `--out`: The path to the directory to write the Java classes to. Required.
+ - `--package`: (optional) The package to use in the Java classes.
+ - `--avro-annotation`: (optional) Use Avro annotations.
+ - `--jackson-annotation`: (optional) Use Jackson annotations.
+ - `--pascal-properties`: (optional) Use PascalCase properties.
+
+ Conversion notes:
+
+ - The tool generates Java classes from the Avrotize Schema. Each record type in the Avrotize Schema is converted to a Java class.
+ - The fields of the record are mapped to properties in the Java class. Nested records are mapped to nested classes in the Java class.
+ - The tool supports adding annotations to the properties in the Java class. The `--avro-annotation` option adds Avro annotations, and the `--jackson-annotation` option adds Jackson annotations.
+ - The `--pascal-properties` option changes the naming convention of the properties to PascalCase.
+
+ ### Convert Avrotize Schema to Python classes
+
+ ```bash
+ avrotize a2py <path_to_avro_schema_file> [--out <path_to_python_dir>] [--package <python_package>] [--dataclasses-json-annotation] [--avro-annotation]
+ ```
+
+ Parameters:
+
+ - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
+ - `--out`: The path to the directory to write the Python classes to. Required.
+ - `--package`: (optional) The package to use in the Python classes.
+ - `--dataclasses-json-annotation`: (optional) Use dataclasses-json annotations.
+ - `--avro-annotation`: (optional) Use Avro annotations.
+
+ Conversion notes:
+
+ - The tool generates Python classes from the Avrotize Schema. Each record type in the Avrotize Schema is converted to a Python class.
+ - The fields of the record are mapped to properties in the Python class. Nested records are mapped to nested classes in the Python class.
+ - The tool supports adding annotations to the properties in the Python class. The `--dataclasses-json-annotation` option adds dataclasses-json annotations, and the `--avro-annotation` option adds Avro annotations.
+
+ ### Convert Avrotize Schema to TypeScript classes
+
+ ```bash
+ avrotize a2ts <path_to_avro_schema_file> [--out <path_to_typescript_dir>] [--package <typescript_package>] [--avro-annotation] [--typedjson-annotation]
+ ```
+
+ Parameters:
+
+ - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
+ - `--out`: The path to the directory to write the TypeScript classes to. Required.
+ - `--package`: (optional) The package to use in the TypeScript classes.
+ - `--avro-annotation`: (optional) Use Avro annotations.
+ - `--typedjson-annotation`: (optional) Use TypedJSON annotations.
+
+ Conversion notes:
+
+ - The tool generates TypeScript classes from the Avrotize Schema. Each record type in the Avrotize Schema is converted to a TypeScript class.
+ - The fields of the record are mapped to properties in the TypeScript class. Nested records are mapped to nested classes in the TypeScript class.
+ - The tool supports adding annotations to the properties in the TypeScript class. The `--avro-annotation` option adds Avro annotations, and the `--typedjson-annotation` option adds TypedJSON annotations.
+
+ ### Convert JSON Structure to TypeScript classes
+
+ ```bash
+ avrotize s2ts <path_to_structure_schema_file> [--out <path_to_typescript_dir>] [--package <typescript_package>] [--typedjson-annotation] [--avro-annotation]
+ ```
+
+ Parameters:
+
+ - `<path_to_structure_schema_file>`: The path to the JSON Structure schema file to be converted. If omitted, the file is read from stdin.
+ - `--out`: The path to the directory to write the TypeScript classes to. Required.
+ - `--package`: (optional) The TypeScript package name for the generated project.
+ - `--typedjson-annotation`: (optional) Use TypedJSON annotations for JSON serialization support.
+ - `--avro-annotation`: (optional) Add Avro binary serialization support with embedded Structure schema.
+
+ Conversion notes:
+
+ - The tool generates TypeScript classes from a JSON Structure schema. Each object type in the JSON Structure schema is converted to a TypeScript class.
+ - Supports all JSON Structure Core types, including:
+   - **Primitive types**: string, number, boolean, null
+   - **Extended types**: binary, int8 through int128, uint8 through uint128, float8/float/double, decimal, date, datetime, time, duration, uuid, uri, jsonpointer
+   - **Compound types**: object, array, set, map, tuple, any, choice (unions)
+ - The following JSON Structure features are supported:
+   - **$ref references**: Type references are resolved and generated as separate classes
+   - **$extends inheritance**: Base class properties are included in derived classes
+   - **$offers/$uses add-ins**: Add-in properties are merged into classes that use them
+   - **Abstract types**: Marked with `abstract` keyword in TypeScript
+   - **Required/optional properties**: Required properties are non-nullable, optional properties are nullable
+   - **Choice types**: Converted to TypeScript union types
+ - The generated project includes:
+   - TypeScript source files in `src/` directory
+   - `package.json` with dependencies
+   - `tsconfig.json` for TypeScript compilation
+   - `.gitignore` file
+   - `index.ts` for exporting all generated types
+ - The TypeScript code can be compiled using `npm run build` (requires `npm install` first)
+ - For more details on JSON Structure handling, see [jsonstructure.md](jsonstructure.md)
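+
+ A minimal end-to-end sketch, from schema to compiled output (the names are hypothetical):
+
+ ```bash
+ # Generate the TypeScript project, then install dependencies and
+ # build it, as described in the notes above.
+ avrotize s2ts orders.struct.json --out ./ts --package orders --typedjson-annotation
+ cd ./ts && npm install && npm run build
+ ```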
+
+ ### Convert Avrotize Schema to JavaScript classes
+
+ ```bash
+ avrotize a2js <path_to_avro_schema_file> [--out <path_to_javascript_dir>] [--package <javascript_package>] [--avro-annotation]
+ ```
+
+ Parameters:
+
+ - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
+ - `--out`: The path to the directory to write the JavaScript classes to. Required.
+ - `--package`: (optional) The package to use in the JavaScript classes.
+ - `--avro-annotation`: (optional) Use Avro annotations.
+
+ Conversion notes:
+
+ - The tool generates JavaScript classes from the Avrotize Schema. Each record type in the Avrotize Schema is converted to a JavaScript class.
+ - The fields of the record are mapped to properties in the JavaScript class. Nested records are mapped to nested classes in the JavaScript class.
+ - The tool supports adding annotations to the properties in the JavaScript class. The `--avro-annotation` option adds Avro annotations.
+
+ ### Convert Avrotize Schema to C++ classes
+
+ ```bash
+ avrotize a2cpp <path_to_avro_schema_file> [--out <path_to_cpp_dir>] [--namespace <cpp_namespace>] [--avro-annotation] [--json-annotation]
+ ```
+
+ Parameters:
+
+ - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
+ - `--out`: The path to the directory to write the C++ classes to. Required.
+ - `--namespace`: (optional) The namespace to use in the C++ classes.
+ - `--avro-annotation`: (optional) Use Avro annotations.
+ - `--json-annotation`: (optional) Use JSON annotations.
+
+ Conversion notes:
+
+ - The tool generates C++ classes from the Avrotize Schema. Each record type in the Avrotize Schema is converted to a C++ class.
+ - The fields of the record are mapped to properties in the C++ class. Nested records are mapped to nested classes in the C++ class.
+ - The tool supports adding annotations to the properties in the C++ class. The `--avro-annotation` option adds Avro annotations, and the `--json-annotation` option adds JSON annotations.
+
+ ### Convert Avrotize Schema to Go classes
+
+ ```bash
+ avrotize a2go <path_to_avro_schema_file> [--out <path_to_go_dir>] [--package <go_package>] [--avro-annotation] [--json-annotation] [--package-site <go_package_site>] [--package-username <go_package_username>]
+ ```
+
+ Parameters:
+
+ - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
+ - `--out`: The path to the directory to write the Go classes to. Required.
+ - `--package`: (optional) The package to use in the Go classes.
+ - `--package-site`: (optional) The package site to use in the Go classes.
+ - `--package-username`: (optional) The package username to use in the Go classes.
+ - `--avro-annotation`: (optional) Use Avro annotations.
+ - `--json-annotation`: (optional) Use JSON annotations.
+
+ Conversion notes:
+
+ - The tool generates Go classes from the Avrotize Schema. Each record type in the Avrotize Schema is converted to a Go class.
+ - The fields of the record are mapped to properties in the Go class. Nested records are mapped to nested classes in the Go class.
+ - The tool supports adding annotations to the properties in the Go class. The `--avro-annotation` option adds Avro annotations, and the `--json-annotation` option adds JSON annotations.
+
+ ### Convert Avrotize Schema to Rust classes
+
+ ```bash
+ avrotize a2rust <path_to_avro_schema_file> [--out <path_to_rust_dir>] [--package <rust_package>] [--avro-annotation] [--serde-annotation]
+ ```
+
+ Parameters:
+
+ - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
+ - `--out`: The path to the directory to write the Rust classes to. Required.
+ - `--package`: (optional) The package to use in the Rust classes.
+ - `--avro-annotation`: (optional) Use Avro annotations.
+ - `--serde-annotation`: (optional) Use Serde annotations.
+
+ Conversion notes:
+
+ - The tool generates Rust classes from the Avrotize Schema. Each record type in the Avrotize Schema is converted to a Rust class.
+ - The fields of the record are mapped to properties in the Rust class. Nested records are mapped to nested classes in the Rust class.
+ - The tool supports adding annotations to the properties in the Rust class. The `--avro-annotation` option adds Avro annotations, and the `--serde-annotation` option adds Serde annotations.
+
+ ### Convert Avrotize Schema to Datapackage schema
+
+ ```bash
+ avrotize a2dp <path_to_avro_schema_file> [--out <path_to_datapackage_file>] [--record-type <record-type-from-avro>]
+ ```
+
+ Parameters:
+
+ - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
+ - `--out`: The path to the Datapackage schema file to write the conversion result to. If omitted, the output is directed to stdout.
+ - `--record-type`: (optional) The name of the Avro record type to convert to a Datapackage schema.
+
+ Conversion notes:
+
+ - The tool generates a Datapackage schema from the Avrotize Schema. Each record type in the Avrotize Schema is converted to a Datapackage resource.
+ - The fields of the record are mapped to fields in the Datapackage resource. Nested records are mapped to nested resources in the Datapackage.
+
+ ### Convert Avrotize Schema to Markdown documentation
+
+ ```bash
+ avrotize a2md <path_to_avro_schema_file> [--out <path_to_markdown_file>]
+ ```
+
+ Parameters:
+
+ - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
+ - `--out`: The path to the Markdown file to write the conversion result to. If omitted, the output is directed to stdout.
+
+ Conversion notes:
+
+ - The tool generates Markdown documentation from the Avrotize Schema. Each record type in the Avrotize Schema is converted to a Markdown section.
+ - The fields of the record are documented in a table in the Markdown section. Nested records are documented in nested sections in the Markdown file.
+
+ ### Create the Parsing Canonical Form (PCF) of an Avrotize schema
+
+ ```bash
+ avrotize pcf <path_to_avro_schema_file>
+ ```
+
+ Parameters:
+
+ - `<path_to_avro_schema_file>`: The path to the Avrotize Schema file to be converted. If omitted, the file is read from stdin.
+
+ Conversion notes:
+
+ - The tool generates the Parsing Canonical Form (PCF) of the Avrotize Schema. The PCF is a normalized form of the schema that is used for schema comparison and compatibility checking.
+ - The PCF is a JSON object that is written to stdout.
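+
+ Since the PCF is written to stdout, it can simply be redirected into a file (the file names are hypothetical):
+
+ ```bash
+ # Produce the canonical form, e.g. for fingerprinting or schema comparison.
+ avrotize pcf orders.avsc > orders-pcf.json
+ ```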
+
+ This document provides an overview of the usage and functionality of Avrotize. For more detailed information, please refer to the [Avrotize Schema documentation](specs/avrotize-schema.md) and the individual command help messages.