cdxf-0.1.0.tar.gz

cdxf-0.1.0/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Muntaser Syed
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
cdxf-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,157 @@
+ Metadata-Version: 2.4
+ Name: cdxf
+ Version: 0.1.0
+ Summary: Compact Data Exchange Format -- universal binary interchange for JSON, YAML, XML, and TOML
+ Author-email: Muntaser Syed <jemsbhai@gmail.com>
+ License: MIT
+ Project-URL: Homepage, https://github.com/jemsbhai/cdxf
+ Project-URL: Repository, https://github.com/jemsbhai/cdxf
+ Project-URL: Issues, https://github.com/jemsbhai/cdxf/issues
+ Keywords: serialization,binary,interchange,json,yaml,xml,toml,cbor
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Intended Audience :: Developers
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
+ Classifier: Topic :: File Formats
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: cbor2>=5.4
+ Requires-Dist: ruamel.yaml>=0.17
+ Requires-Dist: tomlkit>=0.11
+ Dynamic: license-file
+
+ # CDXF — Compact Data Exchange Format
+
+ A universal binary interchange format whose information model is a provable superset of JSON, YAML, XML, and TOML, enabling lossless round-trip encoding of documents from any of these format families.
+
+ ## Status
+
+ **Alpha.** The specification and reference implementation are functional with 408 tests passing. All four format bridges (JSON, YAML, XML, TOML) are complete with full round-trip fidelity verified on a 43-file benchmark corpus.
+
+ ## Motivation
+
+ Every existing binary serialization format is anchored to a single text format's information model. CBOR and MessagePack encode the JSON data model. EXI and Fast Infoset encode the XML Information Set. Nothing encodes YAML's graph-structured representation or TOML's typed configuration model. And critically, no binary format preserves all of them through a single unified information model.
+
+ CDXF fills that gap.
+
+ ## What CDXF preserves that others lose
+
+ Empirically verified via automated tests (EXP-001 Feature Preservation Matrix):
+
+ | Construct | CDXF | CBOR | MsgPack | BSON | Ion |
+ |---|:---:|:---:|:---:|:---:|:---:|
+ | Map key order | ✓ | ✓ | ✓ | ✓ | ✗ |
+ | Non-string map keys | ✓ | ✓ | ✗ | ✗ | ✗ |
+ | Comments | ✓ | ✗ | ✗ | ✗ | ✗ |
+ | Anchors/Aliases (graph) | ✓ | ✗ | ✗ | ✗ | ✗ |
+ | Merge keys | ✓ | ✗ | ✗ | ✗ | ✗ |
+ | Multi-document streams | ✓ | ✗ | ✗ | ✗ | ✗ |
+ | XML elements/attributes | ✓ | ✗ | ✗ | ✗ | ✗ |
+ | XML namespaces | ✓ | ✗ | ✗ | ✗ | ✗ |
+ | XML mixed content | ✓ | ✗ | ✗ | ✗ | ✗ |
+ | Processing instructions | ✓ | ✗ | ✗ | ✗ | ✗ |
+ | Typed timestamps | ✓ | ✓ | ✗ | ✓ | ✗ |
+ | Typed date/time (local) | ✓ | ✗ | ✗ | ✗ | ✗ |
+ | **Total** | **12/12** | **3/12** | **1/12** | **2/12** | **0/12** |
+
+ ## Installation
+
+ ```bash
+ pip install cdxf
+ ```
+
+ ## CLI Usage
+
+ ```bash
+ # Encode a JSON file to CDXF binary
+ cdxf encode config.json  # -> config.cdxf
+
+ # Decode back to text (auto-detects source format)
+ cdxf decode config.cdxf  # -> config.json
+
+ # Convert between formats via CDXF
+ cdxf convert data.yaml --to json  # -> data.json
+ cdxf convert config.json --to toml  # -> config.toml
+
+ # Inspect a file
+ cdxf info config.json
+ # File: config.json
+ # Type: JSON text
+ # Text size: 2,946 bytes
+ # CDXF size: 2,001 bytes
+ # Ratio: 0.679
+ # Node counts: {'Map': 9, 'Scalar': 114, 'Sequence': 2}
+ ```
+
+ ## Python API
+
+ ```python
+ from cdxf.bridges import from_json, to_json, from_yaml, to_yaml
+ from cdxf.bridges import from_xml, to_xml, from_toml, to_toml
+ from cdxf.codec import encode, decode
+
+ # JSON round-trip through binary
+ stream = from_json('{"name": "Alice", "age": 30}')
+ binary = encode(stream)  # compact CBOR-based binary
+ restored = decode(binary)
+ print(to_json(restored))  # {"name": "Alice", "age": 30}
+
+ # Cross-format conversion: YAML → CDXF → JSON
+ stream = from_yaml("name: Alice\nage: 30\n")
+ print(to_json(stream))  # {"name": "Alice", "age": 30}
+
+ # YAML with anchors and comments — preserved through binary
+ yaml_doc = """
+ # Server defaults
+ defaults: &defaults
+   timeout: 30
+   retries: 3
+ production:
+   <<: *defaults
+   timeout: 60
+ """
+ stream = from_yaml(yaml_doc)
+ binary = encode(stream)  # anchors, comments, merge keys preserved
+ restored = decode(binary)
+ print(to_yaml(restored))  # anchors and comments survive
+
+ # XML with namespaces and mixed content
+ xml_doc = '<p xmlns="http://www.w3.org/1999/xhtml">Hello <b>world</b>!</p>'
+ stream = from_xml(xml_doc)
+ binary = encode(stream)  # namespaces, mixed content preserved
+ restored = decode(binary)
+ print(to_xml(restored))  # faithful reconstruction
+ ```
+
+ ## Size Efficiency
+
+ Median CDXF/text size ratios on a 43-file benchmark corpus:
+
+ | Format | Median | Interpretation |
+ |---|---|---|
+ | JSON | 0.66 | 34% smaller than text |
+ | TOML | 0.75 | 25% smaller |
+ | YAML | 0.82 | 18% smaller |
+ | XML | 1.26 | 26% larger (namespace URIs stored explicitly) |
+
+ For JSON data, CDXF shorthand mode produces byte-identical output to standard CBOR — zero overhead.
+
+ ## Documentation
+
+ - [`docs/information_model.md`](docs/information_model.md) — Formal specification: 9 node kinds, 2 conformance levels, format mapping proofs
+ - [`docs/binary_encoding.md`](docs/binary_encoding.md) — CBOR-based wire format with 16 semantic tags
+ - [`docs/literature_survey_universal_binary_interchange.md`](docs/literature_survey_universal_binary_interchange.md) — Comprehensive gap analysis of existing binary formats
+
+ ## License
+
+ MIT. See [LICENSE](LICENSE).
+
+ ## Author
+
+ Muntaser Syed ([@jemsbhai](https://github.com/jemsbhai)) — Florida Institute of Technology
cdxf-0.1.0/README.md ADDED
@@ -0,0 +1,129 @@
+ # CDXF — Compact Data Exchange Format
+
+ A universal binary interchange format whose information model is a provable superset of JSON, YAML, XML, and TOML, enabling lossless round-trip encoding of documents from any of these format families.
+
+ ## Status
+
+ **Alpha.** The specification and reference implementation are functional with 408 tests passing. All four format bridges (JSON, YAML, XML, TOML) are complete with full round-trip fidelity verified on a 43-file benchmark corpus.
+
+ ## Motivation
+
+ Every existing binary serialization format is anchored to a single text format's information model. CBOR and MessagePack encode the JSON data model. EXI and Fast Infoset encode the XML Information Set. Nothing encodes YAML's graph-structured representation or TOML's typed configuration model. And critically, no binary format preserves all of them through a single unified information model.
+
+ CDXF fills that gap.
+
+ ## What CDXF preserves that others lose
+
+ Empirically verified via automated tests (EXP-001 Feature Preservation Matrix):
+
+ | Construct | CDXF | CBOR | MsgPack | BSON | Ion |
+ |---|:---:|:---:|:---:|:---:|:---:|
+ | Map key order | ✓ | ✓ | ✓ | ✓ | ✗ |
+ | Non-string map keys | ✓ | ✓ | ✗ | ✗ | ✗ |
+ | Comments | ✓ | ✗ | ✗ | ✗ | ✗ |
+ | Anchors/Aliases (graph) | ✓ | ✗ | ✗ | ✗ | ✗ |
+ | Merge keys | ✓ | ✗ | ✗ | ✗ | ✗ |
+ | Multi-document streams | ✓ | ✗ | ✗ | ✗ | ✗ |
+ | XML elements/attributes | ✓ | ✗ | ✗ | ✗ | ✗ |
+ | XML namespaces | ✓ | ✗ | ✗ | ✗ | ✗ |
+ | XML mixed content | ✓ | ✗ | ✗ | ✗ | ✗ |
+ | Processing instructions | ✓ | ✗ | ✗ | ✗ | ✗ |
+ | Typed timestamps | ✓ | ✓ | ✗ | ✓ | ✗ |
+ | Typed date/time (local) | ✓ | ✗ | ✗ | ✗ | ✗ |
+ | **Total** | **12/12** | **3/12** | **1/12** | **2/12** | **0/12** |
+
+ ## Installation
+
+ ```bash
+ pip install cdxf
+ ```
+
+ ## CLI Usage
+
+ ```bash
+ # Encode a JSON file to CDXF binary
+ cdxf encode config.json  # -> config.cdxf
+
+ # Decode back to text (auto-detects source format)
+ cdxf decode config.cdxf  # -> config.json
+
+ # Convert between formats via CDXF
+ cdxf convert data.yaml --to json  # -> data.json
+ cdxf convert config.json --to toml  # -> config.toml
+
+ # Inspect a file
+ cdxf info config.json
+ # File: config.json
+ # Type: JSON text
+ # Text size: 2,946 bytes
+ # CDXF size: 2,001 bytes
+ # Ratio: 0.679
+ # Node counts: {'Map': 9, 'Scalar': 114, 'Sequence': 2}
+ ```
+
+ ## Python API
+
+ ```python
+ from cdxf.bridges import from_json, to_json, from_yaml, to_yaml
+ from cdxf.bridges import from_xml, to_xml, from_toml, to_toml
+ from cdxf.codec import encode, decode
+
+ # JSON round-trip through binary
+ stream = from_json('{"name": "Alice", "age": 30}')
+ binary = encode(stream)  # compact CBOR-based binary
+ restored = decode(binary)
+ print(to_json(restored))  # {"name": "Alice", "age": 30}
+
+ # Cross-format conversion: YAML → CDXF → JSON
+ stream = from_yaml("name: Alice\nage: 30\n")
+ print(to_json(stream))  # {"name": "Alice", "age": 30}
+
+ # YAML with anchors and comments — preserved through binary
+ yaml_doc = """
+ # Server defaults
+ defaults: &defaults
+   timeout: 30
+   retries: 3
+ production:
+   <<: *defaults
+   timeout: 60
+ """
+ stream = from_yaml(yaml_doc)
+ binary = encode(stream)  # anchors, comments, merge keys preserved
+ restored = decode(binary)
+ print(to_yaml(restored))  # anchors and comments survive
+
+ # XML with namespaces and mixed content
+ xml_doc = '<p xmlns="http://www.w3.org/1999/xhtml">Hello <b>world</b>!</p>'
+ stream = from_xml(xml_doc)
+ binary = encode(stream)  # namespaces, mixed content preserved
+ restored = decode(binary)
+ print(to_xml(restored))  # faithful reconstruction
+ ```
+
+ ## Size Efficiency
+
+ Median CDXF/text size ratios on a 43-file benchmark corpus:
+
+ | Format | Median | Interpretation |
+ |---|---|---|
+ | JSON | 0.66 | 34% smaller than text |
+ | TOML | 0.75 | 25% smaller |
+ | YAML | 0.82 | 18% smaller |
+ | XML | 1.26 | 26% larger (namespace URIs stored explicitly) |
+
+ For JSON data, CDXF shorthand mode produces byte-identical output to standard CBOR — zero overhead.
+
+ ## Documentation
+
+ - [`docs/information_model.md`](docs/information_model.md) — Formal specification: 9 node kinds, 2 conformance levels, format mapping proofs
+ - [`docs/binary_encoding.md`](docs/binary_encoding.md) — CBOR-based wire format with 16 semantic tags
+ - [`docs/literature_survey_universal_binary_interchange.md`](docs/literature_survey_universal_binary_interchange.md) — Comprehensive gap analysis of existing binary formats
+
+ ## License
+
+ MIT. See [LICENSE](LICENSE).
+
+ ## Author
+
+ Muntaser Syed ([@jemsbhai](https://github.com/jemsbhai)) — Florida Institute of Technology
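
Reviewer note on the README's size figures: the `Ratio` line in the `cdxf info` example and the percentages in the Size Efficiency table follow from simple arithmetic on byte counts. The sketch below assumes the ratio is CDXF size divided by text size, as the phrase "CDXF/text size ratios" suggests. The helper names `size_ratio` and `describe` are hypothetical and are not part of the package.

```python
def size_ratio(text_bytes: int, cdxf_bytes: int) -> float:
    """Return encoded size divided by text size (below 1.0 means smaller)."""
    if text_bytes <= 0:
        raise ValueError("text_bytes must be positive")
    return cdxf_bytes / text_bytes


def describe(ratio: float) -> str:
    """Turn a ratio into the README's 'N% smaller/larger' wording."""
    percent = round(abs(1.0 - ratio) * 100)
    if ratio < 1.0:
        return f"{percent}% smaller than text"
    if ratio > 1.0:
        return f"{percent}% larger than text"
    return "same size as text"


# Byte counts taken from the `cdxf info config.json` example in the README.
ratio = size_ratio(2946, 2001)
print(f"{ratio:.3f}")   # 0.679
print(describe(0.66))   # 34% smaller than text
print(describe(1.26))   # 26% larger than text
```

A ratio above 1.0, as in the XML row, means the binary form is larger than the source text.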
@@ -0,0 +1,47 @@
+ [build-system]
+ requires = ["setuptools>=68.0", "wheel"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "cdxf"
+ version = "0.1.0"
+ description = "Compact Data Exchange Format -- universal binary interchange for JSON, YAML, XML, and TOML"
+ readme = "README.md"
+ license = {text = "MIT"}
+ requires-python = ">=3.10"
+ authors = [
+     {name = "Muntaser Syed", email = "jemsbhai@gmail.com"},
+ ]
+ keywords = ["serialization", "binary", "interchange", "json", "yaml", "xml", "toml", "cbor"]
+ classifiers = [
+     "Development Status :: 3 - Alpha",
+     "Intended Audience :: Developers",
+     "License :: OSI Approved :: MIT License",
+     "Programming Language :: Python :: 3",
+     "Programming Language :: Python :: 3.10",
+     "Programming Language :: Python :: 3.11",
+     "Programming Language :: Python :: 3.12",
+     "Programming Language :: Python :: 3.13",
+     "Topic :: Software Development :: Libraries :: Python Modules",
+     "Topic :: File Formats",
+ ]
+ dependencies = [
+     "cbor2>=5.4",
+     "ruamel.yaml>=0.17",
+     "tomlkit>=0.11",
+ ]
+
+ [project.scripts]
+ cdxf = "cdxf.cli:entry_point"
+
+ [project.urls]
+ Homepage = "https://github.com/jemsbhai/cdxf"
+ Repository = "https://github.com/jemsbhai/cdxf"
+ Issues = "https://github.com/jemsbhai/cdxf/issues"
+
+ [tool.setuptools.packages.find]
+ where = ["src"]
+
+ [tool.pytest.ini_options]
+ testpaths = ["tests"]
+ addopts = "-v"
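
Reviewer note on `[project.scripts]`: the value `cdxf.cli:entry_point` is an object reference of the form `module:attribute`, and the generated `cdxf` command imports the module and calls the named attribute. The sketch below shows roughly how such a reference is resolved. The helper `resolve_entry_point` is a hypothetical name for illustration, and `json:dumps` stands in for `cdxf.cli:entry_point`, since the `cdxf.cli` module is not shown in this diff.

```python
import importlib
from typing import Callable


def resolve_entry_point(spec: str) -> Callable:
    """Resolve a 'package.module:attribute' reference to the object it names."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    target = importlib.import_module(module_name)
    # Dotted attribute paths such as 'Class.method' are walked one step at a time.
    for name in attr_path.split("."):
        target = getattr(target, name)
    return target


# 'json:dumps' is a stand-in target that exists in the standard library.
func = resolve_entry_point("json:dumps")
print(func({"a": 1}))  # {"a": 1}
```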
cdxf-0.1.0/setup.cfg ADDED
@@ -0,0 +1,4 @@
+ [egg_info]
+ tag_build =
+ tag_date = 0
+
@@ -0,0 +1,51 @@
+ """CDXF -- Compact Data Exchange Format.
+
+ A universal binary interchange format whose information model is a provable
+ superset of JSON, YAML, XML, and TOML.
+ """
+
+ __version__ = "0.1.0"
+
+ from cdxf.model import (
+     Stream,
+     Document,
+     Map,
+     Sequence,
+     Scalar,
+     Element,
+     Attribute,
+     Comment,
+     ProcessingInstruction,
+     Directive,
+     TagAnnotation,
+     Anchor,
+     Alias,
+     ScalarType,
+     SourceFormat,
+ )
+
+ from cdxf.codec import encode, decode, Encoder, Decoder
+
+ __all__ = [
+     # Model
+     "Stream",
+     "Document",
+     "Map",
+     "Sequence",
+     "Scalar",
+     "Element",
+     "Attribute",
+     "Comment",
+     "ProcessingInstruction",
+     "Directive",
+     "TagAnnotation",
+     "Anchor",
+     "Alias",
+     "ScalarType",
+     "SourceFormat",
+     # Codec
+     "encode",
+     "decode",
+     "Encoder",
+     "Decoder",
+ ]
@@ -0,0 +1,13 @@
+ """CDXF format bridges — convert between text formats and CDXF model."""
+
+ from cdxf.bridges.json_bridge import from_json, to_json
+ from cdxf.bridges.yaml_bridge import from_yaml, to_yaml
+ from cdxf.bridges.xml_bridge import from_xml, to_xml
+ from cdxf.bridges.toml_bridge import from_toml, to_toml
+
+ __all__ = [
+     "from_json", "to_json",
+     "from_yaml", "to_yaml",
+     "from_xml", "to_xml",
+     "from_toml", "to_toml",
+ ]
@@ -0,0 +1,122 @@
+ """JSON bridge — convert between JSON text and CDXF model.
+
+ Functions:
+     from_json(text) -> Stream
+     to_json(stream, indent=None) -> str
+ """
+
+ from __future__ import annotations
+
+ import json
+ from typing import Any
+
+ from cdxf.model import (
+     Comment,
+     Document,
+     Map,
+     Scalar,
+     ScalarType,
+     Sequence,
+     SourceFormat,
+     Stream,
+ )
+
+
+ def from_json(text: str) -> Stream:
+     """Parse JSON text into a CDXF Stream.
+
+     Parameters
+     ----------
+     text : str
+         Valid JSON text.
+
+     Returns
+     -------
+     Stream
+         A single-document CDXF Stream with source_format_hint=JSON.
+     """
+     raw = json.loads(text)
+     root = _from_native(raw)
+     doc = Document(root=root, source_format_hint=SourceFormat.JSON)
+     return Stream(documents=[doc])
+
+
+ def to_json(stream: Stream, *, indent: int | None = None) -> str:
+     """Convert a CDXF Stream to JSON text.
+
+     Uses the first document in the stream. Comments and CDXF-specific
+     annotations are silently dropped (JSON cannot represent them).
+
+     Parameters
+     ----------
+     stream : Stream
+         A CDXF Stream.
+     indent : int or None
+         If set, pretty-print with this indentation level.
+
+     Returns
+     -------
+     str
+         Valid JSON text.
+     """
+     if not stream.documents:
+         return "null"
+     root = stream.documents[0].root
+     native = _to_native(root)
+     return json.dumps(native, indent=indent, ensure_ascii=False)
+
+
+ # -------------------------------------------------------------------
+ # Internal: Python native types ↔ CDXF model
+ # -------------------------------------------------------------------
+
+ def _from_native(value: Any) -> Scalar | Map | Sequence:
+     """Convert a Python value (from json.loads) to a CDXF node."""
+     if value is None:
+         return Scalar(ScalarType.NULL, None)
+     if isinstance(value, bool):
+         return Scalar(ScalarType.BOOLEAN, value)
+     if isinstance(value, int):
+         return Scalar(ScalarType.INTEGER, value)
+     if isinstance(value, float):
+         return Scalar(ScalarType.FLOAT, value)
+     if isinstance(value, str):
+         return Scalar(ScalarType.STRING, value)
+     if isinstance(value, dict):
+         entries = []
+         for k, v in value.items():
+             key_node = Scalar(ScalarType.STRING, k)
+             value_node = _from_native(v)
+             entries.append((key_node, value_node))
+         return Map(entries=entries)
+     if isinstance(value, list):
+         items = [_from_native(item) for item in value]
+         return Sequence(items=items)
+     raise ValueError(f"Unsupported JSON value type: {type(value)}")
+
+
+ def _to_native(node) -> Any:
+     """Convert a CDXF node to a Python native value for json.dumps."""
+     if isinstance(node, Scalar):
+         return node.value
+     if isinstance(node, Map):
+         result = {}
+         for entry in node.entries:
+             # Skip comments — JSON can't represent them
+             if isinstance(entry, Comment):
+                 continue
+             key, value = entry
+             # JSON keys must be strings; json.dumps stringifies int/float/bool/None scalar keys
+             key_str = key.value if isinstance(key, Scalar) else str(key)
+             result[key_str] = _to_native(value)
+         return result
+     if isinstance(node, Sequence):
+         return [
+             _to_native(item)
+             for item in node.items
+             if not isinstance(item, Comment)
+         ]
+     # Fallback for types JSON can't represent
+     if hasattr(node, "value"):
+         return node.value
+     return str(node)
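
Reviewer note on the JSON bridge: two details of `_from_native` and `_to_native` are easy to miss. The `bool` check has to come before the `int` check because `bool` is a subclass of `int` in Python, and comments are filtered out on the way back to JSON. The sketch below illustrates both with simplified stand-in classes. `Node`, `Note`, `to_node`, and `to_plain` are hypothetical names for this illustration only; they are not the `cdxf.model` API, whose definitions are not shown in this diff.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """Stand-in for a typed scalar or container (not the real cdxf.model classes)."""
    kind: str
    value: Any = None
    children: list = field(default_factory=list)


@dataclass
class Note:
    """Stand-in for a comment, which JSON cannot represent."""
    text: str


def to_node(value: Any) -> Node:
    """Map a json.loads result to a typed node tree."""
    if value is None:
        return Node("null")
    # bool is a subclass of int, so it has to be tested first.
    if isinstance(value, bool):
        return Node("boolean", value)
    if isinstance(value, int):
        return Node("integer", value)
    if isinstance(value, float):
        return Node("float", value)
    if isinstance(value, str):
        return Node("string", value)
    if isinstance(value, dict):
        pairs = [(to_node(k), to_node(v)) for k, v in value.items()]
        return Node("map", children=pairs)
    if isinstance(value, list):
        return Node("sequence", children=[to_node(v) for v in value])
    raise ValueError(f"Unsupported JSON value type: {type(value)}")


def to_plain(node: Node) -> Any:
    """Convert a node tree back to plain Python values, dropping Note entries."""
    if node.kind == "map":
        pairs = [c for c in node.children if not isinstance(c, Note)]
        return {k.value: to_plain(v) for k, v in pairs}
    if node.kind == "sequence":
        return [to_plain(c) for c in node.children if not isinstance(c, Note)]
    return node.value


tree = to_node(json.loads('{"ok": true, "n": 1, "tags": ["a", "b"]}'))
tree.children.insert(0, Note("dropped on output"))
print(tree.children[1][1].kind)      # boolean
print(json.dumps(to_plain(tree)))    # {"ok": true, "n": 1, "tags": ["a", "b"]}
```

If the `int` test ran first, `true` would be classified as an integer, and the distinction between booleans and integers would be lost in the node tree.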