dmx-compress 0.3.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,401 @@
1
+ Metadata-Version: 2.4
2
+ Name: dmx-compress
3
+ Version: 0.3.0
4
+ Summary: DMX — Delta Multiplexed Model Format. Near-lossless neural network weight compression.
5
+ Author-email: "William J. Riley" <bill.riley@gmail.com>
6
+ License: MIT
7
+ Project-URL: Homepage, https://github.com/willjriley/dmx
8
+ Project-URL: Repository, https://github.com/willjriley/dmx
9
+ Project-URL: Issues, https://github.com/willjriley/dmx/issues
10
+ Keywords: compression,neural-network,model-compression,checkpoint,safetensors,deep-learning
11
+ Classifier: Development Status :: 4 - Beta
12
+ Classifier: Intended Audience :: Science/Research
13
+ Classifier: License :: OSI Approved :: MIT License
14
+ Classifier: Programming Language :: Python :: 3
15
+ Classifier: Programming Language :: Python :: 3.10
16
+ Classifier: Programming Language :: Python :: 3.11
17
+ Classifier: Programming Language :: Python :: 3.12
18
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
19
+ Classifier: Topic :: System :: Archiving :: Compression
20
+ Requires-Python: >=3.10
21
+ Description-Content-Type: text/markdown
22
+ Requires-Dist: torch>=2.0
23
+ Requires-Dist: numpy
24
+ Requires-Dist: zstandard
25
+ Requires-Dist: safetensors
26
+ Provides-Extra: gpu
27
+ Requires-Dist: torch>=2.0; extra == "gpu"
28
+ Provides-Extra: lpc
29
+
30
+ # DMX — Delta Multiplexed Model Format
31
+
32
+ **A new compression format for neural network weights.**
33
+
34
+ ```
35
+ Original: 9.1 GB (SVD-XT, FP32 — the 80% figure includes FP32→FP16 conversion)
36
+ DMX: 1.8 GB
37
+ ```
38
+
39
+ ```
40
+ Original: 7.2 GB (Wan 2.2 14B shard, FP32 — the 79.5% figure includes FP32→FP16)
41
+ DMX: 1.5 GB (142/142 tensors verified)
42
+ ```
43
+
44
+ ```
45
+ Original: 16 GB (Llama 3 8B, FP16 — 55% pure FP16 compression)
46
+ DMX: ~7.2 GB (+0.16% perplexity on wikitext-2)
47
+ ```
48
+
49
+ ### Try it now
50
+
51
+ ```bash
52
+ pip install dmx-compress
53
+ dmx compress your_model.safetensors compressed.dmx
54
+ ```
55
+
56
+ ### Download pre-compressed models now
57
+
58
+ | Model | Original | DMX | Savings | Verified |
59
+ |-------|----------|-----|---------|----------|
60
+ | [Wan 1.3B](https://huggingface.co/Senat1/dmx-wan-1.3b) | 2.7 GB | 1.1 GB | 60% | 825/825 tensors |
61
+ | [Wan 2.2 shard](https://huggingface.co/Senat1/dmx-wan2.2-shard6) | 7.2 GB | 1.5 GB | 79.5% | 142/142 tensors |
62
+ | [SVD-XT](https://huggingface.co/Senat1/dmx-svd-xt) | 9.1 GB | 1.8 GB | 80% | Roundtrip verified |
63
+
64
+ ---
65
+
66
+ ### Why this matters for frontier training
67
+
68
+ **From first principles:** In high-precision training, a full checkpoint is effectively a near-complete copy of the model state — weights (BF16/FP32) plus optimizer states (often another 2-4x the size). Each save is massive. Teams are forced to make checkpoints sparse (every few thousand steps) to keep storage and I/O under control. This is not a bug in the tools — it is a direct consequence of the numbers involved.
69
+
70
+ **The operational reality:** Frontier training runs are now routinely $50M-$200M+ each. While accelerators dominate the budget, checkpoint storage, bandwidth, and recovery time are real, recurring costs. Infra teams track these "quiet" expenses closely.
71
+
72
+ **Where DMX changes the equation:** One high-precision baseline anchor + many tiny, exact integer deltas with zero error accumulation (see [Test 8](#zero-error-accumulation-in-delta-chains-test-8) below). 200 checkpoints of a Llama 70B-class model are projected to drop from ~28 TB raw to ~3 TB while remaining mathematically safe for resumption, branching, and analysis.
73
+
74
+ | Aspect | Current Status Quo | With DMX Delta Chains | Benefit |
75
+ |--------|-------------------|----------------------|---------|
76
+ | Checkpoint frequency | Sparse (forced by cost) | Dense and safe | Better science and debugging |
77
+ | Storage for 200 ckpts (70B) | ~28 TB | ~3 TB (projected) | ~9x reduction |
78
+ | Resumption fidelity | Full copy required | Exact integer chain | Zero accumulation error (measured) |
79
+ | Fine-tune distribution | Full copy per variant | Small delta per variant | 80% savings (measured on TinyLlama 1.1B) |
80
+
81
+ This doesn't reduce the dominant cost (compute), but it meaningfully lowers a real operational friction point that every large lab deals with. It gives researchers and engineers far more usable history than was previously practical.
82
+
83
+ ---
84
+
85
+ ## What is DMX?
86
+
87
+ DMX is a near-lossless post-training compression format for neural network weights, optimized for storage and distribution. It reduces model file sizes by 55-80% while preserving model quality (+0.03-0.16% perplexity change).
88
+
89
+ - **No retraining required** — compress any pretrained safetensors model
90
+ - **Reversible** — decompress back to the original format
91
+ - **Broad compatibility** — tested on LLMs, diffusion models, video models, and encoder-decoder models
92
+
93
+ ### Storage and transfer comparison
94
+
95
+ DMX is focused on reducing model size for storage and network transfer — not runtime inference. Here's how a 140 GB model (Llama 3 70B, FP16) compares across compression approaches:
96
+
97
+ | Method | Compressed Size | Savings | Quality Loss | Purpose |
98
+ |--------|----------------|---------|-------------|---------|
99
+ | **safetensors** | 140 GB | 0% | None | Original format |
100
+ | **gzip** | ~134 GB | ~4% | None | Generic compression (barely helps on floats) |
101
+ | **zstd-19** | ~129 GB | ~8% | None | Better generic compression (still limited) |
102
+ | **DFloat11** | ~98 GB | ~30% | None (lossless) | Lossless NN weight compression |
103
+ | **ZipNN** | ~94 GB | ~33% | None (lossless) | Lossless NN weight compression |
104
+ | **DMX M=7** | ~63 GB | ~55% | +0.03% PPL | **Near-original quality, high compression** |
105
+ | **DMX M=6** | ~56 GB | ~60% | +0.16% PPL | **Aggressive storage compression** |
106
+
107
+ For reference, quantized inference formats like GGUF Q8 (~50%) and Q4 (~75%) achieve similar or greater compression but are designed for a different purpose — running models directly at reduced precision with fused kernels. DMX and GGUF serve different needs and are not interchangeable.
108
+
109
+ If lossless is enough, use DFloat11 or ZipNN. If you need to run inference at lower precision, use GGUF. If you need high compression with near-original quality for storage and distribution, that's where DMX lives.
110
+
111
+ | Without DMX | With DMX |
112
+ |-------------|----------|
113
+ | Llama 3 70B: 140 GB download | ~36 GB download |
114
+ | 4-5 models on 1 TB | 10+ models on 1 TB |
115
+
116
+ ### Training & DevOps use cases
117
+
118
+ Beyond individual model compression, DMX's aligned quantization enables delta encoding between related model files — useful for training infrastructure and model distribution at scale.
119
+
120
+ | Use Case | How DMX Helps |
121
+ |----------|--------------|
122
+ | **Checkpoint storage** | Delta-compress consecutive checkpoints (87.3% measured savings on GPT-2, validated on TinyLlama 1.1B). Both near-lossless (int16) and practically lossless (int32, error below FP32 noise floor) modes available. |
123
+ | **Model distribution** | Distribute fine-tune variants as small deltas from a shared base model |
124
+ | **Crash recovery** | Smaller checkpoints = faster reload from storage after GPU failure |
125
+ | **Model versioning** | Aligned integer space enables meaningful diffs between model versions |
126
+
127
+ ### Why not just use existing versioning tools?
128
+
129
+ Existing ML versioning tools treat model files as opaque blobs:
130
+
131
+ | Tool | Version tracking | Understands weight structure | Delta between versions |
132
+ |------|:-:|:-:|:-:|
133
+ | Git LFS / DVC | ✓ | ✗ | ✗ (full copy each version) |
134
+ | HuggingFace Hub | ✓ | ✗ | ✗ (full copy each version) |
135
+ | W&B / MLflow | ✓ | ✗ | ✗ (full copy each version) |
136
+ | xdelta (binary diff) | ✗ | ✗ | 8.5% savings (noise) |
137
+ | **DMX** | Planned | **✓** | **80-87% savings** |
138
+
139
+ The difference: subtracting two model files in raw float produces noise (IEEE 754 bit layout destroys numerical proximity). DMX's aligned quantization creates a coordinate system where subtraction produces clean, sparse integers — enabling meaningful diffs, efficient deltas, and 80-87% compression between related models.
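+
+ A minimal sketch of that effect, using synthetic tensors and a plain shared-scale int16 quantizer (not DMX's actual codec or container format):
+
+ ```python
+ import numpy as np
+ import zstandard as zstd
+
+ rng = np.random.default_rng(0)
+
+ # One layer's weights before and after a training step that nudges every value slightly.
+ base = (rng.standard_normal(1_000_000) * 0.02).astype(np.float32)
+ ckpt = base + (rng.standard_normal(base.size) * 1e-6).astype(np.float32)
+
+ cctx = zstd.ZstdCompressor(level=19)
+
+ # Raw float delta: every element differs, and the difference bits look like noise.
+ raw_delta = ckpt - base
+ raw_size = len(cctx.compress(raw_delta.tobytes()))
+
+ # Shared-scale quantization: both tensors land on the SAME integer grid, so a
+ # weight that barely moved maps to the same integer and its delta is exactly 0.
+ scale = np.abs(base).max() / 32767.0
+ q_base = np.round(base / scale).astype(np.int16)
+ q_ckpt = np.round(ckpt / scale).astype(np.int16)
+ int_delta = q_ckpt - q_base                      # mostly 0 and +/-1
+ int_size = len(cctx.compress(int_delta.tobytes()))
+
+ print(f"zeros in aligned int16 delta:    {np.mean(int_delta == 0):.1%}")
+ print(f"zstd bytes, raw float delta:     {raw_size}")
+ print(f"zstd bytes, aligned int16 delta: {int_size}")
+ ```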
140
+
141
+ These capabilities are under active development. See [Research Directions](#research-directions) for details and experimental results.
142
+
143
+ ## Key Properties
144
+
145
+ - **Up to 80% compression** on FP32 models (SVD-XT: 9.1 GB -> 1.8 GB, verified roundtrip)
146
+ - **60-74% compression** on FP16 models (Llama 3 8B, Mistral 7B, Wan 1.3B)
147
+ - **55-60% near-lossless compression** on FP32 models (GPT-2, Phi-2 — +0.12-0.22% PPL)
148
+ - **GPU-accelerated decompression**: 13.8x faster than CPU with `--gpu` flag
149
+ - **Tested on**: LLMs (GPT-2, Llama 3, TinyLlama), diffusion (Wan, SVD-XT), encoder-decoder (T5)
150
+ - **No training required**: pure post-training compression, works on any pretrained model
151
+
152
+ ## How It Works
153
+
154
+ ### BFP Mode (for FP16/BF16 models — recommended)
155
+ ```
156
+ Standard FP16: 16 bits per weight (5-bit exponent wasted on unused dynamic range)
157
+ DMX BFP: ~7 bits per weight (shared exponent per group + truncated mantissa + entropy coding)
158
+ ```
159
+
160
+ Trained weights cluster in a narrow magnitude range — 74% of values use only 3 of the 31 possible exponents. DMX shares one exponent per group of 32 values, eliminating wasted dynamic range, then entropy-codes the mantissa stream.
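+
+ A toy numpy version of that shared-exponent scheme (group size 32 and a 7-bit mantissa taken from the description above; the `bfp_encode`/`bfp_decode` helpers are illustrative only, and the real bitstream layout and entropy coder are omitted):
+
+ ```python
+ import numpy as np
+
+ def bfp_encode(x: np.ndarray, group: int = 32, mantissa_bits: int = 7):
+     """Toy block-floating-point encoder: one shared exponent per group of
+     `group` values, each value kept as a small signed mantissa."""
+     x = x.astype(np.float32).reshape(-1, group)
+     exp = np.ceil(np.log2(np.abs(x).max(axis=1, keepdims=True) + 1e-30)).astype(np.int8)
+     qmax = 2 ** (mantissa_bits - 1) - 1
+     mant = np.clip(np.round(x / np.exp2(exp.astype(np.float32)) * qmax), -qmax, qmax).astype(np.int8)
+     return exp, mant, qmax
+
+ def bfp_decode(exp, mant, qmax):
+     return (mant.astype(np.float32) / qmax * np.exp2(exp.astype(np.float32))).reshape(-1)
+
+ w = (np.random.default_rng(0).standard_normal(32 * 4096) * 0.02).astype(np.float32)
+ exp, mant, qmax = bfp_encode(w)
+ w_hat = bfp_decode(exp, mant, qmax)
+ # The int8 mantissa stream (plus one exponent per 32 values) is what would go
+ # on to the entropy coder; here we just check the reconstruction error.
+ print(f"relative L2 error: {np.linalg.norm(w - w_hat) / np.linalg.norm(w):.2e}")
+ ```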
161
+
162
+ ### int16 Mode (for FP32 models — near-lossless)
163
+ ```
164
+ Standard FP32: 32 bits per weight
165
+ DMX int16: ~13 bits per weight (aligned cross-layer quantization + entropy coding)
166
+ ```
167
+
168
+ Integer quantization as a preprocessing step (not a lossy final format) transforms float weights into a representation where entropy coding is effective. Aligned cross-layer quantization enforces a global coordinate system across layers, enabling structured compression.
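+
+ A rough illustration of why the integer domain helps, with a single global scale standing in for the aligned cross-layer quantization (synthetic layers, no container format, and only generic zstd where DMX uses its own entropy coder):
+
+ ```python
+ import numpy as np
+ import zstandard as zstd
+
+ rng = np.random.default_rng(0)
+ # Synthetic stand-ins for a few FP32 weight tensors.
+ layers = {f"layer{i}.weight": (rng.standard_normal(250_000) * 0.02).astype(np.float32)
+           for i in range(4)}
+
+ # One global scale shared by every layer: a crude "aligned" coordinate system.
+ global_scale = max(np.abs(w).max() for w in layers.values()) / 32767.0
+ cctx = zstd.ZstdCompressor(level=19)
+
+ raw = b"".join(w.tobytes() for w in layers.values())
+ quantized = b"".join(np.round(w / global_scale).astype(np.int16).tobytes()
+                      for w in layers.values())
+
+ print("zstd bytes, raw FP32:      ", len(cctx.compress(raw)))
+ print("zstd bytes, aligned int16: ", len(cctx.compress(quantized)))
+ # Reconstruction is q.astype(np.float32) * global_scale (near-lossless, not exact).
+ ```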
169
+
170
+ ## Installation
171
+
172
+ ```bash
173
+ pip install dmx-compress
174
+ ```
175
+
176
+ Or from source:
177
+ ```bash
178
+ git clone https://github.com/willjriley/dmx.git && cd dmx && pip install -e .
179
+ ```
180
+
181
+ **Requirements:** Python 3.10+, PyTorch 2.0+. GPU (CUDA) is optional — automatically used when available for faster compression and decompression.
182
+
183
+ ## Quick Start
184
+
185
+ ```bash
186
+ # Compress any safetensors model (auto-detects FP16 vs FP32)
187
+ dmx compress model.safetensors model.dmx --mode auto
188
+
189
+ # Practically lossless compression (FP32 models — error below FP32 noise floor)
190
+ dmx compress model.safetensors model.dmx --mode int32
191
+
192
+ # Decompress back to safetensors (auto-uses GPU if available)
193
+ dmx decompress model.dmx model.safetensors
194
+
195
+ # Verify roundtrip quality (with JSON report)
196
+ dmx verify model.safetensors model.dmx --report verify.json
197
+
198
+ # View compression info
199
+ dmx info model.dmx
200
+ ```
201
+
202
+ ### Delta compression (checkpoint / model versioning)
203
+
204
+ ```bash
205
+ # Delta-compress a checkpoint against a base (near-lossless, ~87% savings)
206
+ dmx delta-compress base.safetensors checkpoint.safetensors delta.dmxd
207
+
208
+ # Practically lossless delta (error below FP32 noise floor, ~87% savings)
209
+ dmx delta-compress base.safetensors checkpoint.safetensors delta.dmxd --precision int32
210
+
211
+ # Reconstruct checkpoint from base + delta
212
+ dmx delta-reconstruct base.safetensors delta.dmxd restored.safetensors
213
+
214
+ # View delta file info (sparsity, compression, per-component breakdown)
215
+ dmx delta-info delta.dmxd
216
+ ```
217
+
218
+ ### Example: Compress and verify a model from HuggingFace
219
+
220
+ ```bash
221
+ # Download a model
222
+ pip install huggingface_hub
223
+ huggingface-cli download Wan-AI/Wan2.1-I2V-14B-480P --local-dir ./wan_model
224
+
225
+ # Compress it
226
+ dmx compress ./wan_model/model.safetensors wan_compressed.dmx
227
+
228
+ # Decompress and verify
229
+ dmx verify ./wan_model/model.safetensors wan_compressed.dmx --report report.json
230
+ ```
231
+
232
+ ## Decompression Speed
233
+
234
+ | Model | Mode | CPU | GPU (--gpu) | Speedup |
235
+ |-------|------|-----|-------------|---------|
236
+ | Wan 1.3B | BFP | 185s | 13.4s | 13.8x |
237
+ | SVD-XT | BFP | 281s | 22.3s | 12.5x |
238
+ | SVD-XT | int16 | 10.5s | -- | CPU-bound |
239
+
240
+ *Benchmarked on an RTX 4090 Laptop GPU, Python 3.13. The BFP CPU bottleneck is numpy bit manipulation; the GPU path uses PyTorch CUDA ops. A native C/CUDA decoder would likely be 10-50x faster still.*
241
+
242
+ ## Benchmarks
243
+
244
+ ### BFP Mode (FP16 models)
245
+
246
+ | Model | Type | Original | DMX | Savings | Quality |
247
+ |-------|------|----------|-----|---------|---------|
248
+ | Llama 3 8B | LLM | 16 GB | ~7.2 GB | **55%** | **+0.16% PPL** (wikitext-2) |
249
+ | Wan 2.2 shard | Video | 7.2 GB | 1.5 GB | **79.5%** | 142/142 tensors pass |
250
+ | Wan 1.3B | Diffusion | 2.7 GB | 1.1 GB | **60%** | 825/825 tensors pass |
251
+ | SVD-XT | Video | 9.1 GB | 1.8 GB | **80%** | Verified roundtrip |
252
+
253
+ *Note: SVD-XT 80% includes FP32->FP16 conversion. Wan 2.2 79.5% is on FP32 source with BFP.*
254
+
255
+ ### BFP Quality-per-Bit (Llama 3 8B, wikitext-2, 289K tokens)
256
+
257
+ | Config | Bits/Weight | Perplexity | vs FP16 |
258
+ |--------|-----------|------------|---------|
259
+ | FP16 baseline | 16.0 | 5.4958 | -- |
260
+ | BFP(M=8) | 9.25 | 5.4964 | +0.01% |
261
+ | BFP(M=7) | 8.25 | 5.4973 | **+0.03%** |
262
+ | BFP(M=6) | 7.25 | 5.5045 | +0.16% |
263
+ | *GGUF Q8_0 (ref)* | *8.50* | *~5.55-5.58* | *~1.0-1.5% (different purpose — inference format)* |
264
+
265
+ ### int16 Mode (FP32 models)
266
+
267
+ | Model | Type | Original | DMX | Savings | PPL Change |
268
+ |-------|------|----------|-----|---------|------------|
269
+ | SVD-XT | Video | 8.9 GB | 4.0 GB | **55.5%** | Lossless |
270
+ | GPT-2 | LLM | 475 MB | 201 MB | **57.7%** | +0.22% |
271
+ | Phi-2 | LLM | 10.6 GB | 4.2 GB | **60.1%** | +0.12% |
272
+
273
+ ### Why DMX beats generic compression
274
+
275
+ | Method | Bits/value | Notes |
276
+ |--------|-----------|-------|
277
+ | gzip on safetensors | ~15.5 | Raw floats look like noise |
278
+ | zstd level 19 | 14.06 | Dictionary matching, no prediction |
279
+ | **DMX int16 + entropy** | **11.45** | Aligned quantization enables structured entropy coding |
280
+ | **DMX BFP + zstd** | **~4.2** | Shared exponent eliminates wasted dynamic range |
281
+
282
+ ## Pre-Compressed Models (Try It Now)
283
+
284
+ Download DMX-compressed models and decompress them yourself:
285
+
286
+ | Model | Original | DMX | Savings | Verified | Link |
287
+ |-------|----------|-----|---------|----------|------|
288
+ | Wan 1.3B (Diffusion) | 2.7 GB | 1.1 GB | 60% | 825/825 tensors | [Download](https://huggingface.co/Senat1/dmx-wan-1.3b) |
289
+ | Wan 2.2 14B Shard 6 | 7.2 GB | 1.5 GB | 79.5% | 142/142 tensors | [Download](https://huggingface.co/Senat1/dmx-wan2.2-shard6) |
290
+ | SVD-XT (Video) | 9.1 GB | 1.8 GB | 80% | Roundtrip verified | [Download](https://huggingface.co/Senat1/dmx-svd-xt) |
291
+
292
+ Each includes a JSON verification report with SHA-256 hashes and per-tensor cosine similarity scores.
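+
+ To spot-check a decompressed model yourself rather than relying on the bundled report, a small safetensors/numpy sketch (the file names are placeholders for your own paths):
+
+ ```python
+ import numpy as np
+ from safetensors.numpy import load_file
+
+ # Placeholder paths: the original model and the output of `dmx decompress`.
+ orig = load_file("original.safetensors")
+ recon = load_file("decompressed.safetensors")
+
+ for name, a in orig.items():
+     a = a.astype(np.float64).ravel()
+     b = recon[name].astype(np.float64).ravel()
+     cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-30))
+     print(f"{name}: cosine similarity = {cos:.8f}")
+ ```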
293
+
294
+ ## Format Specification
295
+
296
+ See [spec/dmx_spec_v1.md](spec/dmx_spec_v1.md) for the complete format specification.
297
+
298
+ ## Paper
299
+
300
+ [DMX: Delta Multiplexed Compression for Neural Network Model Weights (PDF)](https://github.com/willjriley/dmx/raw/main/paper/DMX_Paper.pdf) — click to download
301
+
302
+ ## Background
303
+
304
+ DMX is based on the principle that floating-point weights should be transformed into multiple statistically distinct, independently modeled entropy domains prior to compression. Trained neural network weights exhibit extreme exponent clustering — 74% of FP16 values use only 3 of 31 possible exponents, wasting 2.4 bits per value. DMX decomposes the floating-point representation into separate exponent and mantissa streams, each with distinct statistical properties that benefit from independent entropy coding. For FP32 models, aligned cross-layer quantization enforces a global coordinate system across layers, enabling additional integer-domain compression. The format auto-profiles each model to select the optimal compression strategy per component.
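+
+ The exponent-clustering claim is easy to check on any FP16 tensor; here is a quick sketch (a synthetic Gaussian stands in for a real checkpoint tensor):
+
+ ```python
+ import numpy as np
+
+ # Stand-in for a trained FP16 weight tensor; swap in a tensor loaded from a
+ # real checkpoint (e.g. via safetensors) to measure an actual model.
+ w = (np.random.default_rng(0).standard_normal(1_000_000) * 0.02).astype(np.float16)
+
+ bits = w.view(np.uint16)
+ exponents = (bits >> 10) & 0x1F            # the 5-bit FP16 exponent field
+ counts = np.bincount(exponents.astype(np.int64), minlength=32)
+
+ print(f"distinct exponents used: {(counts > 0).sum()} of 32")
+ print(f"share of values in the 3 most common exponents: {np.sort(counts)[-3:].sum() / counts.sum():.1%}")
+ ```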
305
+
306
+ ## Validated Results: Checkpoint Delta Compression
307
+
308
+ All results are measured on real data using an NVIDIA A100-SXM4-80GB. Scripts are in `experiments/checkpoint_delta/`.
309
+
310
+ ### Compression across architectures
311
+
312
+ | Model | Architecture | Params | Consecutive Delta Zeros | Entropy (bits) | Measured Savings |
313
+ |-------|-------------|-------:|:----------------------:|:--------------:|:----------------:|
314
+ | GPT-2 | Decoder-only | 163M | 33-67% | 1.76-3.02 | **87.3%** (measured, 498→63 MB) |
315
+ | T5-small | Encoder-decoder | 110M | **89-94%** | **0.49-0.85** | Not yet measured in bytes |
316
+ | TinyLlama | Decoder-only | 1.1B | — | — | **80%** (measured, fine-tune base→chat) |
317
+
318
+ Delta compression works across model architectures. The T5 encoder-decoder shows higher delta sparsity than the decoder-only models. Real-byte compression measurements for T5 are pending.
319
+
320
+ ### Precision tiers
321
+
322
+ Both tiers achieve comparable compression — the aligned quantization produces similar entropy regardless of bit width:
323
+
324
+ | Tier | Consecutive Entropy | Compression | Error | Use Case |
325
+ |------|-------------------:|:----------:|:-----:|----------|
326
+ | **int16 aligned** | 0.6-1.3 bits | **87%** | +0.06% RelL2 | Maximum compression |
327
+ | **int32 aligned** | 1.0-1.2 bits | **~87%** | 1.87e-7 RelL2 | Practically lossless (error below FP32 noise floor) |
328
+ | Raw bit XOR (no alignment) | 14-16 bits | 8.5% | Bit-exact | Baseline — alignment is essential |
329
+
330
+ ### Full checkpoint including optimizer states
331
+
332
+ Training checkpoints include model weights plus Adam optimizer states (momentum and variance), making the full checkpoint roughly 3x the size of the weights alone. Validated on GPT-2 124M over 1000 training steps:
333
+
334
+ | Component | % of Checkpoint | Delta Sparsity | Entropy | Compression |
335
+ |-----------|:-:|:-:|:-:|:-:|
336
+ | Weights | 33% | 55-66% zeros | 1.8-2.6 bits | ~84% |
337
+ | Momentum (exp_avg) | 33% | 28-30% zeros | 7.5-9.0 bits | ~53% |
338
+ | Variance (exp_avg_sq) | 33% | **91-92% zeros** | **0.6 bits** | **~96%** |
339
+ | **Full checkpoint** | 100% | — | — | **~79%** |
340
+
341
+ ### Safety for training resumption
342
+
343
+ Resuming training from a DMX-reconstructed checkpoint produces a **0.042% loss difference** compared to resuming from the original — negligible for any practical purpose:
344
+
345
+ ```
346
+ Step | Original | DMX Recon | Diff
347
+ 1 | 0.783582 | 0.784023 | 0.00044072
348
+ 51 | 1.098088 | 1.098552 | 0.00046420
349
+ 91 | 0.537082 | 0.537364 | 0.00028241
350
+
351
+ Final avg loss (last 20 steps): 0.042% difference
352
+ ```
353
+
354
+ ### Zero error accumulation in delta chains (Test 8)
355
+
356
+ Chained reconstruction (base → delta1 → delta2 → ... → deltaN) produces **identical** results to direct reconstruction (base + deltaN) — verified to 10 decimal places across both int16 and int32 modes. This is not an approximation: delta application is exact integer arithmetic, so error is mathematically constant regardless of chain length. Re-anchoring is needed only for delta *size* control, never for error control.
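+
+ The underlying reason is plain integer associativity; a tiny sketch with synthetic quantized checkpoints (stand-ins for DMX's aligned integer representation):
+
+ ```python
+ import numpy as np
+
+ rng = np.random.default_rng(0)
+
+ # Ten synthetic "checkpoints" in integer space, each a small step from the last.
+ ckpts = [rng.integers(-20_000, 20_000, size=1_000_000, dtype=np.int32)]
+ for _ in range(10):
+     ckpts.append(ckpts[-1] + rng.integers(-2, 3, size=ckpts[0].size, dtype=np.int32))
+
+ deltas = [b - a for a, b in zip(ckpts, ckpts[1:])]
+
+ # Chained reconstruction: base + d1 + d2 + ... + d10, applied one at a time.
+ chained = ckpts[0].copy()
+ for d in deltas:
+     chained += d
+
+ # Integer addition is exact, so the chain reproduces the final checkpoint bit for bit.
+ print("chained == final checkpoint:", np.array_equal(chained, ckpts[-1]))
+ ```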
357
+
358
+ ### Fine-tune variant compression
359
+
360
+ TinyLlama 1.1B base → chat fine-tune: **80% savings** (876 MB delta vs 4.4 GB full copy). Store the base model once, distribute each fine-tune variant as a small delta.
361
+
362
+ ### Projected savings at frontier scale
363
+
364
+ These are **projections** extrapolated from observed sparsity and scaling behavior across GPT-2 (163M), T5 (110M), and TinyLlama (1.1B). The underlying property — small per-step weight updates due to SGD dynamics — is expected to be scale-invariant, but real-byte validation at 70B+ scale is in progress.
365
+
366
+ | Scenario | Raw Storage | Projected DMX | Projected Savings |
367
+ |----------|------------|--------------|:-----------------:|
368
+ | 200 checkpoints of Llama 70B (weights only) | 28 TB | ~3.6 TB | ~87% |
369
+ | 200 checkpoints of Llama 70B (full w/ optimizer) | 84 TB | ~18 TB | ~79% |
370
+ | 20 fine-tune variants of Llama 70B | 2.8 TB | ~700 GB | ~75% |
371
+
372
+ **Caveats**: Validation on 8B+ models with frontier training schedules is in progress. Momentum compression (53%) was measured on wikitext-2; diverse training data may yield 40-45%, reducing full-checkpoint savings to ~70-73%. The 87% weight compression is measured on GPT-2; larger models may differ.
373
+
374
+ ---
375
+
376
+ ## Research Directions
377
+
378
+ DMX's underlying compression technique applies to structured floating-point data beyond individual model files. These are active research areas, not yet proven at scale. We welcome collaboration.
379
+
380
+ **1. Training checkpoint compression (highest priority).** Frontier training produces hundreds of near-identical high-precision checkpoints. Aligned cross-layer quantization enables efficient delta encoding between them. Early results are in the [Validated Results](#validated-results-checkpoint-delta-compression) section above. Key finding: alignment is critical — without it, deltas show almost no sparsity and compress poorly.
381
+
382
+ **2. Model family distribution.** Storing fine-tuned variants (chat, code, reasoning, etc.) as small deltas from a shared base model. Early result: TinyLlama base → chat = 80% savings (876 MB vs 4.4 GB).
383
+
384
+ **3. Scientific and sensor data.** Early tests on NOAA weather data show similar exponent clustering, suggesting potential applications in climate, seismic, and satellite data.
385
+
386
+ ## License & Patent
387
+
388
+ **Code:** MIT License — free to use, modify, and distribute.
389
+
390
+ **Methods:** Patent Pending (U.S. Provisional Applications filed April 2026). The patented methods cover aligned cross-layer quantization for neural network weight compression and stream-separated block floating point encoding with independent entropy coding. Personal, academic, and open-source use is unrestricted. Commercial use of the patented methods may require a license from the inventor — contact bill.riley@gmail.com.
391
+
392
+ ## Citation
393
+
394
+ ```bibtex
395
+ @software{riley2026dmx,
396
+ author = {Riley, William J.},
397
+ title = {DMX: Delta Multiplexed Model Format},
398
+ year = {2026},
399
+ url = {https://github.com/willjriley/dmx}
400
+ }
401
+ ```