bitneural32 0.0.1__tar.gz

@@ -0,0 +1,18 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 Aizhee
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
13
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
14
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
15
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
16
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
17
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
18
+ SOFTWARE.
@@ -0,0 +1,422 @@
1
+ Metadata-Version: 2.4
2
+ Name: bitneural32
3
+ Version: 0.0.1
4
+ Summary: BitNeural32: 1.58-bit Ternary Neural Network Compiler & QAT Library for ESP32
5
+ Author-email: Aizhee <aizharjamilano@gmail.com>
6
+ Maintainer-email: Aizhee <aizharjamilano@gmail.com>
7
+ License: MIT
8
+ Project-URL: Homepage, https://github.com/Aizhee/python-bitneural32
9
+ Project-URL: Repository, https://github.com/Aizhee/python-bitneural32.git
10
+ Project-URL: Documentation, https://github.com/Aizhee/python-bitneural32/wiki
11
+ Keywords: ternary,neural-network,ESP32,bitnet,quantization,embedded-ml,qat
12
+ Classifier: Development Status :: 4 - Beta
13
+ Classifier: Intended Audience :: Developers
14
+ Classifier: Intended Audience :: Science/Research
15
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
16
+ Classifier: Topic :: Software Development :: Embedded Systems
17
+ Classifier: License :: OSI Approved :: MIT License
18
+ Classifier: Programming Language :: Python :: 3
19
+ Classifier: Programming Language :: Python :: 3.9
20
+ Classifier: Programming Language :: Python :: 3.10
21
+ Classifier: Programming Language :: Python :: 3.11
22
+ Classifier: Programming Language :: Python :: 3.12
23
+ Classifier: Operating System :: OS Independent
24
+ Requires-Python: <4,>=3.9
25
+ Description-Content-Type: text/markdown
26
+ License-File: LICENSE
27
+ Requires-Dist: keras>=3.0.0
28
+ Requires-Dist: tensorflow>=2.16.0
29
+ Requires-Dist: numpy<2.0,>=1.21.0
30
+ Provides-Extra: dev
31
+ Requires-Dist: pytest>=7.0; extra == "dev"
32
+ Requires-Dist: pytest-cov>=4.0; extra == "dev"
33
+ Requires-Dist: black>=23.0; extra == "dev"
34
+ Requires-Dist: flake8>=6.0; extra == "dev"
35
+ Requires-Dist: mypy>=1.0; extra == "dev"
36
+ Provides-Extra: docs
37
+ Requires-Dist: sphinx>=5.0; extra == "docs"
38
+ Requires-Dist: sphinx-rtd-theme>=1.2; extra == "docs"
39
+ Dynamic: license-file
40
+
41
+ # BitNeural32: 1.58-Bit Ternary Neural Network Compiler for ESP32
42
+
43
+ [![PyPI](https://img.shields.io/pypi/v/bitneural32.svg)](https://pypi.org/project/bitneural32/)
44
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
45
+ [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
46
+
47
+ A Python library for training, quantizing, and compiling neural networks to ultra-efficient 1.58-bit (ternary) format for deployment on ESP32 microcontrollers.
48
+
49
+ > See also: [BitNeural32 Inference Library](https://github.com/aizhee/arduino-bitneural32)
50
+
51
+ ## Features
52
+
53
+ **1.58-Bit Quantization**: Extreme compression—weights packed as 2-bit values (4 weights per byte) using ternary {-1, 0, 1}
54
+
55
+ **Quantization-Aware Training (QAT)**: Custom Keras layers that apply quantization during training for better post-export accuracy
56
+
57
+ **Production-Ready Compiler**: Convert Keras models to optimized C bytecode with automatic weight flattening, packing, and metadata generation
58
+
59
+ **Inference Metrics**: Estimate inference time, RAM usage, and Flash size for different ESP32 variants (ESP32, ESP32-S3, ESP32-C3)
60
+
61
+ **15+ Layer Types**: Dense, Conv1D, Conv2D, LSTM, GRU, ReLU, LeakyReLU, Softmax, Sigmoid, Tanh, MaxPooling1D, Flatten, Dropout, and more
62
+
63
+ **Type Safe**: Full Python 3.9+ support with comprehensive type hints
64
+
65
+ ## Installation
66
+
67
+ ### From PyPI (recommended)
68
+
69
+ ```bash
70
+ pip install bitneural32
71
+ ```
72
+
73
+ ### Requirements
74
+
75
+ - **Python**: 3.9 or higher
76
+ - **Keras**: 3.0+
77
+ - **TensorFlow**: 2.16+ (or standalone Keras 3.x)
78
+ - **NumPy**: 1.21 or higher, below 2.0
79
+
80
+ ## Quick Start
81
+
82
+ ### 1. Train with Quantization-Aware Training (Recommended)
83
+
84
+ ```python
85
+ import numpy as np
86
+ import keras
87
+ from bitneural32.qat import TernaryDense, TernaryConv1D
88
+
89
+ # Build a QAT model
90
+ model = keras.Sequential([
91
+ TernaryConv1D(filters=32, kernel_size=5, padding='same', input_shape=(100, 1)),
92
+ keras.layers.ReLU(),
93
+ keras.layers.MaxPooling1D(2),
94
+ keras.layers.Flatten(),
95
+ TernaryDense(64),
96
+ keras.layers.ReLU(),
97
+ TernaryDense(10, activation='softmax')
98
+ ])
99
+
100
+ # Train normally—quantization happens automatically
101
+ model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
102
+ X_train = np.random.randn(1000, 100, 1).astype('float32')
103
+ Y_train = keras.utils.to_categorical(np.random.randint(0, 10, 1000), 10)
104
+ model.fit(X_train, Y_train, epochs=10, batch_size=32, verbose=1)
105
+
106
+ # Save for export
107
+ model.save('qat_model.keras')
108
+ ```
109
+
110
+ ### 2. Compile to ESP32 Bytecode
111
+
112
+ ```python
113
+ import keras  # X_train below is the training array from step 1
+ from bitneural32.compiler import BitNeuralCompiler
114
+
115
+ # Load and compile
116
+ compiler = BitNeuralCompiler(board_type='ESP32-S3')
117
+ compiled_model = keras.models.load_model('qat_model.keras')
118
+ compiler.compile_model(compiled_model, input_data=X_train)
119
+ compiler.save_c_header('model_data.h', include_metrics=True)
120
+
121
+ # View metrics
122
+ report = compiler.get_compilation_report()
123
+ print(report)
124
+ ```
125
+
126
+ Output example:
127
+ ```
128
+ {
129
+ "board_type": "ESP32-S3",
130
+ "total_size_bytes": 24576,
131
+ "num_layers": 8,
132
+ "inference_time_ms": 12.5,
133
+ "ram_usage_bytes": 1024,
134
+ "total_macs": 2500000,
135
+ "layers": [...]
136
+ }
137
+ ```
138
+
139
+ ### 3. Run on ESP32
140
+
141
+ Include the generated header in your C firmware:
142
+
143
+ ```c
144
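+ #include <stdio.h>
+ #include "bitneural.h"
+ #include "model_data.h"
+
+ // Index of the largest value; a local helper so the example is self-contained.
+ static int argmax_index(const float *values, int count) {
+     int best = 0;
+     for (int i = 1; i < count; i++) {
+         if (values[i] > values[best]) best = i;
+     }
+     return best;
+ }
+
+ void app_main(void) {
+     bn_init(); // Register all kernels
+
+     float input[100] = {0}; // Placeholder: fill with real samples
+     float output[10];
+
+     bn_run_inference(model_data, input, output);
+     printf("Prediction: %d\n", argmax_index(output, 10));
+ }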
156
+ ```
157
+
158
+ ## API Reference
159
+
160
+ ### QAT Layers
161
+
162
+ All custom QAT layers support standard Keras layer interfaces and compile seamlessly:
163
+
164
+ #### `TernaryDense(units, **kwargs)`
165
+ Fully-connected layer with ternary quantization.
166
+
167
+ ```python
168
+ layer = TernaryDense(64, activation='relu')
169
+ ```
170
+
171
+ #### `TernaryConv1D(filters, kernel_size, strides=1, padding='same', **kwargs)`
172
+ 1D convolution optimized for single-channel inputs (e.g., time-series).
173
+
174
+ ```python
175
+ layer = TernaryConv1D(32, kernel_size=5, padding='same')
176
+ ```
177
+
178
+ #### `TernaryConv2D(filters, kernel_size, strides=1, padding='same', **kwargs)`
179
+ 2D convolution supporting multi-channel inputs and outputs.
180
+
181
+ ```python
182
+ layer = TernaryConv2D(16, kernel_size=3, padding='same')
183
+ ```
184
+
185
+ #### `TernaryLSTM(units, return_sequences=False, **kwargs)`
186
+ LSTM recurrent layer with quantized weights and float32 biases.
187
+
188
+ ```python
189
+ layer = TernaryLSTM(32, return_sequences=True)
190
+ ```
191
+
192
+ #### `TernaryGRU(units, return_sequences=False, **kwargs)`
193
+ GRU recurrent layer with quantized weights and float32 biases.
194
+
195
+ ```python
196
+ layer = TernaryGRU(32, return_sequences=False)
197
+ ```
198
+
199
+ ### Compiler API
200
+
201
+ #### `BitNeuralCompiler(model=None, board_type='ESP32')`
202
+
203
+ **Parameters**:
204
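+ - `model` (optional): Keras model to compile; defaults to `None`, in which case the model is passed to `compile_model()` as in the examples
+ - `board_type` (str): Target ESP32 variant ('ESP32', 'ESP32-S3', 'ESP32-C3'); defaults to 'ESP32'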
205
+
206
+ **Methods**:
207
+
208
+ - `compile_model(model, input_data=None, allow_metrics=False)`: Compile a Keras model
209
+ - `save_c_header(filepath, include_metrics=False)`: Export to C header file
210
+ - `get_compilation_report()`: Return the compilation report as a dict (board type, sizes, estimated timing, per-layer entries)
211
+ - `export_model(filepath, allow_metrics=False)`: Convenience export function
212
+
213
+ **Example**:
214
+ ```python
215
+ compiler = BitNeuralCompiler(board_type='ESP32-S3')
216
+ compiler.compile_model(model, input_data=X_train, allow_metrics=True)
217
+ compiler.save_c_header('model.h', include_metrics=True)
218
+ ```
219
+
220
+ ### Quantization Utilities
221
+
222
+ #### `quantize_weights_ternary(weights)`
223
+ Quantize float32 weights to {-1, 0, 1} using median-based thresholding.
224
+
225
+ ```python
226
+ from bitneural32.quantize import quantize_weights_ternary
227
+ quantized = quantize_weights_ternary(np.random.randn(100, 100))
228
+ ```
229
+
230
+ #### `pack_weights_2bit(quantized_weights)`
231
+ Pack ternary weights into 2-bit format (4 weights per byte).
232
+
233
+ ```python
234
+ from bitneural32.quantize import pack_weights_2bit
235
+ packed = pack_weights_2bit(quantized)
236
+ ```
237
+
238
+ ## Architecture Overview
239
+
240
+ ### Quantization Strategy
241
+
242
+ BitNeural32 uses **ternary quantization**:
243
+
244
+ 1. **Median-based thresholding**: Set threshold = median(|weights|)
245
+ 2. **Ternary encoding**:
246
+ - Weight > threshold → 1
247
+ - Weight < -threshold → -1
248
+ - Otherwise → 0
249
+ 3. **2-bit packing**: 4 weights per byte (2 bits each)
250
+
251
+ **Encoding**:
252
+ - `00` → 0
253
+ - `01` → 1
254
+ - `10` → -1
255
+ - `11` → reserved
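The quantization and encoding rules above can be sketched in a few lines of NumPy. This is an illustration only, not the library's implementation: the function names `ternarize`, `pack_2bit` and `unpack_2bit` are hypothetical, and the bit order inside each byte (first weight in the two lowest bits) is an assumption, since the README does not state it.

```python
import numpy as np


def ternarize(weights: np.ndarray) -> np.ndarray:
    """Map float weights to {-1, 0, 1} using threshold = median(|w|)."""
    threshold = np.median(np.abs(weights))
    ternary = np.zeros(weights.shape, dtype=np.int8)
    ternary[weights > threshold] = 1
    ternary[weights < -threshold] = -1
    return ternary


def pack_2bit(ternary: np.ndarray) -> bytes:
    """Pack ternary values four per byte using 00 -> 0, 01 -> 1, 10 -> -1."""
    flat = ternary.ravel()
    codes = np.zeros(flat.shape, dtype=np.uint8)
    codes[flat == 1] = 0b01
    codes[flat == -1] = 0b10
    # Pad with zero codes so the length is a multiple of four.
    codes = np.concatenate([codes, np.zeros(-len(codes) % 4, dtype=np.uint8)])
    groups = codes.reshape(-1, 4)
    # Assumed bit order: the first weight of each group sits in the low bits.
    packed = groups[:, 0] | (groups[:, 1] << 2) | (groups[:, 2] << 4) | (groups[:, 3] << 6)
    return packed.astype(np.uint8).tobytes()


def unpack_2bit(packed: bytes, count: int) -> np.ndarray:
    """Inverse of pack_2bit: recover the first `count` ternary values."""
    data = np.frombuffer(packed, dtype=np.uint8)
    codes = np.stack([(data >> shift) & 0b11 for shift in (0, 2, 4, 6)], axis=1)
    codes = codes.ravel()[:count]
    values = np.zeros(count, dtype=np.int8)
    values[codes == 0b01] = 1
    values[codes == 0b10] = -1
    return values


example = np.array([0.9, -0.1, -0.7, 0.05, 0.3, -0.4])
ternary = ternarize(example)       # median(|w|) is 0.35 for this array
packed = pack_2bit(ternary)        # 6 weights fit in 2 bytes
restored = unpack_2bit(packed, ternary.size)
```

Because half of the magnitudes fall at or below the median by construction, roughly half of the weights quantize to 0 under this rule.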
256
+
257
+ ### QAT Training
258
+
259
+ Quantization-aware training applies quantization in-the-loop:
260
+
261
+ 1. **Forward pass**: Weights quantized to {-1, 0, 1} with learnable scale
262
+ 2. **Backward pass**: Straight-through estimator (STE) for gradient computation
263
+ 3. **Result**: The network adapts to quantization during training, which the project reports as 2-5% higher accuracy after export than quantizing only after training
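As a rough illustration of the forward/backward split described above, here is a NumPy sketch of one training step on a linear model. It is not the library's Keras code: the helper names are hypothetical, and the scale is a fixed `mean(|w|)` rather than the learnable scale the library describes.

```python
import numpy as np


def ternary_forward(latent: np.ndarray):
    """Forward pass: quantize latent float weights to scale * {-1, 0, 1}."""
    threshold = np.median(np.abs(latent))
    ternary = np.sign(latent) * (np.abs(latent) > threshold)
    # Assumption: fixed per-tensor scale (the library uses a learnable one).
    scale = np.mean(np.abs(latent))
    return scale * ternary, ternary


def ternary_backward(grad_wrt_quantized: np.ndarray) -> np.ndarray:
    """Backward pass (STE): treat the quantizer as identity."""
    return grad_wrt_quantized


def train_step(latent, x, target, lr=0.01):
    """One SGD step for y = x @ w_q with mean-squared-error loss."""
    w_q, _ = ternary_forward(latent)
    error = x @ w_q - target
    grad_wq = 2.0 * x.T @ error / error.size   # dLoss/dw_q
    grad_latent = ternary_backward(grad_wq)     # STE: reuse it for the latent weights
    return latent - lr * grad_latent, float(np.mean(error ** 2))


rng = np.random.default_rng(0)
latent = rng.normal(size=(8, 1))
x = rng.normal(size=(32, 8))
target = rng.normal(size=(32, 1))
new_latent, loss = train_step(latent, x, target)
```

The point of the estimator is that the float "latent" weights keep receiving gradient updates even though the rounding step itself has zero gradient almost everywhere.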
264
+
265
+ ### Compilation Pipeline
266
+
267
+ ```
268
+ Keras Model
269
+
270
+ [Per-Layer Compilation]
271
+
272
+ Weight Flattening (layer-specific order)
273
+
274
+ Ternary Quantization + 2-Bit Packing
275
+
276
+ Binary Blob Generation
277
+
278
+ C Header Export
279
+
280
+ model_data.h (ready for ESP32 inclusion)
281
+ ```
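The last two stages of the pipeline (binary blob to C header) can be pictured with a small sketch. The header layout below is an assumption for illustration; only the array name `model_data` is taken from the firmware example, and `bytes_to_c_header` is a hypothetical helper, not part of the package.

```python
def bytes_to_c_header(blob: bytes, array_name: str = "model_data", per_line: int = 12) -> str:
    """Render a non-empty binary blob as a C header with a const byte array."""
    guard = f"{array_name.upper()}_H"
    rows = []
    for start in range(0, len(blob), per_line):
        chunk = blob[start:start + per_line]
        rows.append("    " + ", ".join(f"0x{byte:02x}" for byte in chunk))
    body = ",\n".join(rows)
    return (
        f"#ifndef {guard}\n"
        f"#define {guard}\n\n"
        f"#include <stdint.h>\n\n"
        f"const uint8_t {array_name}[{len(blob)}] = {{\n{body}\n}};\n"
        f"const unsigned int {array_name}_len = {len(blob)};\n\n"
        f"#endif  // {guard}\n"
    )


header_text = bytes_to_c_header(bytes([0x21, 0x08, 0xFF]))
```

The real `save_c_header()` output may carry extra metadata (for example the metrics enabled by `include_metrics=True`), so treat this only as a picture of the general shape.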
282
+
283
+ ## Performance Characteristics
284
+
285
+ ### Memory Footprint
286
+
287
+ **Example: 10→64→32→10 network**
288
+
289
+ | Format | Size |
290
+ |--------|------|
291
+ | Float32 | 40 KB |
292
+ | Ternary (1.58-bit) | 2.5 KB |
293
+ | **Compression** | **94%** |
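The compression ratio follows from the bit widths alone: 2 bits instead of 32 is a factor of 16, a saving of 93.75%, which rounds to the 94% in the table. The sketch below redoes that arithmetic for the weight matrices of a dense stack. It counts weights only (no biases, scales or metadata), so it is not expected to reproduce the absolute sizes quoted in the table; the helper names are hypothetical.

```python
import math


def dense_weight_count(layer_sizes):
    """Number of weights in a stack of dense layers, biases excluded."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


def float32_bytes(weight_count):
    return weight_count * 4


def packed_2bit_bytes(weight_count):
    """Four ternary weights per byte, rounded up to whole bytes."""
    return math.ceil(weight_count / 4)


sizes = [10, 64, 32, 10]
weights = dense_weight_count(sizes)          # 10*64 + 64*32 + 32*10
fp32_size = float32_bytes(weights)
packed_size = packed_2bit_bytes(weights)
saving = 1 - packed_size / fp32_size
```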
294
+
295
+ ### Inference Speed (ESP32 @ 240 MHz)
296
+
297
+ | Layer Type | Input→Output | Approx. Time |
298
+ |-----------|------------|--------------|
299
+ | Dense | 1000→1000 | 10-50 ms |
300
+ | Conv1D | 100 inputs, 32 filters, kernel 5 | 5-20 ms |
301
+ | Conv2D | 28×28→14×14, 32 filters | 20-100 ms |
302
+ | LSTM | 32 hidden, 50 timesteps | 15-80 ms |
303
+ | Full Network | 10→64→32→10 | 1-5 ms |
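To relate these timings to model size, it can help to count multiply-accumulate operations (MACs) per layer. The helpers below are a hypothetical sketch, not the library's estimator; the Conv1D count assumes a single input channel, stride 1 and 'same' padding.

```python
import math


def dense_macs(inputs: int, units: int) -> int:
    """One MAC per weight in a fully-connected layer."""
    return inputs * units


def conv1d_macs(steps: int, in_channels: int, filters: int, kernel_size: int, stride: int = 1) -> int:
    """MACs for a Conv1D with 'same' padding (output length = ceil(steps / stride))."""
    out_steps = math.ceil(steps / stride)
    return out_steps * filters * kernel_size * in_channels


def implied_macs_per_second(macs: int, time_ms: float) -> float:
    """Throughput implied by a quoted or measured latency."""
    return macs * 1000.0 / time_ms


dense_1000 = dense_macs(1000, 1000)
conv_row = conv1d_macs(steps=100, in_channels=1, filters=32, kernel_size=5)
fast = implied_macs_per_second(dense_1000, 10.0)   # lower end of the Dense row
slow = implied_macs_per_second(dense_1000, 50.0)   # upper end of the Dense row
```

Read this way, the Dense row corresponds to somewhere between 20 and 100 million MACs per second, which is a quick sanity check to apply to timings measured on real hardware.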
304
+
305
+ ## Supported Layers
306
+
307
+ | Layer | QAT Version | Notes |
308
+ |-------|------------|-------|
309
+ | Dense | TernaryDense | ✅ Full support |
310
+ | Conv1D | TernaryConv1D | ✅ Mono-channel optimized |
311
+ | Conv2D | TernaryConv2D | ✅ Multi-channel support |
312
+ | LSTM | TernaryLSTM | ✅ Quantized kernel & recurrent |
313
+ | GRU | TernaryGRU | ✅ Quantized kernel & recurrent |
314
+ | ReLU | Standard | ✅ No quantization needed |
315
+ | LeakyReLU | Standard | ✅ Works as-is |
316
+ | Softmax | Standard | ✅ Uses float32 for stability |
317
+ | Sigmoid | Standard | ✅ Fast Padé approximation on ESP32 |
318
+ | Tanh | Standard | ✅ Fast Padé approximation on ESP32 |
319
+ | MaxPooling1D | Standard | ✅ No quantization |
320
+ | Flatten | Standard | ✅ Memory layout only |
321
+ | Dropout | Standard | ✅ No-op at inference |
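For readers unfamiliar with the Padé approximations mentioned for Sigmoid and Tanh, here is a sketch of one such approximant in Python. Which approximant the firmware actually uses is not stated in this README, so the [3/2] form below (matching the Taylor series of tanh through the x^5 term) is an assumption chosen for illustration.

```python
import math


def pade_tanh(x: float) -> float:
    """[3/2] Pade approximant of tanh, clamped to the valid range [-1, 1]."""
    y = x * (15.0 + x * x) / (15.0 + 6.0 * x * x)
    return max(-1.0, min(1.0, y))


def pade_sigmoid(x: float) -> float:
    """Sigmoid via the identity sigmoid(x) = 0.5 * (1 + tanh(x / 2))."""
    return 0.5 * (1.0 + pade_tanh(0.5 * x))


grid = [i / 100.0 for i in range(-500, 501)]
max_error = max(abs(pade_tanh(x) - math.tanh(x)) for x in grid)
max_error_small = max(abs(pade_tanh(x) - math.tanh(x)) for x in grid if abs(x) <= 1.0)
```

The appeal on a microcontroller is that this costs a few multiplies and one divide instead of an exponential, at the price of a small error that grows toward the clamp point.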
322
+
323
+ ## Tips & Best Practices
324
+
325
+ ### Model Design
326
+
327
+ - **Start with QAT layers** for better accuracy after quantization
328
+ - **Use smaller models**: Ternary networks benefit from depth over width
329
+ - **Avoid BatchNormalization** before quantized layers (fuse into weights)
330
+ - **Use ReLU/LeakyReLU** for better quantization robustness
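The BatchNormalization tip can be made concrete with a sketch of folding a BatchNorm layer into the dense layer that follows it, done on the float weights before any quantization. `fold_batchnorm_into_dense` is a hypothetical helper; whether the compiler performs this folding itself is not stated here. The default `eps` matches the Keras `BatchNormalization` default of 1e-3.

```python
import numpy as np


def fold_batchnorm_into_dense(gamma, beta, mean, var, kernel, bias, eps=1e-3):
    """Fold y = Dense(BatchNorm(x)) into a single equivalent Dense layer.

    BN parameters have shape (in_features,); kernel has shape (in_features, units).
    """
    s = gamma / np.sqrt(var + eps)              # per-feature BN scale
    folded_kernel = s[:, None] * kernel         # scale each input row of the kernel
    folded_bias = bias + (beta - mean * s) @ kernel
    return folded_kernel, folded_bias


rng = np.random.default_rng(1)
x = rng.normal(size=(5, 4))
gamma, beta = rng.normal(size=4), rng.normal(size=4)
mean, var = rng.normal(size=4), rng.uniform(0.5, 2.0, size=4)
kernel, bias = rng.normal(size=(4, 3)), rng.normal(size=3)

# Reference: apply inference-mode BatchNorm, then the dense layer.
reference = ((x - mean) * gamma / np.sqrt(var + 1e-3) + beta) @ kernel + bias
folded_kernel, folded_bias = fold_batchnorm_into_dense(gamma, beta, mean, var, kernel, bias)
folded = x @ folded_kernel + folded_bias
```

Note that the folded kernel has a different scale per input row, which a single per-tensor ternary scale cannot represent exactly; that is one reason to fold first and then fine-tune with QAT rather than fold after quantization.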
331
+
332
+ ### Training
333
+
334
+ - **Learning rate**: Use 10× lower LR than standard training
335
+ - **Epochs**: Train 20-50% longer to adapt to quantization
336
+ - **Batch size**: 32-128 works well for most models
337
+ - **Monitor accuracy**: QAT models may drop 1-3% initially, then recover
338
+
339
+ ### Compilation
340
+
341
+ - **Always provide `input_data`**: Needed for input normalization statistics
342
+ - **Check metrics**: Use `allow_metrics=True` to estimate ESP32 performance
343
+ - **Board selection**: ESP32-S3 has more RAM; ESP32-C3 is power-efficient
344
+
345
+ ### Deployment
346
+
347
+ - **Test on target hardware**: Simulator timings differ from real ESP32
348
+ - **Use dual-core**: Enable Core 1 for real-time audio/sensor processing
349
+ - **Monitor UART**: Check inference logs for bottlenecks
350
+
351
+ ## Examples
352
+
353
+ Complete examples are available in the [GitHub repository](https://github.com/Aizhee/python-bitneural32):
354
+
355
+ - `examples/mnist_qat.py` - MNIST classification with QAT
356
+ - `examples/audio_keyword_spotting.py` - Keyword spotting on audio
357
+ - `examples/time_series_forecasting.py` - LSTM forecasting
358
+ - `examples/esp32_firmware.c` - Complete ESP32 implementation
359
+
360
+ ## Troubleshooting
361
+
362
+ ### "Unsupported layer type"
363
+
364
+ Make sure the model uses only the QAT layers or the supported standard Keras layers. If you need a custom layer, register a compiler for it:
365
+ ```python
366
+ # Add to compiler mapping
367
+ from bitneural32.compiler import BitNeuralCompiler
368
+ BitNeuralCompiler.LAYER_COMPILER_MAP['MyLayer'] = MyLayerCompiler()
369
+ ```
370
+
371
+ ### Model accuracy drops significantly after quantization
372
+
373
+ - Use QAT layers instead of post-training quantization
374
+ - Train longer (2-3× epochs)
375
+ - Lower learning rate by 10×
376
+ - Use warm-up training (standard float → gradual quantization)
377
+
378
+ ### Compiled model is too large
379
+
380
+ - Reduce model size (fewer filters/units)
381
+ - Use depthwise separable convolutions
382
+ - Remove dense layers, use global pooling instead
383
+ - Prune weights before compilation
384
+
385
+ ### ESP32 inference is slow
386
+
387
+ - Check the CPU clock speed (240 MHz is the maximum on ESP32 and ESP32-S3; the ESP32-C3 tops out at 160 MHz)
388
+ - Profile by measuring how long each `bn_run_inference()` call takes
389
+ - Use Conv1D instead of Dense for temporal data
390
+ - Consider smaller input resolution
391
+
392
+
393
+ ## Citation
394
+
395
+ If you use BitNeural32 in your research, please cite:
396
+
397
+ ```bibtex
398
+ @software{bitneural32,
399
+ title = {BitNeural32: 1.58-Bit Ternary Neural Network Compiler for ESP32},
400
+ author = {Aizhee},
401
+ year = {2025},
402
+ url = {https://github.com/aizhee/python-bitneural32}
403
+ }
404
+ ```
405
+
406
+ ## License
407
+
408
+ MIT License - See [LICENSE](LICENSE) file for details.
409
+
410
+ ## References
411
+
412
+ - **BitNet Paper**: [arxiv.org/abs/2310.11453](https://arxiv.org/abs/2310.11453)
413
+ - **Ternary Networks**: [arxiv.org/abs/1605.01740](https://arxiv.org/abs/1605.01740)
414
+ - **ESP32 Docs**: [docs.espressif.com](https://docs.espressif.com)
415
+ - **Keras API**: [keras.io](https://keras.io)
416
+
417
+ ---
418
+
419
+
420
+ **Made with ❤️ by Aizhee for embedded machine learning**
421
+
422
+ [![ko-fi](https://ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/O4O0XNVKI)