grilly 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- grilly-0.1.0/CHANGELOG.md +83 -0
- grilly-0.1.0/LICENSE +21 -0
- grilly-0.1.0/MANIFEST.in +28 -0
- grilly-0.1.0/PKG-INFO +342 -0
- grilly-0.1.0/README.md +296 -0
- grilly-0.1.0/grilly.egg-info/PKG-INFO +342 -0
- grilly-0.1.0/grilly.egg-info/SOURCES.txt +36 -0
- grilly-0.1.0/grilly.egg-info/dependency_links.txt +1 -0
- grilly-0.1.0/grilly.egg-info/requires.txt +25 -0
- grilly-0.1.0/grilly.egg-info/top_level.txt +1 -0
- grilly-0.1.0/pyproject.toml +115 -0
- grilly-0.1.0/setup.cfg +4 -0
- grilly-0.1.0/tests/test_adamw.py +341 -0
- grilly-0.1.0/tests/test_attention.py +60 -0
- grilly-0.1.0/tests/test_backward.py +284 -0
- grilly-0.1.0/tests/test_batchnorm2d.py +398 -0
- grilly-0.1.0/tests/test_conv2d.py +414 -0
- grilly-0.1.0/tests/test_core.py +117 -0
- grilly-0.1.0/tests/test_device_manager.py +134 -0
- grilly-0.1.0/tests/test_gemm_backward.py +287 -0
- grilly-0.1.0/tests/test_gpu_operations.py +279 -0
- grilly-0.1.0/tests/test_gru.py +318 -0
- grilly-0.1.0/tests/test_huggingface_bridge.py +142 -0
- grilly-0.1.0/tests/test_huggingface_t5.py +429 -0
- grilly-0.1.0/tests/test_integration.py +108 -0
- grilly-0.1.0/tests/test_integration_vulkan.py +180 -0
- grilly-0.1.0/tests/test_learning.py +79 -0
- grilly-0.1.0/tests/test_lr_scheduler.py +359 -0
- grilly-0.1.0/tests/test_lstm.py +333 -0
- grilly-0.1.0/tests/test_memory_operations.py +58 -0
- grilly-0.1.0/tests/test_multimodal.py +438 -0
- grilly-0.1.0/tests/test_pooling.py +490 -0
- grilly-0.1.0/tests/test_pytorch_compat.py +140 -0
- grilly-0.1.0/tests/test_pytorch_ops.py +197 -0
- grilly-0.1.0/tests/test_sentence_transformers_gpu.py +198 -0
- grilly-0.1.0/tests/test_snn.py +179 -0
- grilly-0.1.0/tests/test_tensor_conversion.py +180 -0
- grilly-0.1.0/tests/test_vulkan_tensor.py +132 -0
grilly-0.1.0/CHANGELOG.md
ADDED
@@ -0,0 +1,83 @@
# Changelog

All notable changes to Grilly will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.1.0] - 2026-01-31

### Added
- Initial release of Grilly framework
- Vulkan compute backend for GPU acceleration
- Support for AMD, NVIDIA, and Intel GPUs via Vulkan
- Spiking Neural Network (SNN) operations
  - LIF (Leaky Integrate-and-Fire) neurons
  - GIF (Generalized Integrate-and-Fire) neurons
  - STDP (Spike-Timing-Dependent Plasticity)
  - Hebbian learning
  - Synaptic connections
  - Continuous-to-spike and spike-to-continuous bridges
- Feedforward Neural Network (FNN) operations
  - Linear layers with multiple activation functions
  - Activations: ReLU, GELU, SiLU, SoftMax, SoftPlus, SwiGLU, GEGLU, ReGLU, RoSwish, GCU
  - Layer normalization, RMS normalization, batch normalization
- Flash Attention 2 with RoPE support
- Convolutional networks (Conv2D, MaxPool2D, AvgPool2D)
- LSTM cells
- Learning algorithms
  - EWC (Elastic Weight Consolidation)
  - NLMS (Normalized Least Mean Squares) with ensemble support
  - Fisher Information Matrix computation
  - Natural gradients
  - Adam optimizer
  - Whitening transforms
- Memory operations
  - FAISS-based similarity search (distance, top-k, IVF, k-means)
  - GPU-accelerated memory read/write
  - Context aggregation
  - Memory injection (concatenation, gating, residual)
  - Capsule networks with dentate gyrus expansion
- Transformer support
  - VulkanSentenceTransformer for embedding models
  - Architecture-specific optimizations (BERT, GPT, T5, RoBERTa, DistilBERT, MPNet, XLM-RoBERTa, ALBERT)
  - HuggingFace model bridge (load weights without a PyTorch runtime)
  - Fused operations (QKV projection, linear+activation)
  - Prosody-modulated attention
- Specialized operations
  - Place and time cells
  - Theta-gamma encoding
  - FFT operations (bit-reversal, butterfly, magnitude, power spectrum)
  - Domain adaptation (classification, routing, expert combination)
  - Semantic and affective encoding
- 137 GLSL compute shaders (138 compiled SPIR-V)
- LoRA (Low-Rank Adaptation) for efficient fine-tuning with backward pass
- Gradient checkpointing for memory optimization
- CPU fallback for unsupported operations
- Comprehensive test suite with GPU/CPU markers

### Features
- **Hardware Agnostic**: Works on AMD RX 6750 XT, NVIDIA RTX, Intel Arc
- **No PyTorch Runtime**: Load HuggingFace models as pure Vulkan tensors
- **Memory Efficient**: Fine-tuning within 12GB VRAM via LoRA
- **Bio-Inspired**: SNN operations for neuromorphic computing
- **Production Ready**: FastAPI integration examples

### Documentation
- Installation guide (uv and pip)
- Architecture-specific shader guide
- CLAUDE.md for AI assistant integration
- Example notebooks for common use cases

### Known Issues
- Some advanced SNN features still in development (PLIF, LSNN, Izhikevich)
- ANN→SNN conversion tool pending
- Multi-GPU support not yet implemented

### Upcoming (0.2.0)
- PLIF (Parametric LIF) neurons
- LSNN (Long Short-Term Memory neurons)
- Surrogate gradients for SNN training
- ANN→SNN conversion tool
- Multi-GPU distributed training
- More architecture-specific optimizations
grilly-0.1.0/LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 GrillCheese AI

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
grilly-0.1.0/MANIFEST.in
ADDED
@@ -0,0 +1,28 @@
# Include documentation
include README.md
include LICENSE
include CHANGELOG.md
include CLAUDE.md

# Include all shader files
recursive-include grilly/shaders *.glsl
recursive-include grilly/shaders *.comp
recursive-include grilly/shaders *.spv

# Include configuration files
include pyproject.toml

# Exclude development files
exclude .gitignore
exclude .git*
recursive-exclude * __pycache__
recursive-exclude * *.py[co]
recursive-exclude * .pytest_cache
recursive-exclude * .venv
recursive-exclude * .mypy_cache
recursive-exclude * .ruff_cache

# Exclude tests and examples from package (keep in source)
prune grilly/tests
prune examples
prune datasets
grilly-0.1.0/PKG-INFO
ADDED
@@ -0,0 +1,342 @@
Metadata-Version: 2.4
Name: grilly
Version: 0.1.0
Summary: GPU-accelerated neural network operations using Vulkan compute shaders
Author-email: Nick <nick@grillcheeseai.com>
License: MIT
Project-URL: Homepage, https://grillcheeseai.com
Project-URL: Repository, https://github.com/grillcheese-ai/grilly
Project-URL: Documentation, https://docs.grillcheeseai.com/grilly
Keywords: vulkan,gpu,neural-network,snn,compute-shaders,gpu-acceleration
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numba>=0.63.1
Requires-Dist: numpy>=1.24.0
Requires-Dist: pytest>=9.0.2
Requires-Dist: pytest-asyncio>=1.3.0
Requires-Dist: pytest-benchmark>=5.2.3
Requires-Dist: pyvma>=2.0.0.1
Requires-Dist: sentence-transformers>=5.2.0
Requires-Dist: torch>=2.10.0
Requires-Dist: transformers>=4.57.6
Requires-Dist: vulkan>=1.3.0
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: black>=23.7.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: mypy>=1.5.0; extra == "dev"
Provides-Extra: accel
Requires-Dist: numba>=0.59.0; extra == "accel"
Provides-Extra: all
Requires-Dist: grilly[accel,dev]; extra == "all"
Dynamic: license-file

# Grilly

GPU-accelerated neural network framework using Vulkan compute shaders. Supports AMD, NVIDIA, and Intel GPUs.

## Features

### Neural Network Operations
- **Feedforward Networks**: Linear layers, activations (ReLU, GELU, SiLU, SoftMax, SwiGLU, RoSwish, GCU)
- **Convolutional Networks**: Conv2D, MaxPool2D, AvgPool2D, BatchNorm2D (forward and backward)
- **Recurrent Networks**: LSTM cells
- **Attention Mechanisms**: Flash Attention 2, multi-head attention, RoPE, prosody modulation
- **Normalization**: LayerNorm, RMSNorm, BatchNorm
- **Activations**: GELU, SiLU, ReLU, SoftMax, SoftPlus, SwiGLU, GEGLU, ReGLU, RoSwish, GCU
- **Fused Operations**: Linear+activation fusion, QKV projection, layer normalization+linear
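Gated activations like SwiGLU are easy to sanity-check against a plain CPU reference. A minimal NumPy sketch, assuming the common split-in-half formulation (the shader's exact convention may differ):

```python
import numpy as np

def swiglu_reference(x: np.ndarray) -> np.ndarray:
    """CPU reference for SwiGLU: split the last dim in half, gate one half with SiLU.
    Assumes the split-in-half convention; this is a sketch, not the shader's contract."""
    a, b = np.split(x, 2, axis=-1)
    return a * (b / (1.0 + np.exp(-b)))  # a * SiLU(b)

x = np.random.randn(32, 256).astype(np.float32)
out = swiglu_reference(x)  # shape (32, 128): output width is half the input width
```

Note the halved output width: fused GLU-family shaders typically expect the projection to produce 2x the target dimension.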
### Spiking Neural Networks
- **Neuron Models**: LIF (Leaky Integrate-and-Fire), GIF (Generalized Integrate-and-Fire)
- **Learning**: STDP (Spike-Timing-Dependent Plasticity), Hebbian learning
- **Synaptic Dynamics**: Forward propagation, STDP traces, weight updates
- **Bridges**: Continuous-to-spike, spike-to-continuous conversion
- **Operations**: SNN matmul, softmax, readout, expert readout

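The LIF update behind `backend.snn.lif_step` can be approximated on the CPU in a few lines; a hedged NumPy sketch (the shader's reset and refractory handling may differ in detail):

```python
import numpy as np

def lif_step_reference(i_in, v, refrac, dt=0.001, tau_mem=20.0,
                       v_thresh=1.0, t_refrac=0.002):
    """One Euler step of leaky integrate-and-fire dynamics (CPU sketch).
    Refractory neurons are held at rest; crossing v_thresh emits a spike and resets."""
    active = refrac <= 0.0
    dv = (-v + i_in) * (dt / tau_mem)          # leak toward 0, driven by input current
    v = np.where(active, v + dv, 0.0)
    spikes = (v >= v_thresh).astype(np.float32)
    v = np.where(spikes > 0, 0.0, v)           # reset to rest on spike
    refrac = np.where(spikes > 0, t_refrac, np.maximum(refrac - dt, 0.0))
    return v, refrac, spikes

v = np.zeros(1000, dtype=np.float32)
r = np.zeros(1000, dtype=np.float32)
v, r, s = lif_step_reference(np.random.randn(1000).astype(np.float32), v, r)
```

`t_refrac` is an illustrative parameter not present in the Quick Start call signature.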
### Memory & Retrieval
- **Memory Operations**: Read, write, context aggregation
- **Memory Injection**: Concatenation, gating, residual connections
- **Capsule Networks**: Capsule projection, dentate gyrus sparse expansion
- **FAISS Integration**: Distance computation, top-k selection, IVF filtering, quantization, k-means

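The GPU distance + top-k pipeline mirrors a brute-force search that is straightforward to express in NumPy; a small CPU reference for checking results (squared L2 is assumed here — the shaders may use a different metric):

```python
import numpy as np

def topk_l2(query: np.ndarray, database: np.ndarray, k: int = 10):
    """Brute-force squared-L2 top-k: returns the k smallest distances, sorted."""
    diffs = database - query                  # broadcast (N, D) - (1, D)
    dists = np.einsum("nd,nd->n", diffs, diffs)
    idx = np.argpartition(dists, k)[:k]       # k smallest, unordered
    idx = idx[np.argsort(dists[idx])]         # order them ascending
    return dists[idx], idx

q = np.random.randn(1, 384).astype(np.float32)
db = np.random.randn(1000, 384).astype(np.float32)
d, i = topk_l2(q, db, k=10)
```

`argpartition` keeps the reference O(N) in the selection step, which matters when comparing timings against the GPU path.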
### Learning Algorithms
- **Optimization**: Adam, natural gradients, Fisher information matrix
- **Continual Learning**: EWC (Elastic Weight Consolidation), Fisher penalties
- **Adaptive Filtering**: NLMS (Normalized Least Mean Squares), ensemble, prediction
- **Regularization**: Dropout, whitening transforms

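EWC anchors parameters to their values after a previous task with a quadratic penalty weighted by Fisher information. A minimal NumPy sketch of the penalty and its gradient, using the usual diagonal-Fisher simplification (λ and variable names are illustrative):

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    """EWC loss term: (lam/2) * sum_i F_i * (theta_i - theta*_i)^2, and its gradient."""
    diff = theta - theta_star
    loss = 0.5 * lam * np.sum(fisher * diff * diff)
    grad = lam * fisher * diff   # added to the task gradient during training
    return loss, grad

theta      = np.array([1.0, 2.0, 3.0])   # current parameters
theta_star = np.array([1.0, 1.0, 1.0])   # parameters after the old task
fisher     = np.array([0.5, 1.0, 2.0])   # diagonal Fisher estimates
loss, grad = ewc_penalty(theta, theta_star, fisher)  # loss = 4.5, grad = [0, 1, 4]
```

Parameters with high Fisher weight (important to the old task) are pulled back hardest, which is the mechanism behind the "Fisher penalties" listed above.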
### Specialized Operations
- **Place & Time Cells**: Spatial encoding, temporal encoding, theta-gamma oscillations
- **FFT**: Bit-reversal, butterfly operations, magnitude, power spectrum
- **Domain Adaptation**: Domain classification, routing, expert combination
- **Embeddings**: Lookup, position encoding, attention, FFN, pooling, normalization
- **Loss Functions**: Cross-entropy, BCE, contrastive loss
- **Semantic Encoding**: Affect MLP, affective processing

### Transformer Support
- **Architecture-Specific Optimizations**: BERT, GPT, T5, RoBERTa, DistilBERT, MPNet, XLM-RoBERTa, ALBERT
- **HuggingFace Bridge**: Load pre-trained models without a PyTorch runtime
- **Model Components**: Multi-head attention, positional encoding, layer normalization
- **Fine-Tuning**: LoRA (Low-Rank Adaptation), gradient checkpointing

### LoRA Fine-Tuning
- Parameter-efficient fine-tuning for transformers
- Backward pass support for LoRA layers
- Memory-efficient training on 12GB VRAM

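LoRA replaces a full weight update with a low-rank product, so the forward pass is the frozen matmul plus a rank-r correction. A hedged NumPy sketch (scaling convention alpha/r as in the original LoRA paper; shapes and names are illustrative, not Grilly's API):

```python
import numpy as np

def lora_linear(x, W, A, B, alpha=16.0):
    """y = x @ W + (alpha / r) * x @ A @ B, where W is frozen and A, B are trainable."""
    r = A.shape[1]
    return x @ W + (alpha / r) * (x @ A) @ B

d_in, d_out, rank = 384, 384, 8
x = np.random.randn(4, d_in).astype(np.float32)
W = np.random.randn(d_in, d_out).astype(np.float32)
A = np.random.randn(d_in, rank).astype(np.float32) * 0.01
B = np.zeros((rank, d_out), dtype=np.float32)  # B starts at zero: LoRA is a no-op initially
y = lora_linear(x, W, A, B)
```

Only A and B (2 * 384 * 8 values here, versus 384 * 384 for W) need gradients and optimizer state, which is why fine-tuning fits in 12GB VRAM.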
## Installation

### From PyPI (when published)

```bash
pip install grilly
```

### From Source

```bash
git clone https://github.com/grillcheese-ai/grilly.git
cd grilly
make install

# Or with development dependencies
make install-dev

# Or manually
pip install -e .
```

## Requirements

- Python >= 3.10
- Vulkan drivers
- NumPy >= 1.24.0
- Supported GPUs: AMD (tested on RX 6750 XT), NVIDIA, Intel Arc

## Quick Start

```python
import grilly
import numpy as np

# Initialize compute backend
backend = grilly.Compute()

# Spiking neural network example
input_current = np.random.randn(1000).astype(np.float32)
membrane = np.zeros(1000, dtype=np.float32)
refractory = np.zeros(1000, dtype=np.float32)

membrane, refractory, spikes = backend.snn.lif_step(
    input_current, membrane, refractory,
    dt=0.001, tau_mem=20.0, v_thresh=1.0
)

# Feedforward network example
x = np.random.randn(32, 384).astype(np.float32)
weight = np.random.randn(384, 128).astype(np.float32)
bias = np.zeros(128, dtype=np.float32)

output = backend.fnn.linear(x, weight, bias)
activated = backend.fnn.swiglu(output)

# Flash Attention 2
q = np.random.randn(32, 8, 64, 64).astype(np.float32)  # (batch, heads, seq, dim)
k = np.random.randn(32, 8, 64, 64).astype(np.float32)
v = np.random.randn(32, 8, 64, 64).astype(np.float32)

attention_out = backend.attention.flash_attention2(q, k, v)

# FAISS similarity search
query = np.random.randn(1, 384).astype(np.float32)
database = np.random.randn(10000, 384).astype(np.float32)

distances = backend.faiss.compute_distances(query, database)
top_k_distances, top_k_indices = backend.faiss.topk(distances, k=10)
```

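When debugging, the Flash Attention output above can be compared against a plain scaled-dot-product reference computed on the CPU. A sketch assuming the standard (batch, heads, seq, dim) layout shown in the Quick Start (masking and dropout omitted):

```python
import numpy as np

def sdpa_reference(q, k, v):
    """Standard attention: softmax(q k^T / sqrt(d)) v over (batch, heads, seq, dim)."""
    d = q.shape[-1]
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

q = np.random.randn(2, 4, 16, 32).astype(np.float32)
k = np.random.randn(2, 4, 16, 32).astype(np.float32)
v = np.random.randn(2, 4, 16, 32).astype(np.float32)
out = sdpa_reference(q, k, v)
```

Flash Attention 2 computes the same function tile by tile without materializing the full score matrix, so the two outputs should agree to within float tolerance.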
## API Reference

### Core Interfaces

- `grilly.Compute()` - Main compute backend (alias for VulkanCompute)
- `grilly.SNNCompute()` - High-level spiking neural network interface
- `grilly.Learning()` - Learning algorithms (EWC, NLMS, etc.)

### Backend Namespaces

- `backend.snn.*` - Spiking neural network operations
- `backend.fnn.*` - Feedforward network operations
- `backend.attention.*` - Attention mechanisms
- `backend.memory.*` - Memory operations
- `backend.faiss.*` - Vector similarity search
- `backend.learning.*` - Learning algorithms
- `backend.cells.*` - Place and time cells

## Shader Statistics

- Total GLSL shaders: 137
- Compiled SPIR-V shaders: 138
- Categories: 12+ operation types

## Compiling Shaders

Shaders are pre-compiled and included. To recompile:

```bash
# Compile all shaders (cross-platform)
make compile-shaders

# Verify compilation
make verify-shaders

# Or manually:
# Windows: .\scripts\compile_all_shaders.ps1
# Linux/Mac: ./compile_shaders.sh

# Single shader
glslc shader.glsl -o spv/shader.spv
```

## GPU Selection

```bash
# Set GPU index (if multiple GPUs)
export VK_GPU_INDEX=0

# Enable debug logging
export GRILLY_DEBUG=1

# Allow CPU fallback
export ALLOW_CPU_VULKAN=1
```

## Testing

```bash
# All tests
make test

# CPU-only tests (skip GPU)
make test-cpu

# GPU tests only
make test-gpu

# With coverage report
make test-coverage

# Or use pytest directly
pytest grilly/tests/ -v
```

## Architecture

Grilly uses Vulkan compute shaders for cross-platform GPU acceleration. Each operation is implemented as a GLSL compute shader compiled to SPIR-V bytecode.

### Design Principles

- Pure Vulkan backend (no CUDA dependency)
- Hardware-agnostic (AMD, NVIDIA, Intel)
- Zero-copy GPU memory operations
- Minimal CPU-GPU transfers
- CPU fallback for unsupported operations

## Performance

Tested on AMD RX 6750 XT (12GB VRAM):

- LIF neuron simulation: 1M neurons at >1000 FPS
- Flash Attention 2: 32 batch, 8 heads, 512 seq length at ~50ms
- FAISS top-k: 10K vectors, 384D, k=10 at ~5ms

## Examples

See the `examples/` directory for detailed usage:

- Transformer fine-tuning with LoRA
- Spiking neural network training
- FAISS similarity search
- Continual learning with EWC

## Development

### Quick Start

```bash
# Clone and setup
git clone https://github.com/grillcheese-ai/grilly.git
cd grilly

# Install with dev dependencies
make install-dev

# Run tests
make test

# Format code
make format

# Run linters
make lint

# Build package
make build
```

### Project Structure

```
grilly/
├── backend/    # Vulkan backend implementation
├── nn/         # High-level neural network modules
├── shaders/    # GLSL compute shaders
│   └── spv/    # Compiled SPIR-V bytecode
├── tests/      # Test suite
├── utils/      # HuggingFace bridge, utilities
└── Makefile    # Build automation
```

### Makefile Commands

Run `make help` to see all available commands:

- `make install` - Install package
- `make test` - Run tests
- `make compile-shaders` - Compile shaders
- `make build` - Build distribution
- `make publish-test` - Publish to Test PyPI
- `make publish` - Publish to PyPI
- `make format` - Format code
- `make lint` - Run linters
- `make clean` - Clean build artifacts

### Contributing

1. Fork the repository
2. Create a feature branch
3. Add tests for new features
4. Run `make check` to verify
5. Submit a pull request

## License

MIT License - see LICENSE file for details.

## References

- Vulkan Compute Shaders: https://www.khronos.org/vulkan/
- Flash Attention 2: https://arxiv.org/abs/2307.08691
- STDP Learning: Bi & Poo (1998)
- EWC: Kirkpatrick et al. (2017)
- LoRA: Hu et al. (2021)