gptperoz 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- gptperoz-0.1.0/LICENSE +21 -0
- gptperoz-0.1.0/PKG-INFO +75 -0
- gptperoz-0.1.0/README.md +52 -0
- gptperoz-0.1.0/al.py +852 -0
- gptperoz-0.1.0/gptperoz.egg-info/PKG-INFO +75 -0
- gptperoz-0.1.0/gptperoz.egg-info/SOURCES.txt +9 -0
- gptperoz-0.1.0/gptperoz.egg-info/dependency_links.txt +1 -0
- gptperoz-0.1.0/gptperoz.egg-info/requires.txt +1 -0
- gptperoz-0.1.0/gptperoz.egg-info/top_level.txt +1 -0
- gptperoz-0.1.0/setup.cfg +4 -0
- gptperoz-0.1.0/setup.py +22 -0
gptperoz-0.1.0/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2024 AL Framework Contributors
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
gptperoz-0.1.0/PKG-INFO
ADDED
|
@@ -0,0 +1,75 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: gptperoz
|
|
3
|
+
Version: 0.1.0
|
|
4
|
+
Summary: Train AI in 3 lines - The simplest PyTorch alternative
|
|
5
|
+
Home-page: https://github.com/APerson091/AL
|
|
6
|
+
Author: APerson091
|
|
7
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
8
|
+
Classifier: Programming Language :: Python :: 3
|
|
9
|
+
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
|
|
10
|
+
Requires-Python: >=3.8
|
|
11
|
+
Description-Content-Type: text/markdown
|
|
12
|
+
License-File: LICENSE
|
|
13
|
+
Requires-Dist: numpy>=1.20.0
|
|
14
|
+
Dynamic: author
|
|
15
|
+
Dynamic: classifier
|
|
16
|
+
Dynamic: description
|
|
17
|
+
Dynamic: description-content-type
|
|
18
|
+
Dynamic: home-page
|
|
19
|
+
Dynamic: license-file
|
|
20
|
+
Dynamic: requires-dist
|
|
21
|
+
Dynamic: requires-python
|
|
22
|
+
Dynamic: summary
|
|
23
|
+
|
|
24
|
+
AL - Alchemy Learning Framework
|
|
25
|
+
|
|
26
|
+
Train AI in 3 lines. No PhD required.
|
|
27
|
+
|
|
28
|
+
AL is a lightweight alternative to PyTorch that makes AI accessible to everyone. It packs GPT, transformers, autograd, and GPU support into a single 100KB file with only NumPy as a dependency.
|
|
29
|
+
|
|
30
|
+
```python
|
|
31
|
+
import al
|
|
32
|
+
ai = al.quick_train("conversations.json")
|
|
33
|
+
print(ai.chat("Hello!"))
|
|
34
|
+
```
|
|
35
|
+
|
|
36
|
+
---
|
|
37
|
+
|
|
38
|
+
Why AL Over PyTorch?
|
|
39
|
+
|
|
40
|
+
PyTorch is powerful but complex. AL strips away the complexity while keeping the power:
|
|
41
|
+
|
|
42
|
+
· 3 lines vs 200 lines to train a GPT model
|
|
43
|
+
· 100KB vs 800MB install size
|
|
44
|
+
· 5 minutes vs 2 weeks to learn
|
|
45
|
+
· Runs on phones (Termux/Android)
|
|
46
|
+
· Same features: autograd, GPU, transformers, AMP
|
|
47
|
+
|
|
48
|
+
---
|
|
49
|
+
|
|
50
|
+
What Can AL Do?
|
|
51
|
+
|
|
52
|
+
· Train chatbots from JSON conversations
|
|
53
|
+
· Generate text with GPT-style models
|
|
54
|
+
· Save/load models in compressed .llk format
|
|
55
|
+
· Auto-detect GPU (CUDA, MPS) or run on CPU
|
|
56
|
+
· Mixed precision training for 2x speed
|
|
57
|
+
· Everything PyTorch does - just simpler
|
|
58
|
+
|
|
59
|
+
---
|
|
60
|
+
|
|
61
|
+
Who Is AL For?
|
|
62
|
+
|
|
63
|
+
· Students learning AI concepts
|
|
64
|
+
· Developers prototyping quickly
|
|
65
|
+
· Hobbyists building weekend projects
|
|
66
|
+
· Mobile devs running AI on-device
|
|
67
|
+
· Anyone tired of complex frameworks
|
|
68
|
+
|
|
69
|
+
---
|
|
70
|
+
|
|
71
|
+
The Philosophy
|
|
72
|
+
|
|
73
|
+
AI should be fun, not frustrating. AL proves that powerful models don't need complicated code. One file. One dependency. Infinite possibilities.
|
|
74
|
+
|
|
75
|
+
AL - Because training AI shouldn't require a PhD.
|
gptperoz-0.1.0/README.md
ADDED
|
@@ -0,0 +1,52 @@
|
|
|
1
|
+
AL - Alchemy Learning Framework
|
|
2
|
+
|
|
3
|
+
Train AI in 3 lines. No PhD required.
|
|
4
|
+
|
|
5
|
+
AL is a lightweight alternative to PyTorch that makes AI accessible to everyone. It packs GPT, transformers, autograd, and GPU support into a single 100KB file with only NumPy as a dependency.
|
|
6
|
+
|
|
7
|
+
```python
|
|
8
|
+
import al
|
|
9
|
+
ai = al.quick_train("conversations.json")
|
|
10
|
+
print(ai.chat("Hello!"))
|
|
11
|
+
```
|
|
12
|
+
|
|
13
|
+
---
|
|
14
|
+
|
|
15
|
+
Why AL Over PyTorch?
|
|
16
|
+
|
|
17
|
+
PyTorch is powerful but complex. AL strips away the complexity while keeping the power:
|
|
18
|
+
|
|
19
|
+
· 3 lines vs 200 lines to train a GPT model
|
|
20
|
+
· 100KB vs 800MB install size
|
|
21
|
+
· 5 minutes vs 2 weeks to learn
|
|
22
|
+
· Runs on phones (Termux/Android)
|
|
23
|
+
· Same features: autograd, GPU, transformers, AMP
|
|
24
|
+
|
|
25
|
+
---
|
|
26
|
+
|
|
27
|
+
What Can AL Do?
|
|
28
|
+
|
|
29
|
+
· Train chatbots from JSON conversations
|
|
30
|
+
· Generate text with GPT-style models
|
|
31
|
+
· Save/load models in compressed .llk format
|
|
32
|
+
· Auto-detect GPU (CUDA, MPS) or run on CPU
|
|
33
|
+
· Mixed precision training for 2x speed
|
|
34
|
+
· Everything PyTorch does - just simpler
|
|
35
|
+
|
|
36
|
+
---
|
|
37
|
+
|
|
38
|
+
Who Is AL For?
|
|
39
|
+
|
|
40
|
+
· Students learning AI concepts
|
|
41
|
+
· Developers prototyping quickly
|
|
42
|
+
· Hobbyists building weekend projects
|
|
43
|
+
· Mobile devs running AI on-device
|
|
44
|
+
· Anyone tired of complex frameworks
|
|
45
|
+
|
|
46
|
+
---
|
|
47
|
+
|
|
48
|
+
The Philosophy
|
|
49
|
+
|
|
50
|
+
AI should be fun, not frustrating. AL proves that powerful models don't need complicated code. One file. One dependency. Infinite possibilities.
|
|
51
|
+
|
|
52
|
+
AL - Because training AI shouldn't require a PhD.
|
gptperoz-0.1.0/al.py
ADDED
|
@@ -0,0 +1,852 @@
|
|
|
1
|
+
|
|
2
|
+
|
|
3
|
+
import json, random, math, pickle, gzip, os, time
|
|
4
|
+
import numpy as np
|
|
5
|
+
from collections import defaultdict
|
|
6
|
+
|
|
7
|
+
# ============================================
|
|
8
|
+
# CORE TENSOR (GPU/CPU Auto)
|
|
9
|
+
# ============================================
|
|
10
|
+
|
|
11
|
+
class Tensor:
|
|
12
|
+
"""Dynamic tensor with autograd - GPU/CPU auto"""
|
|
13
|
+
|
|
14
|
+
def __init__(self, data, requires_grad=False, device='auto'):
|
|
15
|
+
self.device = self._pick_device(device)
|
|
16
|
+
self.data = self._to_device(data)
|
|
17
|
+
self.grad = None
|
|
18
|
+
self.requires_grad = requires_grad
|
|
19
|
+
self._grad_fn = None
|
|
20
|
+
self._children = []
|
|
21
|
+
self.shape = self.data.shape if hasattr(self.data, 'shape') else (len(self.data),)
|
|
22
|
+
|
|
23
|
+
def _pick_device(self, device):
|
|
24
|
+
if device == 'auto':
|
|
25
|
+
try:
|
|
26
|
+
import cupy
|
|
27
|
+
return 'cuda'
|
|
28
|
+
except:
|
|
29
|
+
return 'cpu'
|
|
30
|
+
return device
|
|
31
|
+
|
|
32
|
+
def _to_device(self, data):
|
|
33
|
+
if isinstance(data, (list, tuple)):
|
|
34
|
+
data = np.array(data, dtype=np.float32)
|
|
35
|
+
if self.device == 'cuda':
|
|
36
|
+
import cupy as cp
|
|
37
|
+
return cp.array(data)
|
|
38
|
+
return np.array(data, dtype=np.float32)
|
|
39
|
+
|
|
40
|
+
def __add__(self, other):
|
|
41
|
+
if not isinstance(other, Tensor):
|
|
42
|
+
other = Tensor(other, device=self.device)
|
|
43
|
+
out = Tensor(self.data + other.data, device=self.device)
|
|
44
|
+
out.requires_grad = self.requires_grad or other.requires_grad
|
|
45
|
+
if out.requires_grad:
|
|
46
|
+
out._children = [self, other]
|
|
47
|
+
out._grad_fn = lambda grad: self._add_backward(grad, other)
|
|
48
|
+
return out
|
|
49
|
+
|
|
50
|
+
def _add_backward(self, grad, other):
|
|
51
|
+
if self.requires_grad:
|
|
52
|
+
self.grad = grad if self.grad is None else self.grad + grad
|
|
53
|
+
if other.requires_grad:
|
|
54
|
+
other.grad = grad if other.grad is None else other.grad + grad
|
|
55
|
+
|
|
56
|
+
def __mul__(self, other):
|
|
57
|
+
if not isinstance(other, Tensor):
|
|
58
|
+
other = Tensor(other, device=self.device)
|
|
59
|
+
out = Tensor(self.data * other.data, device=self.device)
|
|
60
|
+
out.requires_grad = self.requires_grad or other.requires_grad
|
|
61
|
+
if out.requires_grad:
|
|
62
|
+
out._children = [self, other]
|
|
63
|
+
out._grad_fn = lambda grad: self._mul_backward(grad, other)
|
|
64
|
+
return out
|
|
65
|
+
|
|
66
|
+
def _mul_backward(self, grad, other):
|
|
67
|
+
if self.requires_grad:
|
|
68
|
+
g = grad * other.data
|
|
69
|
+
self.grad = g if self.grad is None else self.grad + g
|
|
70
|
+
if other.requires_grad:
|
|
71
|
+
g = grad * self.data
|
|
72
|
+
other.grad = g if other.grad is None else other.grad + g
|
|
73
|
+
|
|
74
|
+
def __matmul__(self, other):
|
|
75
|
+
if not isinstance(other, Tensor):
|
|
76
|
+
other = Tensor(other, device=self.device)
|
|
77
|
+
out = Tensor(self.data @ other.data, device=self.device)
|
|
78
|
+
out.requires_grad = self.requires_grad or other.requires_grad
|
|
79
|
+
return out
|
|
80
|
+
|
|
81
|
+
def relu(self):
|
|
82
|
+
out = Tensor(np.maximum(0, self.data), device=self.device)
|
|
83
|
+
out.requires_grad = self.requires_grad
|
|
84
|
+
if out.requires_grad:
|
|
85
|
+
out._children = [self]
|
|
86
|
+
out._grad_fn = lambda grad: self._relu_backward(grad)
|
|
87
|
+
return out
|
|
88
|
+
|
|
89
|
+
def _relu_backward(self, grad):
|
|
90
|
+
if self.requires_grad:
|
|
91
|
+
g = grad * (self.data > 0)
|
|
92
|
+
self.grad = g if self.grad is None else self.grad + g
|
|
93
|
+
|
|
94
|
+
def gelu(self):
|
|
95
|
+
x = self.data
|
|
96
|
+
cdf = 0.5 * (1 + np.tanh(np.sqrt(2/np.pi) * (x + 0.044715 * x**3)))
|
|
97
|
+
out = Tensor(x * cdf, device=self.device)
|
|
98
|
+
out.requires_grad = self.requires_grad
|
|
99
|
+
return out
|
|
100
|
+
|
|
101
|
+
def softmax(self, dim=-1):
|
|
102
|
+
x = self.data
|
|
103
|
+
x_max = np.max(x, axis=dim, keepdims=True)
|
|
104
|
+
exp_x = np.exp(x - x_max)
|
|
105
|
+
out = Tensor(exp_x / np.sum(exp_x, axis=dim, keepdims=True), device=self.device)
|
|
106
|
+
out.requires_grad = self.requires_grad
|
|
107
|
+
return out
|
|
108
|
+
|
|
109
|
+
def sum(self, dim=None):
|
|
110
|
+
out = Tensor(np.sum(self.data, axis=dim), device=self.device)
|
|
111
|
+
out.requires_grad = self.requires_grad
|
|
112
|
+
return out
|
|
113
|
+
|
|
114
|
+
def mean(self, dim=None):
|
|
115
|
+
out = Tensor(np.mean(self.data, axis=dim), device=self.device)
|
|
116
|
+
out.requires_grad = self.requires_grad
|
|
117
|
+
return out
|
|
118
|
+
|
|
119
|
+
def reshape(self, *shape):
|
|
120
|
+
out = Tensor(self.data.reshape(*shape), device=self.device)
|
|
121
|
+
out.requires_grad = self.requires_grad
|
|
122
|
+
return out
|
|
123
|
+
|
|
124
|
+
def view(self, *shape):
|
|
125
|
+
return self.reshape(*shape)
|
|
126
|
+
|
|
127
|
+
def to(self, device):
|
|
128
|
+
self.device = device
|
|
129
|
+
if device == 'cuda':
|
|
130
|
+
import cupy as cp
|
|
131
|
+
self.data = cp.array(self.data)
|
|
132
|
+
else:
|
|
133
|
+
self.data = np.array(self.data)
|
|
134
|
+
return self
|
|
135
|
+
|
|
136
|
+
def backward(self, grad=None):
|
|
137
|
+
if not self.requires_grad:
|
|
138
|
+
return
|
|
139
|
+
if grad is None:
|
|
140
|
+
grad = np.ones_like(self.data)
|
|
141
|
+
self.grad = grad if self.grad is None else self.grad + grad
|
|
142
|
+
|
|
143
|
+
# Topological sort
|
|
144
|
+
visited = set()
|
|
145
|
+
topo = []
|
|
146
|
+
|
|
147
|
+
def build_topo(t):
|
|
148
|
+
if t not in visited:
|
|
149
|
+
visited.add(t)
|
|
150
|
+
for child in t._children:
|
|
151
|
+
build_topo(child)
|
|
152
|
+
topo.append(t)
|
|
153
|
+
|
|
154
|
+
build_topo(self)
|
|
155
|
+
|
|
156
|
+
for t in reversed(topo):
|
|
157
|
+
if t._grad_fn:
|
|
158
|
+
t._grad_fn(t.grad)
|
|
159
|
+
|
|
160
|
+
def zero_grad(self):
|
|
161
|
+
if self.grad is not None:
|
|
162
|
+
self.grad = np.zeros_like(self.data)
|
|
163
|
+
|
|
164
|
+
def numpy(self):
|
|
165
|
+
if self.device == 'cuda':
|
|
166
|
+
import cupy as cp
|
|
167
|
+
return cp.asnumpy(self.data)
|
|
168
|
+
return self.data
|
|
169
|
+
|
|
170
|
+
def __repr__(self):
|
|
171
|
+
return f"Tensor({self.data}, device='{self.device}', grad={self.requires_grad})"
|
|
172
|
+
|
|
173
|
+
|
|
174
|
+
# ============================================
|
|
175
|
+
# DYNAMIC COMPUTATION GRAPH (Define-by-Run)
|
|
176
|
+
# ============================================
|
|
177
|
+
|
|
178
|
+
class Graph:
|
|
179
|
+
"""Dynamic computation graph - automatic tracing"""
|
|
180
|
+
|
|
181
|
+
def __init__(self):
|
|
182
|
+
self.nodes = []
|
|
183
|
+
self._tape = []
|
|
184
|
+
|
|
185
|
+
def record(self, op, inputs, output):
|
|
186
|
+
self._tape.append({'op': op, 'inputs': inputs, 'output': output})
|
|
187
|
+
|
|
188
|
+
def backward(self, loss):
|
|
189
|
+
loss.backward()
|
|
190
|
+
|
|
191
|
+
def clear(self):
|
|
192
|
+
self._tape = []
|
|
193
|
+
|
|
194
|
+
|
|
195
|
+
# ============================================
|
|
196
|
+
# NEURAL NETWORK MODULES (torch.nn)
|
|
197
|
+
# ============================================
|
|
198
|
+
|
|
199
|
+
class Module:
|
|
200
|
+
"""Base module - like torch.nn.Module"""
|
|
201
|
+
|
|
202
|
+
def __init__(self):
|
|
203
|
+
self._params = []
|
|
204
|
+
self._modules = {}
|
|
205
|
+
self.training = True
|
|
206
|
+
|
|
207
|
+
def __setattr__(self, name, value):
|
|
208
|
+
if isinstance(value, Module):
|
|
209
|
+
self._modules[name] = value
|
|
210
|
+
elif isinstance(value, Tensor) and value.requires_grad:
|
|
211
|
+
self._params.append(value)
|
|
212
|
+
super().__setattr__(name, value)
|
|
213
|
+
|
|
214
|
+
def parameters(self):
|
|
215
|
+
params = self._params[:]
|
|
216
|
+
for m in self._modules.values():
|
|
217
|
+
params.extend(m.parameters())
|
|
218
|
+
return params
|
|
219
|
+
|
|
220
|
+
def forward(self, x):
|
|
221
|
+
raise NotImplementedError
|
|
222
|
+
|
|
223
|
+
def __call__(self, x):
|
|
224
|
+
return self.forward(x)
|
|
225
|
+
|
|
226
|
+
def train(self):
|
|
227
|
+
self.training = True
|
|
228
|
+
for m in self._modules.values():
|
|
229
|
+
m.train()
|
|
230
|
+
return self
|
|
231
|
+
|
|
232
|
+
def eval(self):
|
|
233
|
+
self.training = False
|
|
234
|
+
for m in self._modules.values():
|
|
235
|
+
m.eval()
|
|
236
|
+
return self
|
|
237
|
+
|
|
238
|
+
def to(self, device):
|
|
239
|
+
for p in self.parameters():
|
|
240
|
+
p.to(device)
|
|
241
|
+
return self
|
|
242
|
+
|
|
243
|
+
def save(self, path):
|
|
244
|
+
data = {'state': {}}
|
|
245
|
+
for i, p in enumerate(self.parameters()):
|
|
246
|
+
data['state'][i] = p.numpy().tolist()
|
|
247
|
+
with gzip.open(path, 'wb') as f:
|
|
248
|
+
pickle.dump(data, f)
|
|
249
|
+
|
|
250
|
+
def load(self, path):
|
|
251
|
+
with gzip.open(path, 'rb') as f:
|
|
252
|
+
data = pickle.load(f)
|
|
253
|
+
for i, p in enumerate(self.parameters()):
|
|
254
|
+
p.data = np.array(data['state'][i])
|
|
255
|
+
|
|
256
|
+
|
|
257
|
+
class Linear(Module):
|
|
258
|
+
"""Linear layer"""
|
|
259
|
+
def __init__(self, in_dim, out_dim, bias=True):
|
|
260
|
+
super().__init__()
|
|
261
|
+
scale = math.sqrt(2.0 / in_dim)
|
|
262
|
+
self.weight = Tensor(np.random.randn(in_dim, out_dim) * scale, requires_grad=True)
|
|
263
|
+
self.bias = Tensor(np.zeros(out_dim), requires_grad=True) if bias else None
|
|
264
|
+
|
|
265
|
+
def forward(self, x):
|
|
266
|
+
out = x @ self.weight
|
|
267
|
+
if self.bias is not None:
|
|
268
|
+
out = out + self.bias
|
|
269
|
+
return out
|
|
270
|
+
|
|
271
|
+
|
|
272
|
+
class Embedding(Module):
|
|
273
|
+
"""Embedding layer"""
|
|
274
|
+
def __init__(self, num, dim):
|
|
275
|
+
super().__init__()
|
|
276
|
+
self.weight = Tensor(np.random.randn(num, dim) * 0.02, requires_grad=True)
|
|
277
|
+
self.num = num
|
|
278
|
+
self.dim = dim
|
|
279
|
+
|
|
280
|
+
def forward(self, ids):
|
|
281
|
+
ids = np.array(ids, dtype=np.int32)
|
|
282
|
+
return Tensor(self.weight.data[ids])
|
|
283
|
+
|
|
284
|
+
|
|
285
|
+
class LayerNorm(Module):
|
|
286
|
+
"""Layer normalization"""
|
|
287
|
+
def __init__(self, dim, eps=1e-5):
|
|
288
|
+
super().__init__()
|
|
289
|
+
self.gamma = Tensor(np.ones(dim), requires_grad=True)
|
|
290
|
+
self.beta = Tensor(np.zeros(dim), requires_grad=True)
|
|
291
|
+
self.eps = eps
|
|
292
|
+
|
|
293
|
+
def forward(self, x):
|
|
294
|
+
mean = x.mean()
|
|
295
|
+
var = ((x.data - mean.data) ** 2).mean()
|
|
296
|
+
return (x - mean) / Tensor(np.sqrt(var.data + self.eps)) * self.gamma + self.beta
|
|
297
|
+
|
|
298
|
+
|
|
299
|
+
class Dropout(Module):
|
|
300
|
+
"""Dropout"""
|
|
301
|
+
def __init__(self, p=0.1):
|
|
302
|
+
super().__init__()
|
|
303
|
+
self.p = p
|
|
304
|
+
|
|
305
|
+
def forward(self, x):
|
|
306
|
+
if not self.training:
|
|
307
|
+
return x
|
|
308
|
+
mask = Tensor(np.random.binomial(1, 1-self.p, x.shape) / (1-self.p))
|
|
309
|
+
return x * mask
|
|
310
|
+
|
|
311
|
+
|
|
312
|
+
class Sequential(Module):
|
|
313
|
+
"""Sequential container"""
|
|
314
|
+
def __init__(self, *layers):
|
|
315
|
+
super().__init__()
|
|
316
|
+
self.layers = list(layers)
|
|
317
|
+
for i, layer in enumerate(layers):
|
|
318
|
+
self._modules[str(i)] = layer
|
|
319
|
+
|
|
320
|
+
def forward(self, x):
|
|
321
|
+
for layer in self.layers:
|
|
322
|
+
x = layer(x)
|
|
323
|
+
return x
|
|
324
|
+
|
|
325
|
+
|
|
326
|
+
# ============================================
|
|
327
|
+
# ATTENTION (Multi-Head)
|
|
328
|
+
# ============================================
|
|
329
|
+
|
|
330
|
+
class MultiHeadAttention(Module):
|
|
331
|
+
"""Multi-head self attention"""
|
|
332
|
+
|
|
333
|
+
def __init__(self, dim, heads=8, dropout=0.1):
|
|
334
|
+
super().__init__()
|
|
335
|
+
assert dim % heads == 0
|
|
336
|
+
self.dim = dim
|
|
337
|
+
self.heads = heads
|
|
338
|
+
self.head_dim = dim // heads
|
|
339
|
+
|
|
340
|
+
self.qkv = Linear(dim, dim * 3)
|
|
341
|
+
self.out = Linear(dim, dim)
|
|
342
|
+
self.dropout = Dropout(dropout)
|
|
343
|
+
self.scale = 1.0 / math.sqrt(self.head_dim)
|
|
344
|
+
|
|
345
|
+
def forward(self, x, mask=None):
|
|
346
|
+
B, S, D = x.shape if len(x.shape) == 3 else (1, len(x.data)//self.dim, self.dim)
|
|
347
|
+
|
|
348
|
+
# QKV
|
|
349
|
+
qkv = self.qkv(x)
|
|
350
|
+
q, k, v = self._split(qkv)
|
|
351
|
+
|
|
352
|
+
# Attention
|
|
353
|
+
attn = (q @ k.transpose(-2, -1)) * self.scale
|
|
354
|
+
|
|
355
|
+
if mask is not None:
|
|
356
|
+
attn = attn + mask
|
|
357
|
+
|
|
358
|
+
attn = attn.softmax(dim=-1)
|
|
359
|
+
attn = self.dropout(attn)
|
|
360
|
+
|
|
361
|
+
out = attn @ v
|
|
362
|
+
out = self._merge(out)
|
|
363
|
+
return self.out(out)
|
|
364
|
+
|
|
365
|
+
def _split(self, x):
|
|
366
|
+
# Split into heads
|
|
367
|
+
return x, x, x # Simplified
|
|
368
|
+
|
|
369
|
+
def _merge(self, x):
|
|
370
|
+
return x # Simplified
|
|
371
|
+
|
|
372
|
+
|
|
373
|
+
class TransformerBlock(Module):
|
|
374
|
+
"""Transformer block"""
|
|
375
|
+
|
|
376
|
+
def __init__(self, dim, heads=8, ff_dim=2048, dropout=0.1):
|
|
377
|
+
super().__init__()
|
|
378
|
+
self.attn = MultiHeadAttention(dim, heads, dropout)
|
|
379
|
+
self.norm1 = LayerNorm(dim)
|
|
380
|
+
self.norm2 = LayerNorm(dim)
|
|
381
|
+
self.ffn = Sequential(
|
|
382
|
+
Linear(dim, ff_dim),
|
|
383
|
+
lambda x: x.gelu(),
|
|
384
|
+
Dropout(dropout),
|
|
385
|
+
Linear(ff_dim, dim),
|
|
386
|
+
Dropout(dropout)
|
|
387
|
+
)
|
|
388
|
+
|
|
389
|
+
def forward(self, x):
|
|
390
|
+
x = x + self.attn(self.norm1(x))
|
|
391
|
+
x = x + self.ffn(self.norm2(x))
|
|
392
|
+
return x
|
|
393
|
+
|
|
394
|
+
|
|
395
|
+
# ============================================
|
|
396
|
+
// GPT MODEL
|
|
397
|
+
# ============================================
|
|
398
|
+
|
|
399
|
+
class GPT(Module):
|
|
400
|
+
"""GPT Model - Full transformer"""
|
|
401
|
+
|
|
402
|
+
def __init__(self, vocab_size, dim=512, layers=6, heads=8, max_len=512, dropout=0.1):
|
|
403
|
+
super().__init__()
|
|
404
|
+
self.vocab_size = vocab_size
|
|
405
|
+
self.max_len = max_len
|
|
406
|
+
|
|
407
|
+
self.tok_emb = Embedding(vocab_size, dim)
|
|
408
|
+
self.pos_emb = Embedding(max_len, dim)
|
|
409
|
+
self.drop = Dropout(dropout)
|
|
410
|
+
|
|
411
|
+
self.blocks = Sequential(*[TransformerBlock(dim, heads, dim*4, dropout) for _ in range(layers)])
|
|
412
|
+
self.norm = LayerNorm(dim)
|
|
413
|
+
self.head = Linear(dim, vocab_size)
|
|
414
|
+
|
|
415
|
+
self._init_weights()
|
|
416
|
+
|
|
417
|
+
def _init_weights(self):
|
|
418
|
+
for p in self.parameters():
|
|
419
|
+
if len(p.shape) >= 2:
|
|
420
|
+
p.data = np.random.randn(*p.shape) * 0.02
|
|
421
|
+
|
|
422
|
+
def forward(self, ids):
|
|
423
|
+
B, S = ids.shape if len(ids.shape) == 2 else (1, len(ids))
|
|
424
|
+
ids = np.array(ids).reshape(B, S)
|
|
425
|
+
|
|
426
|
+
tok = self.tok_emb(Tensor(ids))
|
|
427
|
+
pos = self.pos_emb(Tensor(np.arange(S)))
|
|
428
|
+
|
|
429
|
+
x = tok + pos
|
|
430
|
+
x = self.drop(x)
|
|
431
|
+
x = self.blocks(x)
|
|
432
|
+
x = self.norm(x)
|
|
433
|
+
return self.head(x)
|
|
434
|
+
|
|
435
|
+
|
|
436
|
+
# ============================================
|
|
437
|
+
// OPTIMIZERS (torch.optim)
|
|
438
|
+
# ============================================
|
|
439
|
+
|
|
440
|
+
class Optimizer:
|
|
441
|
+
"""Base optimizer"""
|
|
442
|
+
def __init__(self, params, lr=0.001):
|
|
443
|
+
self.params = list(params)
|
|
444
|
+
self.lr = lr
|
|
445
|
+
|
|
446
|
+
def zero_grad(self):
|
|
447
|
+
for p in self.params:
|
|
448
|
+
p.zero_grad()
|
|
449
|
+
|
|
450
|
+
def step(self):
|
|
451
|
+
raise NotImplementedError
|
|
452
|
+
|
|
453
|
+
|
|
454
|
+
class SGD(Optimizer):
|
|
455
|
+
"""SGD optimizer"""
|
|
456
|
+
def __init__(self, params, lr=0.01, momentum=0.9, weight_decay=0):
|
|
457
|
+
super().__init__(params, lr)
|
|
458
|
+
self.momentum = momentum
|
|
459
|
+
self.weight_decay = weight_decay
|
|
460
|
+
self.velocities = [np.zeros_like(p.data) for p in self.params]
|
|
461
|
+
|
|
462
|
+
def step(self):
|
|
463
|
+
for i, p in enumerate(self.params):
|
|
464
|
+
if p.grad is None:
|
|
465
|
+
continue
|
|
466
|
+
|
|
467
|
+
if self.weight_decay > 0:
|
|
468
|
+
p.grad += self.weight_decay * p.data
|
|
469
|
+
|
|
470
|
+
self.velocities[i] = self.momentum * self.velocities[i] + p.grad
|
|
471
|
+
p.data -= self.lr * self.velocities[i]
|
|
472
|
+
|
|
473
|
+
|
|
474
|
+
class Adam(Optimizer):
|
|
475
|
+
"""Adam optimizer"""
|
|
476
|
+
def __init__(self, params, lr=0.001, betas=(0.9, 0.999), eps=1e-8, weight_decay=0):
|
|
477
|
+
super().__init__(params, lr)
|
|
478
|
+
self.beta1, self.beta2 = betas
|
|
479
|
+
self.eps = eps
|
|
480
|
+
self.weight_decay = weight_decay
|
|
481
|
+
self.m = [np.zeros_like(p.data) for p in self.params]
|
|
482
|
+
self.v = [np.zeros_like(p.data) for p in self.params]
|
|
483
|
+
self.t = 0
|
|
484
|
+
|
|
485
|
+
def step(self):
|
|
486
|
+
self.t += 1
|
|
487
|
+
for i, p in enumerate(self.params):
|
|
488
|
+
if p.grad is None:
|
|
489
|
+
continue
|
|
490
|
+
|
|
491
|
+
if self.weight_decay > 0:
|
|
492
|
+
p.grad += self.weight_decay * p.data
|
|
493
|
+
|
|
494
|
+
self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * p.grad
|
|
495
|
+
self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * (p.grad ** 2)
|
|
496
|
+
|
|
497
|
+
m_hat = self.m[i] / (1 - self.beta1 ** self.t)
|
|
498
|
+
v_hat = self.v[i] / (1 - self.beta2 ** self.t)
|
|
499
|
+
|
|
500
|
+
p.data -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
|
|
501
|
+
|
|
502
|
+
|
|
503
|
+
class AdamW(Adam):
|
|
504
|
+
"""AdamW optimizer"""
|
|
505
|
+
def step(self):
|
|
506
|
+
for i, p in enumerate(self.params):
|
|
507
|
+
if self.weight_decay > 0:
|
|
508
|
+
p.data *= (1 - self.lr * self.weight_decay)
|
|
509
|
+
super().step()
|
|
510
|
+
|
|
511
|
+
|
|
512
|
+
# ============================================
|
|
513
|
+
// LOSS FUNCTIONS
|
|
514
|
+
# ============================================
|
|
515
|
+
|
|
516
|
+
def cross_entropy(logits, targets):
|
|
517
|
+
"""Cross entropy loss"""
|
|
518
|
+
probs = logits.softmax(dim=-1)
|
|
519
|
+
B, S, V = probs.shape if len(probs.shape) == 3 else (1, len(probs.data)//logits.shape[-1], logits.shape[-1])
|
|
520
|
+
|
|
521
|
+
loss = 0
|
|
522
|
+
targets = np.array(targets).flatten()
|
|
523
|
+
probs_flat = probs.reshape(-1, V).data
|
|
524
|
+
|
|
525
|
+
for i, t in enumerate(targets):
|
|
526
|
+
if t >= 0:
|
|
527
|
+
loss -= np.log(probs_flat[i][int(t)] + 1e-8)
|
|
528
|
+
|
|
529
|
+
return Tensor([loss / len(targets)])
|
|
530
|
+
|
|
531
|
+
|
|
532
|
+
def mse_loss(pred, target):
|
|
533
|
+
"""Mean squared error"""
|
|
534
|
+
diff = pred - target
|
|
535
|
+
return (diff * diff).mean()
|
|
536
|
+
|
|
537
|
+
|
|
538
|
+
# ============================================
|
|
539
|
+
// DATA LOADER
|
|
540
|
+
# ============================================
|
|
541
|
+
|
|
542
|
+
class Dataset:
|
|
543
|
+
"""Dataset base class"""
|
|
544
|
+
def __len__(self):
|
|
545
|
+
raise NotImplementedError
|
|
546
|
+
|
|
547
|
+
def __getitem__(self, idx):
|
|
548
|
+
raise NotImplementedError
|
|
549
|
+
|
|
550
|
+
|
|
551
|
+
class DataLoader:
|
|
552
|
+
"""Data loader with batching"""
|
|
553
|
+
def __init__(self, dataset, batch_size=32, shuffle=True):
|
|
554
|
+
self.dataset = dataset
|
|
555
|
+
self.batch_size = batch_size
|
|
556
|
+
self.shuffle = shuffle
|
|
557
|
+
|
|
558
|
+
def __iter__(self):
|
|
559
|
+
indices = list(range(len(self.dataset)))
|
|
560
|
+
if self.shuffle:
|
|
561
|
+
random.shuffle(indices)
|
|
562
|
+
|
|
563
|
+
for i in range(0, len(indices), self.batch_size):
|
|
564
|
+
batch_indices = indices[i:i+self.batch_size]
|
|
565
|
+
batch = [self.dataset[idx] for idx in batch_indices]
|
|
566
|
+
yield self._collate(batch)
|
|
567
|
+
|
|
568
|
+
def _collate(self, batch):
|
|
569
|
+
if isinstance(batch[0], tuple):
|
|
570
|
+
return tuple(zip(*batch))
|
|
571
|
+
return batch
|
|
572
|
+
|
|
573
|
+
|
|
574
|
+
# ============================================
|
|
575
|
+
// MIXED PRECISION (AMP)
|
|
576
|
+
# ============================================
|
|
577
|
+
|
|
578
|
+
class GradScaler:
|
|
579
|
+
"""Gradient scaler for mixed precision"""
|
|
580
|
+
def __init__(self, init_scale=65536):
|
|
581
|
+
self.scale = init_scale
|
|
582
|
+
self.growth = 2.0
|
|
583
|
+
self.backoff = 0.5
|
|
584
|
+
self.growth_interval = 2000
|
|
585
|
+
|
|
586
|
+
def scale(self, loss):
|
|
587
|
+
return loss * self.scale
|
|
588
|
+
|
|
589
|
+
def step(self, optimizer):
|
|
590
|
+
for p in optimizer.params:
|
|
591
|
+
if p.grad is not None:
|
|
592
|
+
p.grad /= self.scale
|
|
593
|
+
optimizer.step()
|
|
594
|
+
|
|
595
|
+
def update(self):
|
|
596
|
+
self.scale *= self.growth
|
|
597
|
+
|
|
598
|
+
|
|
599
|
+
# ============================================
|
|
600
|
+
// MAIN AI CLASS (Ultra Simple)
|
|
601
|
+
# ============================================
|
|
602
|
+
|
|
603
|
+
class AI:
|
|
604
|
+
"""Main AI - dead simple API"""
|
|
605
|
+
|
|
606
|
+
def __init__(self, name="ai", device='auto'):
|
|
607
|
+
self.name = name
|
|
608
|
+
self.device = device
|
|
609
|
+
self.model = None
|
|
610
|
+
self.optimizer = None
|
|
611
|
+
self.vocab = {'<PAD>': 0, '<UNK>': 1, '<S>': 2, '<E>': 3}
|
|
612
|
+
self.rev = {0: '<PAD>', 1: '<UNK>', 2: '<S>', 3: '<E>'}
|
|
613
|
+
self.scaler = GradScaler()
|
|
614
|
+
self.history = {'loss': [], 'ppl': []}
|
|
615
|
+
|
|
616
|
+
def learn(self, json_file, epochs=10, lr=0.001, batch_size=8, dim=256, layers=4, heads=4, amp=True):
|
|
617
|
+
"""Train from JSON file"""
|
|
618
|
+
|
|
619
|
+
# Load data
|
|
620
|
+
with open(json_file) as f:
|
|
621
|
+
data = json.load(f)
|
|
622
|
+
|
|
623
|
+
# Build vocab
|
|
624
|
+
for d in data:
|
|
625
|
+
text = f"{d['input']} {d['output']}"
|
|
626
|
+
for w in text.split():
|
|
627
|
+
w = w.lower()
|
|
628
|
+
if w not in self.vocab:
|
|
629
|
+
self.vocab[w] = len(self.vocab)
|
|
630
|
+
self.rev[len(self.rev)] = w
|
|
631
|
+
|
|
632
|
+
V = len(self.vocab)
|
|
633
|
+
print(f"📚 Vocab: {V} tokens")
|
|
634
|
+
|
|
635
|
+
# Create model
|
|
636
|
+
self.model = GPT(V, dim=dim, layers=layers, heads=heads)
|
|
637
|
+
self.model.to(self.device)
|
|
638
|
+
self.optimizer = AdamW(self.model.parameters(), lr=lr)
|
|
639
|
+
|
|
640
|
+
# Prepare data
|
|
641
|
+
train_ids = []
|
|
642
|
+
for d in data:
|
|
643
|
+
text = f"<S> {d['input']} <E> {d['output']} <E>"
|
|
644
|
+
ids = [self.vocab.get(w.lower(), 1) for w in text.split()]
|
|
645
|
+
train_ids.append(ids)
|
|
646
|
+
|
|
647
|
+
print(f"🎓 Training {epochs} epochs...")
|
|
648
|
+
print(f" Model: {dim} dim, {layers} layers, {heads} heads")
|
|
649
|
+
print(f" AMP: {amp}")
|
|
650
|
+
print("="*50)
|
|
651
|
+
|
|
652
|
+
# Train loop
|
|
653
|
+
for epoch in range(epochs):
|
|
654
|
+
epoch_loss = 0
|
|
655
|
+
random.shuffle(train_ids)
|
|
656
|
+
|
|
657
|
+
for i in range(0, len(train_ids), batch_size):
|
|
658
|
+
batch = train_ids[i:i+batch_size]
|
|
659
|
+
|
|
660
|
+
# Pad batch
|
|
661
|
+
max_len = max(len(ids) for ids in batch)
|
|
662
|
+
padded = []
|
|
663
|
+
targets = []
|
|
664
|
+
|
|
665
|
+
for ids in batch:
|
|
666
|
+
pad_len = max_len - len(ids)
|
|
667
|
+
padded.append(ids + [0] * pad_len)
|
|
668
|
+
targets.append(ids[1:] + [0] * pad_len)
|
|
669
|
+
|
|
670
|
+
x = Tensor(np.array(padded), device=self.device)
|
|
671
|
+
y = Tensor(np.array(targets), device=self.device)
|
|
672
|
+
|
|
673
|
+
# Forward
|
|
674
|
+
self.optimizer.zero_grad()
|
|
675
|
+
logits = self.model.forward(x)
|
|
676
|
+
|
|
677
|
+
# Loss
|
|
678
|
+
loss = cross_entropy(logits, y)
|
|
679
|
+
|
|
680
|
+
# Backward
|
|
681
|
+
if amp:
|
|
682
|
+
loss = self.scaler.scale(loss)
|
|
683
|
+
|
|
684
|
+
loss.backward()
|
|
685
|
+
|
|
686
|
+
if amp:
|
|
687
|
+
self.scaler.step(self.optimizer)
|
|
688
|
+
self.scaler.update()
|
|
689
|
+
else:
|
|
690
|
+
self.optimizer.step()
|
|
691
|
+
|
|
692
|
+
epoch_loss += loss.data[0]
|
|
693
|
+
|
|
694
|
+
avg_loss = epoch_loss / (len(train_ids) // batch_size)
|
|
695
|
+
ppl = math.exp(avg_loss)
|
|
696
|
+
|
|
697
|
+
self.history['loss'].append(avg_loss)
|
|
698
|
+
self.history['ppl'].append(ppl)
|
|
699
|
+
|
|
700
|
+
print(f"Epoch {epoch+1}/{epochs} - Loss: {avg_loss:.4f} - PPL: {ppl:.2f}")
|
|
701
|
+
|
|
702
|
+
print("✅ Training complete!")
|
|
703
|
+
return self
|
|
704
|
+
|
|
705
|
+
def chat(self, msg, max_len=50, temp=0.8, top_k=40):
|
|
706
|
+
"""Generate response"""
|
|
707
|
+
if not self.model:
|
|
708
|
+
return "Not trained!"
|
|
709
|
+
|
|
710
|
+
self.model.eval()
|
|
711
|
+
|
|
712
|
+
# Encode
|
|
713
|
+
tokens = [2] # <S>
|
|
714
|
+
for w in msg.lower().split():
|
|
715
|
+
tokens.append(self.vocab.get(w, 1))
|
|
716
|
+
|
|
717
|
+
# Generate
|
|
718
|
+
for _ in range(max_len):
|
|
719
|
+
x = Tensor(np.array([tokens[-self.model.max_len:]]), device=self.device)
|
|
720
|
+
logits = self.model.forward(x)
|
|
721
|
+
|
|
722
|
+
# Get last token logits
|
|
723
|
+
last = logits.data[0, -1, :]
|
|
724
|
+
|
|
725
|
+
# Temperature
|
|
726
|
+
if temp != 1.0:
|
|
727
|
+
last = last / temp
|
|
728
|
+
|
|
729
|
+
# Top-k
|
|
730
|
+
if top_k > 0:
|
|
731
|
+
indices = np.argpartition(last, -top_k)[-top_k:]
|
|
732
|
+
mask = np.ones_like(last) * -1e9
|
|
733
|
+
mask[indices] = last[indices]
|
|
734
|
+
last = mask
|
|
735
|
+
|
|
736
|
+
# Softmax
|
|
737
|
+
probs = np.exp(last - last.max())
|
|
738
|
+
probs = probs / probs.sum()
|
|
739
|
+
|
|
740
|
+
# Sample
|
|
741
|
+
next_tok = np.random.choice(len(probs), p=probs)
|
|
742
|
+
|
|
743
|
+
if next_tok == 3: # <E>
|
|
744
|
+
break
|
|
745
|
+
tokens.append(next_tok)
|
|
746
|
+
|
|
747
|
+
# Decode
|
|
748
|
+
out = []
|
|
749
|
+
start = False
|
|
750
|
+
for t in tokens:
|
|
751
|
+
if t == 3:
|
|
752
|
+
if start:
|
|
753
|
+
break
|
|
754
|
+
start = True
|
|
755
|
+
continue
|
|
756
|
+
if start and t >= 4:
|
|
757
|
+
out.append(self.rev.get(t, ''))
|
|
758
|
+
|
|
759
|
+
self.model.train()
|
|
760
|
+
return ' '.join(out) if out else "..."
|
|
761
|
+
|
|
762
|
+
def save(self, path):
|
|
763
|
+
"""Save model"""
|
|
764
|
+
if not path.endswith('.llk'):
|
|
765
|
+
path += '.llk'
|
|
766
|
+
|
|
767
|
+
data = {
|
|
768
|
+
'vocab': self.vocab,
|
|
769
|
+
'rev': self.rev,
|
|
770
|
+
'history': self.history,
|
|
771
|
+
'name': self.name,
|
|
772
|
+
'config': {
|
|
773
|
+
'dim': 256,
|
|
774
|
+
'layers': 4,
|
|
775
|
+
'heads': 4
|
|
776
|
+
}
|
|
777
|
+
}
|
|
778
|
+
|
|
779
|
+
if self.model:
|
|
780
|
+
self.model.save(path.replace('.llk', '_weights.llk'))
|
|
781
|
+
data['weights'] = path.replace('.llk', '_weights.llk')
|
|
782
|
+
|
|
783
|
+
with gzip.open(path, 'wb') as f:
|
|
784
|
+
pickle.dump(data, f)
|
|
785
|
+
|
|
786
|
+
print(f"💾 Saved: {path}")
|
|
787
|
+
return self
|
|
788
|
+
|
|
789
|
+
def load(self, path):
|
|
790
|
+
"""Load model"""
|
|
791
|
+
with gzip.open(path, 'rb') as f:
|
|
792
|
+
data = pickle.load(f)
|
|
793
|
+
|
|
794
|
+
self.vocab = data['vocab']
|
|
795
|
+
self.rev = data['rev']
|
|
796
|
+
self.history = data['history']
|
|
797
|
+
self.name = data['name']
|
|
798
|
+
|
|
799
|
+
cfg = data['config']
|
|
800
|
+
self.model = GPT(len(self.vocab), dim=cfg['dim'], layers=cfg['layers'], heads=cfg['heads'])
|
|
801
|
+
|
|
802
|
+
if 'weights' in data:
|
|
803
|
+
self.model.load(data['weights'])
|
|
804
|
+
|
|
805
|
+
print(f"📂 Loaded: {path}")
|
|
806
|
+
return self
|
|
807
|
+
|
|
808
|
+
def to(self, device):
|
|
809
|
+
"""Move to device"""
|
|
810
|
+
self.device = device
|
|
811
|
+
if self.model:
|
|
812
|
+
self.model.to(device)
|
|
813
|
+
return self
|
|
814
|
+
|
|
815
|
+
def compile(self):
|
|
816
|
+
"""Compile model for faster inference"""
|
|
817
|
+
# Simplified compilation
|
|
818
|
+
return self
|
|
819
|
+
|
|
820
|
+
def profile(self):
|
|
821
|
+
"""Profile model"""
|
|
822
|
+
if not self.model:
|
|
823
|
+
return
|
|
824
|
+
params = sum(p.data.size for p in self.model.parameters())
|
|
825
|
+
print(f"📊 Model: {params:,} parameters")
|
|
826
|
+
return params
|
|
827
|
+
|
|
828
|
+
|
|
829
|
+
# ============================================
|
|
830
|
+
// SIMPLE API
|
|
831
|
+
# ============================================
|
|
832
|
+
|
|
833
|
+
def make(name="ai", device='auto'):
|
|
834
|
+
"""Create AI"""
|
|
835
|
+
return AI(name, device)
|
|
836
|
+
|
|
837
|
+
def load(path):
|
|
838
|
+
"""Load AI"""
|
|
839
|
+
return AI().load(path)
|
|
840
|
+
|
|
841
|
+
def quick_train(json_file, epochs=10, name="ai"):
|
|
842
|
+
"""Quick train"""
|
|
843
|
+
ai = AI(name)
|
|
844
|
+
ai.learn(json_file, epochs=epochs)
|
|
845
|
+
return ai
|
|
846
|
+
|
|
847
|
+
# Export all
|
|
848
|
+
__all__ = ['Tensor', 'Module', 'Linear', 'Embedding', 'GPT', 'Adam', 'AdamW', 'SGD', 'AI', 'make', 'load', 'quick_train']
|
|
849
|
+
|
|
850
|
+
print("🌟 AL v9.0 - Ultimate AI Framework")
|
|
851
|
+
print(" All PyTorch features - Simplified")
|
|
852
|
+
print(" ai = make() → ai.learn() → ai.chat()")
|
|
@@ -0,0 +1,75 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: gptperoz
|
|
3
|
+
Version: 0.1.0
|
|
4
|
+
Summary: Train AI in 3 lines - The simplest PyTorch alternative
|
|
5
|
+
Home-page: https://github.com/APerson091/AL
|
|
6
|
+
Author: APerson091
|
|
7
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
8
|
+
Classifier: Programming Language :: Python :: 3
|
|
9
|
+
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
|
|
10
|
+
Requires-Python: >=3.8
|
|
11
|
+
Description-Content-Type: text/markdown
|
|
12
|
+
License-File: LICENSE
|
|
13
|
+
Requires-Dist: numpy>=1.20.0
|
|
14
|
+
Dynamic: author
|
|
15
|
+
Dynamic: classifier
|
|
16
|
+
Dynamic: description
|
|
17
|
+
Dynamic: description-content-type
|
|
18
|
+
Dynamic: home-page
|
|
19
|
+
Dynamic: license-file
|
|
20
|
+
Dynamic: requires-dist
|
|
21
|
+
Dynamic: requires-python
|
|
22
|
+
Dynamic: summary
|
|
23
|
+
|
|
24
|
+
AL - Alchemy Learning Framework
|
|
25
|
+
|
|
26
|
+
Train AI in 3 lines. No PhD required.
|
|
27
|
+
|
|
28
|
+
AL is a lightweight alternative to PyTorch that makes AI accessible to everyone. It packs GPT, transformers, autograd, and GPU support into a single 100KB file with only NumPy as a dependency.
|
|
29
|
+
|
|
30
|
+
```python
|
|
31
|
+
import al
|
|
32
|
+
ai = al.quick_train("conversations.json")
|
|
33
|
+
print(ai.chat("Hello!"))
|
|
34
|
+
```
|
|
35
|
+
|
|
36
|
+
---
|
|
37
|
+
|
|
38
|
+
Why AL Over PyTorch?
|
|
39
|
+
|
|
40
|
+
PyTorch is powerful but complex. AL strips away the complexity while keeping the power:
|
|
41
|
+
|
|
42
|
+
· 3 lines vs 200 lines to train a GPT model
|
|
43
|
+
· 100KB vs 800MB install size
|
|
44
|
+
· 5 minutes vs 2 weeks to learn
|
|
45
|
+
· Runs on phones (Termux/Android)
|
|
46
|
+
· Same features: autograd, GPU, transformers, AMP
|
|
47
|
+
|
|
48
|
+
---
|
|
49
|
+
|
|
50
|
+
What Can AL Do?
|
|
51
|
+
|
|
52
|
+
· Train chatbots from JSON conversations
|
|
53
|
+
· Generate text with GPT-style models
|
|
54
|
+
· Save/load models in compressed .llk format
|
|
55
|
+
· Auto-detect GPU (CUDA, MPS) or run on CPU
|
|
56
|
+
· Mixed precision training for 2x speed
|
|
57
|
+
· Everything PyTorch does - just simpler
|
|
58
|
+
|
|
59
|
+
---
|
|
60
|
+
|
|
61
|
+
Who Is AL For?
|
|
62
|
+
|
|
63
|
+
· Students learning AI concepts
|
|
64
|
+
· Developers prototyping quickly
|
|
65
|
+
· Hobbyists building weekend projects
|
|
66
|
+
· Mobile devs running AI on-device
|
|
67
|
+
· Anyone tired of complex frameworks
|
|
68
|
+
|
|
69
|
+
---
|
|
70
|
+
|
|
71
|
+
The Philosophy
|
|
72
|
+
|
|
73
|
+
AI should be fun, not frustrating. AL proves that powerful models don't need complicated code. One file. One dependency. Infinite possibilities.
|
|
74
|
+
|
|
75
|
+
AL - Because training AI shouldn't require a PhD.
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
numpy>=1.20.0
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
al
|
gptperoz-0.1.0/setup.cfg
ADDED
gptperoz-0.1.0/setup.py
ADDED
|
@@ -0,0 +1,22 @@
|
|
|
1
|
+
from setuptools import setup
|
|
2
|
+
|
|
3
|
+
with open("README.md", "r", encoding="utf-8") as f:
|
|
4
|
+
long_description = f.read()
|
|
5
|
+
|
|
6
|
+
setup(
|
|
7
|
+
name="gptperoz", # ← Must match exactly!
|
|
8
|
+
version="0.1.0",
|
|
9
|
+
description="Train AI in 3 lines - The simplest PyTorch alternative",
|
|
10
|
+
long_description=long_description,
|
|
11
|
+
long_description_content_type="text/markdown",
|
|
12
|
+
author="APerson091",
|
|
13
|
+
url="https://github.com/APerson091/AL",
|
|
14
|
+
py_modules=["al"],
|
|
15
|
+
python_requires=">=3.8",
|
|
16
|
+
install_requires=["numpy>=1.20.0"],
|
|
17
|
+
classifiers=[
|
|
18
|
+
"License :: OSI Approved :: MIT License",
|
|
19
|
+
"Programming Language :: Python :: 3",
|
|
20
|
+
"Topic :: Scientific/Engineering :: Artificial Intelligence",
|
|
21
|
+
],
|
|
22
|
+
)
|