mlpclassifier 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- mlpclassifier-0.1.0/PKG-INFO +82 -0
- mlpclassifier-0.1.0/README.md +71 -0
- mlpclassifier-0.1.0/pyproject.toml +12 -0
- mlpclassifier-0.1.0/setup.cfg +4 -0
- mlpclassifier-0.1.0/src/classifier/__init__.py +0 -0
- mlpclassifier-0.1.0/src/classifier/classifier.py +271 -0
- mlpclassifier-0.1.0/src/classifier/utils.py +110 -0
- mlpclassifier-0.1.0/src/mlpclassifier.egg-info/PKG-INFO +82 -0
- mlpclassifier-0.1.0/src/mlpclassifier.egg-info/SOURCES.txt +10 -0
- mlpclassifier-0.1.0/src/mlpclassifier.egg-info/dependency_links.txt +1 -0
- mlpclassifier-0.1.0/src/mlpclassifier.egg-info/requires.txt +4 -0
- mlpclassifier-0.1.0/src/mlpclassifier.egg-info/top_level.txt +1 -0
@@ -0,0 +1,82 @@
+Metadata-Version: 2.4
+Name: mlpclassifier
+Version: 0.1.0
+Summary: An implementation of an MLP classifier, with an interface modeled on that of scikit-learn's MLPClassifier class.
+Requires-Python: >=3.11
+Description-Content-Type: text/markdown
+Requires-Dist: matplotlib>=3.10.8
+Requires-Dist: numpy>=2.4.3
+Requires-Dist: pandas>=3.0.1
+Requires-Dist: scikit-learn>=1.8.0
+
+# Neural-Network-From-Scratch-COSC-221-CSB
+
+A neural network to classify handwritten digits (a rite of passage project at this point lol).
+
+We re-implement a stripped-down version of the `MLPClassifier` class from `scikit-learn` from first principles. With this, we can then train a general classifier using the multi-layer perceptron model.
+
+# To run
+
+Since we've re-implemented the MLP using `scikit-learn`'s `MLPClassifier` as a template, the API should be familiar.
+
+To import
+
+# Dataset
+
+Download the dataset from Kaggle:
+
+```
+curl -L https://www.kaggle.com/api/v1/datasets/download/hojjatk/mnist-dataset -o ./dataset.zip
+```
+
+Then unzip it into a directory called `./dataset`:
+
+```
+unzip -d dataset ./dataset.zip
+```
+
+Optional, but cleans up the redundant copies:
+
+```
+rm -r *-idx*-ubyte
+```
+
+I've removed some duplicates, so currently I have:
+
+```
+$ ls ./dataset/
+t10k-images.idx3-ubyte  train-images.idx3-ubyte
+t10k-labels.idx1-ubyte  train-labels.idx1-ubyte
+```
+
+By convention:
+
+- we divide our dataset into training data and testing data
+- here we have 60k training examples and 10k testing examples
+- the held-out test set is used to see how well the model has generalized
+
+# Reference model
+
+For now, we'll use a reference model through `scikit-learn`.
+
+# TODO
+
+- [x] debug all the row vector stuff
+- [ ] package it in pip
+- [ ] document the API
+
+## Forward propagation
+
+- [x] variable L for layer
+- [x] a list $n^{[l]}$ for the size at each layer
+- [ ] initialize using He initialization
+- [x] forward propagation step using the forward propagation formula
+
+## Backward propagation
+
+- [x] He initialization
+- [x] back propagation
+- [x] scoring
+- [x] saving
+- [ ] make the learning rate $\alpha$ more adjustable
@@ -0,0 +1,71 @@
+# Neural-Network-From-Scratch-COSC-221-CSB
+
+A neural network to classify handwritten digits (a rite of passage project at this point lol).
+
+We re-implement a stripped-down version of the `MLPClassifier` class from `scikit-learn` from first principles. With this, we can then train a general classifier using the multi-layer perceptron model.
+
+# To run
+
+Since we've re-implemented the MLP using `scikit-learn`'s `MLPClassifier` as a template, the API should be familiar.
+
+To import
+
+# Dataset
+
+Download the dataset from Kaggle:
+
+```
+curl -L https://www.kaggle.com/api/v1/datasets/download/hojjatk/mnist-dataset -o ./dataset.zip
+```
+
+Then unzip it into a directory called `./dataset`:
+
+```
+unzip -d dataset ./dataset.zip
+```
+
+Optional, but cleans up the redundant copies:
+
+```
+rm -r *-idx*-ubyte
+```
+
+I've removed some duplicates, so currently I have:
+
+```
+$ ls ./dataset/
+t10k-images.idx3-ubyte  train-images.idx3-ubyte
+t10k-labels.idx1-ubyte  train-labels.idx1-ubyte
+```
+
+By convention:
+
+- we divide our dataset into training data and testing data
+- here we have 60k training examples and 10k testing examples
+- the held-out test set is used to see how well the model has generalized
+
+# Reference model
+
+For now, we'll use a reference model through `scikit-learn`.
+
+# TODO
+
+- [x] debug all the row vector stuff
+- [ ] package it in pip
+- [ ] document the API
+
+## Forward propagation
+
+- [x] variable L for layer
+- [x] a list $n^{[l]}$ for the size at each layer
+- [ ] initialize using He initialization
+- [x] forward propagation step using the forward propagation formula
+
+## Backward propagation
+
+- [x] He initialization
+- [x] back propagation
+- [x] scoring
+- [x] saving
+- [ ] make the learning rate $\alpha$ more adjustable
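The TODO above mentions He initialization, which the trainer uses when creating fresh weights: each weight is drawn from a zero-mean Gaussian with variance $2/n^{[l-1]}$, so activations keep a stable scale through ReLU layers. A standard-library sketch of that sampling rule (the fan-in of 256 is a made-up example, not taken from this package):

```python
import random
import statistics

random.seed(0)

n_prev = 256                   # hypothetical fan-in of the layer
std = (2 / n_prev) ** 0.5      # He initialization: variance = 2 / n_prev

# draw a large sample of weights the way a layer would be initialized
weights = [random.gauss(0.0, std) for _ in range(100_000)]

empirical_var = statistics.pvariance(weights)
print(f"target variance:    {2 / n_prev:.6f}")
print(f"empirical variance: {empirical_var:.6f}")
```

With 100k draws the empirical variance lands within a fraction of a percent of the $2/n^{[l-1]}$ target.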
@@ -0,0 +1,12 @@
+[project]
+name = "mlpclassifier"
+version = "0.1.0"
+description = "An implementation of an MLP classifier, with an interface modeled on that of scikit-learn's MLPClassifier class."
+readme = "README.md"
+requires-python = ">=3.11"
+dependencies = [
+    "matplotlib>=3.10.8",
+    "numpy>=2.4.3",
+    "pandas>=3.0.1",
+    "scikit-learn>=1.8.0",
+]
File without changes
@@ -0,0 +1,271 @@
+import pickle
+import copy
+
+from .utils import *
+
+class MLPClassifier:
+
+    activations = {
+        "relu": (relu, d_relu),
+        "logistic": (sigmoid, d_sigmoid),
+        "tanh": (tanh, d_tanh),
+        "identity": (identity, d_identity)
+    }
+
+    def __init__(self, hidden_layer_sizes:tuple[int,...], max_iter:int=20, activation:str='relu', batch_size:int=200, alpha:float=1e-2, verbose:bool=False) -> None:
+        """
+        A stripped-down version of the scikit-learn MLPClassifier class. No solver choices; we only use mini-batch stochastic gradient descent.
+
+        Args:
+            hidden_layer_sizes (tuple): the size of each hidden layer
+            max_iter (int): the number of passes through the whole dataset
+            activation (str): the activation function; one of relu, logistic, tanh, identity
+            batch_size (int): the number of samples per mini-batch
+            alpha (float): the learning rate. Default is 1e-2.
+            verbose (bool): whether to display progress to the user
+
+        Return:
+            None
+        """
+        if hidden_layer_sizes == tuple():
+            raise ValueError("There must be at least one hidden layer")
+        self.hidden_layer_sizes = hidden_layer_sizes
+        self.max_iter = max_iter
+        self.verbose = verbose
+        self.alpha = alpha
+        self.batch_size = batch_size
+
+        if activation not in self.activations:
+            raise ValueError(f"The activation function called {activation} doesn't exist")
+
+        self.activation, self.d_activation = self.activations[activation]
+
+    def load_weights(self, coefs, intercepts, classes):
+        """
+        Load weights and biases that have already been trained.
+        """
+        # coefs is a list of weight matrices, one per layer
+        expected_weights_shape = tuple((self.hidden_layer_sizes[i], self.hidden_layer_sizes[i + 1])
+                                       for i in range(len(self.hidden_layer_sizes) - 1))
+        # e.g. (256, 128), (128, 64), (64, 10)
+        actual_weights_shape = tuple(w.shape for w in coefs[1:-1])
+        if actual_weights_shape != expected_weights_shape:
+            raise ValueError(f"Please load weights of the right shape for this model. Expected hidden layer weight shapes {expected_weights_shape}")
+
+        expected_biases_shape = tuple((b, ) for b in self.hidden_layer_sizes)
+        actual_biases_shape = tuple(b.shape for b in intercepts[:-1])
+        if actual_biases_shape != expected_biases_shape:
+            raise ValueError(f"Please load biases of the right shape for this model. Expected hidden layer bias shapes {expected_biases_shape}, got {actual_biases_shape}")
+        if coefs[-1].shape[1] != intercepts[-1].shape[0]:
+            raise ValueError("The shape of the output weights doesn't match that of the output biases.")
+
+        if intercepts[-1].shape[0] != len(classes):
+            raise ValueError("The number of classes doesn't match the size of the output biases")
+
+        self.coefs_ = copy.deepcopy(coefs)
+        self.intercepts_ = copy.deepcopy(intercepts)
+        self.classes_ = copy.deepcopy(classes)
+
+    def forward(self, X):
+        """
+        Given a batch of samples, do forward propagation through the weights and biases to arrive at a prediction.
+
+        Args:
+            X (numpy ndarray): an array with shape (number of samples, number of features)
+
+        Return:
+            Y (list): matrices representing a^{[l]} at each layer l, for all m samples.
+            Z (list): matrices representing z^{[l]} at each layer l, for all m samples.
+        """
+        try:
+            # forward propagation
+            Z = [X]
+            Y = [X]
+            # hidden layers use the chosen activation
+            for weight, bias in zip(self.coefs_[:-1], self.intercepts_[:-1]):
+                z = Y[-1]@weight + bias
+                Y.append(self.activation(z))
+                Z.append(z)
+            # output layer uses softmax
+            weight = self.coefs_[-1]
+            bias = self.intercepts_[-1]
+            z = Y[-1]@weight + bias
+            Y.append(softmax(z))
+            Z.append(z)
+            return Y, Z
+
+        except AttributeError as e:
+            raise AttributeError(f"You can't do forward propagation before acquiring the weights: {e}")
+
+    def predict_proba(self, X):
+        """
+        Given a batch of samples, do forward propagation to arrive at class probabilities.
+
+        Args:
+            X (numpy ndarray): an array with shape (number of samples, number of features)
+
+        Return:
+            Y (numpy ndarray): an array with shape (number of samples, number of classes); each row is a probability distribution
+        """
+        return self.forward(X)[0][-1]
+
+    def predict(self, X):
+        """
+        A simple wrapper around predict_proba that maps each probability distribution to the best match among the classes.
+        """
+        Y = self.predict_proba(X)
+        return np.argmax(Y, axis=1)
+
+    def score(self, X, Y):
+        """
+        Gives the fraction of samples for which we predicted the right label.
+
+        Args:
+            X (numpy ndarray): a list of features
+            Y (numpy ndarray): a list of labels for those features
+
+        Return:
+            result (tuple): (score, array of indices of failed predictions)
+        """
+        boolean_array_result = np.array(self.predict(X)) == np.array(Y)
+        incorrect_indicies = np.where(~boolean_array_result)[0]
+        score = sum(boolean_array_result)/len(boolean_array_result)
+        return score, incorrect_indicies
+
+    def fit(self, X, y, save_path=None) -> None:
+        """
+        Train the model, adjusting the weights and biases.
+
+        Args:
+            X (numpy ndarray): an array with shape (number of samples, number of features)
+            y (numpy ndarray): the labels, with shape (number of samples, )
+            save_path (string): path to save the weights to if training is interrupted (optional)
+        Return:
+            None
+        """
+        try:
+            if len(X) != len(y):
+                raise ValueError("The number of example features should match the number of labels given")
+            # i hate naming conventions
+            Y = y
+            self.classes_ = np.unique(Y)
+            self.coefs_ = []
+            self.intercepts_ = []
+            for i in range(len(self.hidden_layer_sizes) + 1):
+                if i == len(self.hidden_layer_sizes):
+                    n = len(self.classes_)
+                else:
+                    n = self.hidden_layer_sizes[i]
+
+                if i == 0:
+                    n_prev = X.shape[1]
+                else:
+                    n_prev = self.hidden_layer_sizes[i - 1]
+                # He initialization sets variance = 2/n^{[l - 1]}
+                self.coefs_.append(np.random.normal(0, np.sqrt(2/n_prev), size=(n_prev, n)))
+                self.intercepts_.append(np.zeros(n))
+
+            L = len(self.hidden_layer_sizes) + 1
+            for iter_num in range(self.max_iter):
+                J = np.array([])
+                # divide the dataset into mini-batches
+                for i in range(int(np.ceil(len(X)/self.batch_size))):
+                    x = X[self.batch_size*i:min(self.batch_size*(i + 1), len(X))]
+                    y_batch = Y[self.batch_size*i:min(self.batch_size*(i + 1), len(Y))]
+                    m = len(x)  # the last batch may be smaller than batch_size
+                    # one-hot labeling
+                    indices = np.searchsorted(self.classes_, y_batch)
+                    y_batch = np.eye(len(self.classes_))[indices]
+
+                    # forward propagation to get a^{[l]} and z^{[l]}
+                    a, z = self.forward(x)
+
+                    # J holds the loss of the current batch
+                    J = cross_entropy_loss(y_batch, a[-1])
+                    # for softmax + cross-entropy, the output-layer error is a^{[L]} - y
+                    delta = a[-1] - y_batch
+                    for l in reversed(range(L)):  # l goes from L - 1 down to 0
+                        # db is already averaged over the batch
+                        db = np.mean(delta, axis=0)
+                        dW = (a[l].T @ delta)/m
+
+                        # keep the pre-update weights for propagating delta backwards
+                        W = self.coefs_[l]
+                        self.coefs_[l] = W - self.alpha*dW
+                        self.intercepts_[l] -= self.alpha*db
+
+                        if l > 0:
+                            delta = (delta @ W.T) * self.d_activation(z[l])
+                if self.verbose:
+                    print(f"Finished iteration {iter_num + 1}/{self.max_iter}. Loss (last batch): {np.mean(J, axis=0)}")
+
+        except KeyboardInterrupt:
+            if save_path != None:
+                self.save(save_path)
+            raise InterruptedError(f"Training canceled! Saved weights to {save_path} as specified")
+
+    def save(self, path):
+        """
+        Save the weights as a pickle file. path is the file path to save at.
+        """
+        with open(path, 'wb') as file:
+            pickle.dump(
+                {
+                    "weights": self.coefs_,
+                    "biases": self.intercepts_,
+                    "classes": self.classes_
+                },
+                file
+            )
+
+
+if __name__ == "__main__":
+    import matplotlib.pyplot as plt
+
+    images = get_images_fast("dataset/train-images.idx3-ubyte")
+    labels = get_labels_fast("dataset/train-labels.idx1-ubyte")
+    test_images = get_images_fast("dataset/t10k-images.idx3-ubyte")
+    test_labels = get_labels_fast("dataset/t10k-labels.idx1-ubyte")
+
+    X = images.reshape(images.shape[0], -1)/255
+    X_test = test_images.reshape(test_images.shape[0], -1)/255
+
+    model = MLPClassifier(
+        hidden_layer_sizes=(128, 64),
+        activation='relu',
+        max_iter=1000,
+        batch_size=200,
+        verbose=True,
+    )
+
+    SAVE_PATH = "weights/self_trained_2.pkl"
+
+    model.fit(X, labels, save_path=SAVE_PATH)
+
+    #with open("weights/sklearn_weights_and_biases.pkl", 'rb') as file:
+    #    weights_and_biases = pickle.load(file)
+    #model.load_weights(weights_and_biases["weights"],
+    #                   weights_and_biases["biases"],
+    #                   weights_and_biases["classes"])
+
+    N = 10_000
+    score, incorrect_indicies = model.score(X_test[:N], test_labels[:N])
+    print("score: ", score)
+    model.save(SAVE_PATH)
+    #print("incorrect indicies: ", incorrect_indicies)
+    #for i in incorrect_indicies:
+    #    predicted_label = model.predict(np.array([X_test[i]]))[0]
+    #    plt.imshow(test_images[i])
+    #    plt.title(f"actual label: {test_labels[i]}, predicted label: {predicted_label}")
+    #    plt.show()
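In `fit`, the output-layer error is taken directly as `delta = a[-1] - y`, with no explicit softmax derivative. That shortcut is exactly the gradient of cross-entropy composed with softmax: $\partial L/\partial z_j = \mathrm{softmax}(z)_j - y_j$. A standard-library sketch checking that identity against a central finite difference (the logits and label here are made up for illustration, not data from this package):

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]  # subtract the max for stability, as in utils.py
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(z, y):
    # cross-entropy of the softmax output against a one-hot label
    p = softmax(z)
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p))

z = [1.0, 2.0, 0.5]   # made-up logits for one sample
y = [0.0, 1.0, 0.0]   # one-hot label

# analytic gradient: softmax(z) - y
analytic = [pi - yi for pi, yi in zip(softmax(z), y)]

# central finite differences of the loss
h = 1e-6
numeric = []
for j in range(len(z)):
    zp = z[:]; zp[j] += h
    zm = z[:]; zm[j] -= h
    numeric.append((cross_entropy(zp, y) - cross_entropy(zm, y)) / (2 * h))

assert all(abs(a - n) < 1e-6 for a, n in zip(analytic, numeric))
```

The two gradients agree to well below 1e-6, which is why the code can skip straight to `a[-1] - y` at the output layer.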
@@ -0,0 +1,110 @@
+import numpy as np
+
+# functions for importing datasets
+
+def get_images(filename):
+    with open(filename, 'rb') as file:
+        magic_bytes = file.read(4)
+        number_of_images = int.from_bytes(file.read(4), "big")
+        number_of_rows = int.from_bytes(file.read(4), "big")
+        number_of_columns = int.from_bytes(file.read(4), "big")
+
+        return [[[int.from_bytes(file.read(1), "big") for _ in range(number_of_columns)]
+                 for _ in range(number_of_rows)]
+                for _ in range(number_of_images)]
+
+def get_labels(filename):
+    with open(filename, 'rb') as file:
+        magic_bytes = file.read(4)
+        number_of_labels = int.from_bytes(file.read(4), "big")
+        return [int.from_bytes(file.read(1), "big") for _ in range(number_of_labels)]
+
+def get_images_fast(filename):
+    with open(filename, 'rb') as file:
+        # the idx format stores integers big-endian
+        magic_bytes = int.from_bytes(file.read(4), "big")
+        if magic_bytes != 2051:
+            raise ValueError("Magic number doesn't match that of idx3")
+        number_of_images = int.from_bytes(file.read(4), "big")
+        number_of_rows = int.from_bytes(file.read(4), "big")
+        number_of_columns = int.from_bytes(file.read(4), "big")
+
+        images = np.frombuffer(file.read(), dtype=np.uint8)
+        images = images[:number_of_images*number_of_rows*number_of_columns]
+        return images.reshape(number_of_images, number_of_rows, number_of_columns)
+
+def get_labels_fast(filename):
+    with open(filename, 'rb') as file:
+        magic_bytes = int.from_bytes(file.read(4), "big")
+        if magic_bytes != 2049:
+            raise ValueError("Magic number doesn't match that of idx1")
+        number_of_labels = int.from_bytes(file.read(4), "big")
+        return np.frombuffer(file.read(), dtype=np.uint8)
+
+# some useful functions
+
+def relu(z):
+    """
+    The ReLU function maps all input less than 0 to 0 and leaves the rest unchanged.
+    """
+    return np.maximum(0, z)
+
+def d_relu(z):
+    """
+    The derivative of the ReLU function is the unit step function.
+    """
+    return z > 0
+
+def sigmoid(z):
+    """
+    The sigmoid function
+    """
+    return 1/(1 + np.exp(-z))
+
+def d_sigmoid(z):
+    """
+    The derivative of the sigmoid function
+    """
+    return sigmoid(z)*(1 - sigmoid(z))
+
+def tanh(z):
+    """
+    The hyperbolic tangent function
+    """
+    # kinda redundant lol
+    return np.tanh(z)
+
+def d_tanh(z):
+    return 1 - tanh(z)**2
+
+def identity(z):
+    return z
+
+def d_identity(z):
+    return 1
+
+def softmax(z):
+    """
+    We're expecting batched z: m samples as rows, each row being the scores over the classes.
+    """
+    exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))  # subtracting the row max keeps the exponentials from overflowing (stable softmax)
+    return exp_z / np.sum(exp_z, axis=1, keepdims=True)
+
+def cross_entropy_loss(y, y_hat):
+    epsilon = 1e-12
+    y_hat = np.clip(y_hat, epsilon, 1 - epsilon)  # in case y_hat is so small that we run out of precision and it becomes 0
+    return -np.sum(y*np.log(y_hat), axis=1)
+
+if __name__ == "__main__":
+    pass
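`get_labels_fast` above relies on the IDX header layout: a 4-byte big-endian magic number (2049 for idx1), a 4-byte big-endian count, then one unsigned byte per label. A standard-library sketch that builds a tiny idx1 payload in memory and parses it back with the same logic (the labels are synthetic, not the MNIST files):

```python
import io

labels = [5, 0, 4, 1, 9]  # synthetic labels

# build an idx1 payload: magic 2049, count, then the raw label bytes
payload = (2049).to_bytes(4, "big") + len(labels).to_bytes(4, "big") + bytes(labels)

# parse it back the same way get_labels_fast does
file = io.BytesIO(payload)
magic = int.from_bytes(file.read(4), "big")
assert magic == 2049, "Magic number doesn't match that of idx1"
count = int.from_bytes(file.read(4), "big")
decoded = list(file.read(count))

print(decoded)  # → [5, 0, 4, 1, 9]
```

Note that `int.from_bytes` only defaults to big-endian byte order since Python 3.11, which is consistent with the package's `requires-python = ">=3.11"`.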
@@ -0,0 +1,82 @@
(content identical to mlpclassifier-0.1.0/PKG-INFO above)
@@ -0,0 +1,10 @@
+README.md
+pyproject.toml
+src/classifier/__init__.py
+src/classifier/classifier.py
+src/classifier/utils.py
+src/mlpclassifier.egg-info/PKG-INFO
+src/mlpclassifier.egg-info/SOURCES.txt
+src/mlpclassifier.egg-info/dependency_links.txt
+src/mlpclassifier.egg-info/requires.txt
+src/mlpclassifier.egg-info/top_level.txt
@@ -0,0 +1 @@
+
@@ -0,0 +1 @@
+classifier