pokepy-generator 1.0.1__tar.gz

@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Anay Shekhar
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,411 @@
1
+ Metadata-Version: 2.4
2
+ Name: pokepy-generator
3
+ Version: 1.0.1
4
+ Summary: A character-level language model built from scratch using only numpy.
5
+ Requires-Python: >=3.9
6
+ Description-Content-Type: text/markdown
7
+ License-File: LICENSE
8
+ Requires-Dist: numpy>=1.24.0
9
+ Requires-Dist: gradio>=4.0.0
10
+ Requires-Dist: huggingface-hub<=0.24.0
11
+ Dynamic: license-file
12
+
13
+ # pokepy :)
14
+
15
+ ![Alt text](pokemon.jpg)<br>
16
+
17
+ a character-level language model built completely from scratch using only numpy that generates pokémon-sounding names. no PyTorch, no autograd, no deep learning frameworks: every forward pass, backward pass, and gradient update is manually implemented.
18
+
19
+ this project started as a way to understand how neural networks actually work under the hood. I first built a simple MLP, then expanded it into a WaveNet-style architecture to explore how increasing context length changes what a model can learn.
20
+
21
+ ## demo
22
+
23
+ try it live (local desktop):
24
+
25
+ * for mac users, double-click `PokepyLauncher.command` to launch the application.
26
+ * for windows users, double-click `PokepyLauncher.bat` to launch the application.
27
+
28
+ once running, the terminal will spin up a local matrix-inference engine. open your browser and navigate to: http://localhost:10000
29
+
30
+ ---
31
+
32
+ # what is this
33
+
34
+ I wanted to understand the foundations behind language models, so I built a mini character-level text generator completely from scratch.
35
+
36
+ instead of relying on existing machine learning libraries, I manually implemented:
37
+
38
+ * embeddings
39
+ * linear layers
40
+ * batch normalization
41
+ * tanh activations
42
+ * softmax
43
+ * cross entropy loss
44
+ * backpropagation
45
+ * gradient descent
46
+
47
+ everything runs only with numpy.
48
+
49
+ the model learns character patterns from pokémon names and generates new names based on the relationships it discovers.
50
+
51
+ ---
52
+
53
+ # model 1 — MLP
54
+
55
+ ## architecture
56
+
57
+ the first version was a simple multilayer perceptron:
58
+
59
+ * character embedding layer (10-dimensional vectors)
60
+ * linear layer
61
+ * batch normalization
62
+ * tanh activation
63
+ * output linear layer
64
+ * softmax + cross entropy loss
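
the layer stack above can be sketched as a single numpy forward pass. this is a minimal illustration, not the project's actual code: the vocabulary size of 27 and the weight scales are assumptions, while the parameter names (`C`, `W1`, `W2`, `b2`, `bngain`, `bnbias`), the 10-dimensional embeddings, the 200 hidden units, and the 3-character context come from this README.

```python
import numpy as np

rng = np.random.default_rng(0)

# vocab_size is an assumption; the other sizes are the ones quoted in the README
vocab_size, emb_dim, context, hidden, batch = 27, 10, 3, 200, 32

C = rng.normal(size=(vocab_size, emb_dim))                # embedding table
W1 = rng.normal(size=(context * emb_dim, hidden)) * 0.1   # no b1: batch norm re-centres anyway
W2 = rng.normal(size=(hidden, vocab_size)) * 0.01
b2 = np.zeros(vocab_size)
bngain = np.ones((1, hidden))
bnbias = np.zeros((1, hidden))

def mlp_forward(X, Y):
    """X: (batch, context) character indices, Y: (batch,) target indices."""
    emb = C[X]                                   # (batch, context, emb_dim)
    embcat = emb.reshape(emb.shape[0], -1)       # concatenate the context embeddings
    hpre = embcat @ W1                           # linear layer
    mean = hpre.mean(axis=0, keepdims=True)      # batch statistics (training mode)
    var = hpre.var(axis=0, keepdims=True)
    h = np.tanh(bngain * (hpre - mean) / np.sqrt(var + 1e-5) + bnbias)
    logits = h @ W2 + b2                         # output linear layer
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)    # softmax
    loss = -np.log(probs[np.arange(len(Y)), Y]).mean()    # cross entropy
    return probs, loss

X = rng.integers(0, vocab_size, size=(batch, context))
Y = rng.integers(0, vocab_size, size=batch)
probs, loss = mlp_forward(X, Y)
print(probs.shape, round(float(loss), 3))
```

with small output weights the untrained loss should sit close to `log(vocab_size)`, which is a quick sanity check before training.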
65
+
66
+ hidden size:
67
+
68
+ ```
69
+ 200 neurons
70
+ ```
71
+
72
+ context length:
73
+
74
+ ```
75
+ 3 characters
76
+ ```
77
+
78
+ this means the model only looks at the previous 3 characters to predict the next one.
79
+
80
+ example:
81
+
82
+ ```
83
+ pik → a
84
+ ika → next character
85
+ ```
86
+
87
+ the context continuously shifts as the model generates.
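
the shifting context can be sketched in a few lines of plain Python. the `.` padding/end marker and the helper name `context_windows` are assumptions for illustration; the README only states the 3-character window.

```python
def context_windows(name, block_size=3, pad="."):
    """Yield (context, target) pairs for one name.

    The context starts as padding and the name is terminated with `pad`
    (an assumed boundary token, not confirmed by the README).
    """
    context = pad * block_size
    for ch in name + pad:
        yield context, ch
        context = context[1:] + ch   # drop the oldest character, append the new one

pairs = list(context_windows("pikachu"))
for ctx, target in pairs:
    print(f"{ctx} -> {target}")
```

each name of length `n` yields `n + 1` training pairs under this scheme, because the end marker is predicted as well.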
88
+
89
+ ---
90
+
91
+ ## MLP results
92
+
93
+ training:
94
+
95
+ ```
96
+ train loss: 1.294
97
+ validation loss: 3.504
98
+ ```
99
+
100
+ generated names:
101
+
102
+ ```
103
+ blipedeedo
104
+ rosalini
105
+ lect
106
+ dartic
107
+ star
108
+ vigus
109
+ swannon
110
+ hippowdon
111
+ the
112
+ larvinerao
113
+ ```
114
+
115
+ the MLP learned basic character relationships, but the limited context window made it difficult to understand longer patterns inside names.
116
+
117
+ ---
118
+
119
+ # model 2 — WaveNet
120
+
121
+ ## why I built this
122
+
123
+ the biggest limitation of the MLP was context length.
124
+
125
+ with only 3 characters of context, the model could only see a small part of each name.
126
+
127
+ for example:
128
+
129
+ ```
130
+ charizard
131
+
132
+ cha
133
+ har
134
+ ari
135
+ riz
136
+ ```
137
+
138
+ the model does not understand the larger structure of the word.
139
+
140
+ WaveNet improves this by gradually combining groups of characters, allowing the model to build larger representations without massively increasing the number of parameters.
141
+
142
+ ---
143
+
144
+ ## architecture
145
+
146
+ WaveNet-style architecture:
147
+
148
+ * character embedding layer (10 dimensions)
149
+ * FlattenConsecutive layers
150
+ * multiple linear layers
151
+ * batch normalization
152
+ * tanh activations
153
+ * final output layer
154
+ * softmax + cross entropy loss
155
+
156
+ context length:
157
+
158
+ ```
159
+ 8 characters
160
+ ```
161
+
162
+ the model builds information hierarchically:
163
+
164
+ ```
165
+ characters
166
+
167
+     ↓
168
+
169
+ combined character groups
170
+
171
+     ↓
172
+
173
+ higher level features
174
+
175
+     ↓
176
+
177
+ next character prediction
178
+ ```
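
a rough sketch of how a `FlattenConsecutive` step could build this hierarchy with numpy reshapes. the group size of 2 and the three stages are assumptions (they follow the makemore-style design this project cites as inspiration); batch normalization is left out to keep the shapes easy to follow, and the weights are random stand-ins.

```python
import numpy as np

def flatten_consecutive(x, n):
    """Merge every n neighbouring positions: (B, T, C) -> (B, T // n, C * n)."""
    B, T, C = x.shape
    assert T % n == 0, "sequence length must be divisible by the group size"
    out = x.reshape(B, T // n, C * n)
    if out.shape[1] == 1:            # last stage: drop the length-1 time axis
        out = out[:, 0, :]
    return out

rng = np.random.default_rng(0)
hidden = 32                                   # hidden size from the comparison table
x = rng.normal(size=(4, 8, 10))               # (batch, 8-char context, 10-dim embedding)
shapes = [x.shape]
for _ in range(3):                            # 8 -> 4 -> 2 -> 1 positions
    x = flatten_consecutive(x, 2)
    W = rng.normal(size=(x.shape[-1], hidden)) * 0.1   # stand-in linear layer
    x = np.tanh(x @ W)
    shapes.append(x.shape)
print(shapes)
```

each stage halves the number of positions, so the first linear layer only ever sees 2 characters at a time instead of all 8 at once.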
179
+
180
+ ---
181
+
182
+ ## WaveNet results
183
+
184
+ training:
185
+
186
+ ```
187
+ train loss: 1.949
188
+ validation loss: 2.748
189
+ ```
190
+
191
+ generated names:
192
+
193
+ ```
194
+ gropinig
195
+ pyghislacat
196
+ poloun
197
+ hoongel
198
+ spuspiniyan
199
+ ongtover
200
+ kasato
201
+ xel
202
+ felspipon
203
+ linmatie
204
+ asherron
205
+ beatdiqdule
206
+ madstutf
207
+ drudona
208
+ rouzslra
209
+ liwsywunk
210
+ galeon
211
+ magnoslaws
212
+ araidono
213
+ lickopt
214
+ ```
215
+
216
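+ WaveNet produced longer and more structured generations, which I attribute to its larger context window. validation loss also dropped from 3.504 to 2.748, and the gap between train and validation loss shrank from 2.210 (MLP) to 0.799 (WaveNet), though the WaveNet also trained for far fewer steps (10,000 vs 300,000).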
217
+
218
+ ---
219
+
220
+ # MLP vs WaveNet
221
+
222
+ | | MLP | WaveNet |
223
+ | ---------------- | ------------------- | ------------------- |
224
+ | Context size | 3 characters | 8 characters |
225
+ | Hidden size | 200 | 32 |
226
+ | Training steps | 300,000 | 10,000 |
227
+ | Architecture | Single hidden layer | Hierarchical layers |
228
+ | Feature learning | Direct | Progressive |
229
+ | Main advantage | Simple baseline | Larger context |
230
+
231
+ ---
232
+
233
+ # challenges
234
+
235
+ ## context length
236
+
237
+ one of the biggest lessons from this project was understanding why context matters.
238
+
239
+ a model with a small context window can only learn local patterns, while a larger context lets it capture longer-range relationships.
240
+
241
+ the MLP used:
242
+
243
+ ```
244
+ 3 character context
245
+ ```
246
+
247
+ while WaveNet increased this to:
248
+
249
+ ```
250
+ 8 character context
251
+ ```
252
+
253
+ which allowed it to capture more structure from names.
254
+
255
+ ---
256
+
257
+ ## batch normalization
258
+
259
+ implementing batch normalization manually was one of the hardest parts.
260
+
261
+ I had to handle:
262
+
263
+ * batch mean
264
+ * batch variance
265
+ * running mean
266
+ * running variance
267
+
268
+ training and inference use different statistics, so saving the running values was required for the deployed model to generate correctly.
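
the train/inference split can be sketched like this. the class name, the momentum of 0.1, and the epsilon are assumptions; the point is only that training normalizes with batch statistics while inference falls back to the saved running values.

```python
import numpy as np

class BatchNorm1d:
    def __init__(self, dim, eps=1e-5, momentum=0.1):
        self.eps, self.momentum = eps, momentum
        self.gain = np.ones(dim)
        self.bias = np.zeros(dim)
        self.running_mean = np.zeros(dim)
        self.running_var = np.ones(dim)
        self.training = True

    def __call__(self, x):
        if self.training:
            mean, var = x.mean(axis=0), x.var(axis=0)      # batch statistics
            m = self.momentum
            # exponential moving averages, kept for inference
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            mean, var = self.running_mean, self.running_var  # saved statistics
        return self.gain * (x - mean) / np.sqrt(var + self.eps) + self.bias

rng = np.random.default_rng(0)
bn = BatchNorm1d(4)
for _ in range(200):                                   # "training" on batches of 32
    bn(rng.normal(loc=3.0, scale=2.0, size=(32, 4)))

bn.training = False
single = bn(np.full((1, 4), 3.0))                      # one example at inference time
print(bn.running_mean.round(2), single.round(2))
```

generation feeds one context at a time, so batch statistics would be meaningless there (a batch of one has zero variance); the running values are what make single-example inference work.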
269
+
270
+ ---
271
+
272
+ ## backpropagation
273
+
274
+ instead of using:
275
+
276
+ ```python
277
+ loss.backward()
278
+ ```
279
+
280
+ I manually calculated gradients for:
281
+
282
+ * embeddings
283
+ * linear layers
284
+ * batch normalization
285
+ * tanh activations
286
+ * softmax cross entropy
287
+
288
+ this helped me understand how neural networks actually learn instead of treating them as black boxes.
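
a small sketch of what two of these manual gradients look like, checked against central finite differences. the toy shapes and function names are assumptions made for illustration; the formulas themselves (softmax cross entropy gives `probs - onehot`, tanh gives `1 - h**2`) are the standard ones.

```python
import numpy as np

def softmax_cross_entropy(logits, targets):
    """Return the mean loss and its gradient with respect to the logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    n = len(targets)
    loss = -np.log(probs[np.arange(n), targets]).mean()
    dlogits = probs.copy()
    dlogits[np.arange(n), targets] -= 1.0     # probs - onehot(targets)
    return loss, dlogits / n

def tanh_backward(h, dh):
    """Gradient through h = tanh(z)."""
    return (1.0 - h ** 2) * dh

rng = np.random.default_rng(0)
z = rng.normal(size=(5, 7))                   # toy pre-activations
W = rng.normal(size=(7, 4))                   # toy output weights
y = rng.integers(0, 4, size=5)

def loss_fn(z_in):
    return softmax_cross_entropy(np.tanh(z_in) @ W, y)[0]

# analytic gradient, chained by hand
h = np.tanh(z)
loss, dlogits = softmax_cross_entropy(h @ W, y)
dz = tanh_backward(h, dlogits @ W.T)

# numerical gradient by central differences
num = np.zeros_like(z)
eps = 1e-5
for idx in np.ndindex(*z.shape):
    zp, zm = z.copy(), z.copy()
    zp[idx] += eps
    zm[idx] -= eps
    num[idx] = (loss_fn(zp) - loss_fn(zm)) / (2 * eps)

max_err = np.abs(dz - num).max()
print(f"max |analytic - numerical| = {max_err:.2e}")
```

a check like this is a cheap way to catch sign or shape mistakes in hand-written backward passes before training on real data.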
289
+
290
+ ---
291
+
292
+ ## random generation
293
+
294
+ generation is probabilistic.
295
+
296
+ even with the same trained model, outputs change because the next character is sampled from the model's probability distribution.
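
a sketch of the sampling loop, with a fixed toy distribution standing in for the trained network. `sample_name`, `toy_probs`, the index-0 end marker, and the `max_len` cap are all assumptions for illustration.

```python
import numpy as np

def sample_name(next_char_probs, itos, rng, block_size=3, max_len=20):
    """Sample characters one at a time until the end marker (index 0) is drawn."""
    context = [0] * block_size
    out = []
    for _ in range(max_len):
        probs = next_char_probs(context)
        ix = int(rng.choice(len(probs), p=probs))   # draw from the distribution
        if ix == 0:
            break
        out.append(itos[ix])
        context = context[1:] + [ix]                # shift the context window
    return "".join(out)

itos = {0: ".", 1: "a", 2: "b", 3: "c"}

def toy_probs(context):
    # stand-in for the trained model: a fixed distribution that ignores context
    return np.array([0.1, 0.3, 0.3, 0.3])

same_a = sample_name(toy_probs, itos, np.random.default_rng(1))
same_b = sample_name(toy_probs, itos, np.random.default_rng(1))
varied = [sample_name(toy_probs, itos, np.random.default_rng(s)) for s in range(20)]
print(same_a == same_b, len(set(varied)))
```

fixing the random seed makes a run reproducible; leaving it unseeded gives different names on every call, which matches the behaviour described above.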
297
+
298
+ ---
299
+
300
+ # training details
301
+
302
+ ## MLP
303
+
304
+ dataset:
305
+
306
+ ```
307
+ pokemon names
308
+ ```
309
+
310
+ training:
311
+
312
+ * optimizer: SGD
313
+ * batch size: 32
314
+ * steps: 300,000
315
+ * learning rate: `0.1 → 0.01 after 100k steps`
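
the schedule and update rule can be sketched as below. whether the switch happens exactly at step 100,000 (counting from 0) is an assumption, as are the function names.

```python
import numpy as np

def learning_rate(step, decay_step=100_000, high=0.1, low=0.01):
    """Step decay: `high` before `decay_step`, `low` from then on (boundary assumed)."""
    return high if step < decay_step else low

def sgd_step(params, grads, lr):
    """Plain SGD: update every parameter array in place."""
    for p, g in zip(params, grads):
        p -= lr * g

# toy check on f(w) = 0.5 * ||w||^2, whose gradient is w itself
w = np.array([1.0, -2.0])
sgd_step([w], [w.copy()], learning_rate(0))
print(w, learning_rate(99_999), learning_rate(100_000))
```

the same helper covers a shorter run by passing a different `decay_step`, for example `learning_rate(step, decay_step=8_000)`.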
316
+
317
+ parameters:
318
+
319
+ ```
320
+ C
321
+ W1
322
+ W2
323
+ b2
324
+ bngain
325
+ bnbias
326
+ ```
327
+
328
+ ---
329
+
330
+ ## WaveNet
331
+
332
+ dataset:
333
+
334
+ ```
335
+ pokemon names
336
+ ```
337
+
338
+ training:
339
+
340
+ * optimizer: SGD
341
+ * batch size: 32
342
+ * steps: 10,000
343
+ * learning rate: `0.1 → 0.01 after 8,000 steps`
344
+
345
+ parameters:
346
+
347
+ ```
348
+ embeddings
349
+ linear layers
350
+ batch normalization parameters
351
+ ```
352
+
353
+ ---
354
+
355
+ # deployment
356
+
357
+ the model is deployed using Hugging Face Spaces with Gradio.
358
+
359
+ the demo loads the trained numpy weights and runs inference without PyTorch or external ML frameworks.
360
+
361
+ the deployed model uses:
362
+
363
+ * trained embeddings
364
+ * linear layer weights
365
+ * batch normalization parameters
366
+ * vocabulary mappings
367
+
368
+ the entire inference pipeline runs using manually implemented numpy layers.
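
one possible way to bundle these pieces into a single file is `np.savez`. the README does not say how the weights are actually stored, so the key names and the in-memory buffer (standing in for a file on disk) are purely hypothetical.

```python
import io
import numpy as np

rng = np.random.default_rng(0)
chars = ".abc"                                   # toy vocabulary, "." as end marker

params = {
    "C": rng.normal(size=(len(chars), 10)),      # trained embeddings (random here)
    "bn_running_mean": np.zeros(200),            # batch norm running statistics
    "bn_running_var": np.ones(200),
    "vocab": np.array(list(chars)),              # index -> character
}

buffer = io.BytesIO()                            # stands in for a weights file on disk
np.savez(buffer, **params)

buffer.seek(0)
with np.load(buffer) as data:
    loaded = {key: data[key] for key in data.files}

itos = dict(enumerate(loaded["vocab"].tolist()))
stoi = {ch: i for i, ch in itos.items()}
print(sorted(loaded), itos)
```

storing the vocabulary next to the weights keeps the index mapping and the embedding rows from drifting apart between training and inference.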
369
+
370
+ ---
371
+
372
+ # usage
373
+
374
+ install dependencies:
375
+
376
+ ```bash
377
+ pip install numpy
378
+ ```
379
+
380
+ run:
381
+
382
+ ```bash
383
+ python mlp.py
384
+ python wavenet.py
385
+ ```
386
+
387
+ you will need:
388
+
389
+ ```
390
+ data/
391
+ │
392
+ └── pokemon.txt
393
+ ```
394
+
395
+ with one pokémon name per line.
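
a sketch of reading such a file and building the character vocabulary. lowercasing, the `.` boundary token at index 0, and the helper names are assumptions; a real name list may contain `.` itself (for example "Mr. Mime"), in which case a different boundary token would be needed.

```python
def read_names(text):
    """Split file contents into cleaned, non-empty names (lowercasing is assumed)."""
    return [line.strip().lower() for line in text.splitlines() if line.strip()]

def build_vocab(names, boundary="."):
    """Map each character to an index, reserving 0 for the boundary token."""
    chars = sorted(set("".join(names)))
    assert boundary not in chars, "pick a boundary token that never occurs in a name"
    stoi = {ch: i + 1 for i, ch in enumerate(chars)}
    stoi[boundary] = 0
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos

# for the real file: read_names(open("data/pokemon.txt", encoding="utf-8").read())
sample = "Bulbasaur\nCharizard\n\nPikachu\n"
names = read_names(sample)
stoi, itos = build_vocab(names)
encoded = [stoi[ch] for ch in names[2]]
print(names, len(stoi), encoded)
```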
396
+
397
+ ---
398
+
399
+ # what I learned
400
+
401
+ this project taught me how language models are built from the ground up.
402
+
403
+ the biggest takeaway was that improving a model is not always about making it bigger. changing the architecture and giving the model better ways to understand context can have a larger impact than simply adding more parameters.
404
+
405
+ ---
406
+
407
+ # license
408
+
409
+ MIT
410
+
411
+ heavily inspired by Andrej Karpathy's makemore series, which I highly recommend if you want to understand neural networks from the inside out :)