fast-aes 0.1.0

@@ -0,0 +1,110 @@
1
+ = FastAES - Fast AES implementation for Ruby in C
2
+
3
+ This is a lightweight, fast implementation of AES (the US government's Advanced Encryption Standard,
4
+ aka "Rijndael"), written in C for speed. You can read more on the {Wikipedia AES Page}[http://en.wikipedia.org/wiki/Advanced_Encryption_Standard].
5
+ The algorithm itself was extracted from work by Christophe Devine for the open source Netcat clone
6
+ {sbd}[http://www.cycom.se/dl/sbd]. According to the community, this is
7
+ {one of the best performing AES implementations available}[http://www.derkeiler.com/Newsgroups/sci.crypt/2003-07/0162.html]:
8
+
9
+ > With some exceptions your code performs better than all others in
10
+ > enc[ryption]/dec[ryption]. Do you have an explanation of that fact? Thanks.
11
+ >
12
+ Well, I've tried to make the code as simple and straightforward as
13
+ possible; I also used a few basic tricks, like loop unrolling.
14
+
15
+ Since this library wraps the sbd implementation, it supports a subset of AES, specifically:
16
+
17
+ * 128, 192, and 256-bit ciphers
18
+ * Cipher Block Chaining (CBC) mode only
19
+ * Encrypted blocks are padded at 16-byte boundaries ({read more on padding}[http://www.di-mgt.com.au/cryptopad.html#whatispadding])
20
+
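To make the padding rule in the list above concrete, here is a minimal C sketch of how a padded output length could be computed. The names `padded_len` and `AES_BLOCK_BYTES` are invented for this illustration and are not part of the gem's API; the rounding idea mirrors the `(n + 15) & -16` expression used in the extension source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: AES always works on 128-bit (16-byte) blocks. */
#define AES_BLOCK_BYTES 16u

/* Hypothetical helper: round n up to the next multiple of 16, which is
 * the output length once a trailing partial block is padded to a full one. */
size_t padded_len(size_t n)
{
    assert(n <= SIZE_MAX - (AES_BLOCK_BYTES - 1)); /* guard against overflow */
    return (n + (AES_BLOCK_BYTES - 1)) & ~(size_t)(AES_BLOCK_BYTES - 1);
}
```

Under this rule a 23-byte message occupies two blocks (32 bytes), while a message that is already a multiple of 16 bytes keeps its length.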
21
+ You can read specifics about AES-CBC in the IPSec-related {RFC 3602}[http://www.rfc-archive.org/getrfc.php?rfc=3602],
22
+ if you really care that much.
23
+
24
+ Bottom line, this gem works. Fast.
25
+
26
+ === Other Ruby AES gems
27
+
28
+ I couldn't find any that worked worth a crap. The {ruby-aes}[http://rubyforge.org/projects/ruby-aes/]
29
+ project has Ruby 1.9 bugs that have been open over _two_ _years_ now, {crypt/rijndael}[http://crypt.rubyforge.org/rijndael.html]
30
+ doesn't work on Ruby 1.9 and is *SLOOOOOOW* (as it's written in Ruby), and some people even report getting
31
+ {inconsistent encryption results from other libraries}[http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/228214].
32
+
33
+ So I grabbed some C reference code, wrapped a Ruby interface around it, and voilà.
34
+
35
+ C'mon people, it's not that hard. It's called Google. In my day, you had to actually *WRITE* the code.
36
+
37
+ == Installation
38
+
39
+ gem install gemcutter
40
+ gem install fast-aes
41
+
42
+ == Example
43
+
44
+ Simple encryption/decryption:
45
+
46
+ require 'fast-aes'
47
+
48
+ # key must be exactly 16, 24, or 32 bytes long (128, 192, or 256 bits)
49
+ key = '424b3b5c4d454c7a51376748255d7b71'
50
+
51
+ aes = FastAES.new(key)
52
+
53
+ text = "Hey there, how are you?"
54
+
55
+ data = aes.encrypt(text)
56
+
57
+ puts aes.decrypt(data) # "Hey there, how are you?"
58
+
59
+ Pretty simple, jah?
60
+
61
+ == Why AES?
62
+
63
+ === SSL vs AES
64
+
65
+ I'm going to guess you're using Ruby with Rails, which means you're doing 90+% web development.
66
+ In that case, if you need security, SSL is the obvious choice (and the right one).
67
+
68
+ But there will probably come a time, padawan, when you need a couple backend servers to talk -
69
+ maybe job servers, or an admin port, or whatever. Maybe even a simple chat server.
70
+
71
+ You can use SSL for this if you want it to be time-consuming to set up, painful to maintain, and
72
+ slow. Or you can use a different algorithm, such as AES. Setting up an SSH tunnel is another good
73
+ alternative (although AES is faster, and setup is slightly easier).
74
+
75
+ === AES vs Other Encryption Standards
76
+
77
+ There are a bizillion (literally!) different encryption standards out there. If you have
78
+ a PhD, and can't find a job, writing an encryption algorithm is a good thing to put on your resume -
79
+ on that outside chance that someone will hire you and use it. If you don't possess the talent to
80
+ write an encryption standard, you can spend hours trying to crack one - for similar reasons. As a
81
+ result, of the many encryption alternatives, most are either (a) cracked or (b) covered by patents.
82
+
83
+ Personally, when it comes to encryption, I think choosing what the US government chooses is a decent
84
+ choice. They tend to be "security conscious."
85
+
86
+ === Special Note
87
+
88
+ As this software deals with encryption/decryption, please note there is *NO* *WARRANTY*, not even
89
+ with regards to FITNESS FOR A PARTICULAR PURPOSE or NONINFRINGEMENT. This means if you use this
90
+ library, and it turns out there's a flaw in the implementation that results in your data being
91
+ hacked, *IT* *IS* *NOT* *MY* *FAULT*. It's YOUR responsibility to check the implementation of this
92
+ library and algorithm. If you can't understand C code, that's NOT MY PROBLEM.
93
+
94
+ == Author
95
+
96
+ Original AES C reference code by Christophe Devine. Thanks Christophe!
97
+
98
+ This gem copyright (c) 2010 {Nate Wiger}[http://nate.wiger.org]. Released under the MIT License.
99
+
100
+ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation
101
+ files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use,
102
+ copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the
103
+ Software is furnished to do so, subject to the following conditions:
104
+
105
+ The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
106
+
107
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
108
+ OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
109
+ HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
110
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,11 @@
1
+ # Loads mkmf which is used to make makefiles for Ruby extensions
2
+ require 'mkmf'
3
+
4
+ # Give it a name
5
+ extension_name = 'fast_aes'
6
+
7
+ # The destination
8
+ dir_config(extension_name)
9
+
10
+ # Do the work
11
+ create_makefile(extension_name)
@@ -0,0 +1,1023 @@
1
+ /*//////////////////////////////////////////////////////////////////////////////
2
+ ////////////////////////////////////////////////////////////////////////////////
3
+ //
4
+ // Part of the FastAES Ruby/C library implementation.
5
+ // Implementation in C originally by Christophe Devine.
6
+ //
7
+ ////////////////////////////////////////////////////////////////////////////////
8
+ //////////////////////////////////////////////////////////////////////////////*/
9
+
10
+ #include <string.h>
11
+ #include <stdio.h>
12
+ #include <stdint.h>
13
+
14
+ #include "ruby.h"
15
+ #include "fast_aes.h"
16
+
17
+ /* Global boolean */
18
+ int fast_aes_do_gen_tables = 1;
19
+
20
+ /* Old school. Oh yeah */
21
+ #ifndef RSTRING_PTR
22
+ #define RSTRING_PTR(s) (RSTRING(s)->ptr)
23
+ #define RSTRING_LEN(s) (RSTRING(s)->len)
24
+ #endif
25
+
26
+ /* Ruby buckets */
27
+ VALUE rb_cFastAES;
28
+
29
+ void Init_fast_aes()
30
+ {
31
+ rb_cFastAES = rb_define_class("FastAES", rb_cObject);
32
+
33
+ rb_define_alloc_func(rb_cFastAES, fast_aes_alloc);
34
+ rb_define_method(rb_cFastAES, "initialize", fast_aes_initialize, 1);
35
+ rb_define_method(rb_cFastAES, "encrypt", fast_aes_encrypt, 1);
36
+ rb_define_method(rb_cFastAES, "decrypt", fast_aes_decrypt, 1);
37
+ rb_define_method(rb_cFastAES, "key", fast_aes_key, 0);
38
+ }
39
+
40
+ VALUE fast_aes_key(VALUE self)
41
+ {
42
+ /* get our "self" data structure (eg, member vars) */
43
+ fast_aes_t* fast_aes;
44
+ Data_Get_Struct(self, fast_aes_t, fast_aes);
45
+ VALUE new_str = rb_str_new(fast_aes->key, fast_aes->key_bits/8);
46
+ return new_str;
47
+ }
48
+
49
+ VALUE fast_aes_alloc(VALUE klass)
50
+ {
51
+ /* Initialize our structs */
52
+ fast_aes_t *fast_aes = malloc(sizeof(fast_aes_t));
53
+
54
+ /* Clear out memory */
55
+ memset(fast_aes->key, 0, sizeof(fast_aes->key));
56
+ memset(fast_aes->erk, 0, sizeof(fast_aes->erk));
57
+ memset(fast_aes->drk, 0, sizeof(fast_aes->drk));
58
+ memset(fast_aes->initial_erk, 0, sizeof(fast_aes->initial_erk));
59
+ memset(fast_aes->initial_drk, 0, sizeof(fast_aes->initial_drk));
60
+
61
+ return Data_Wrap_Struct(klass, fast_aes_mark, fast_aes_free, fast_aes);
62
+ }
63
+
64
+ VALUE fast_aes_initialize(VALUE self, VALUE key)
65
+ {
66
+ /* get our "self" data structure (eg, member vars) */
67
+ fast_aes_t* fast_aes;
68
+ Data_Get_Struct(self, fast_aes_t, fast_aes);
69
+ char error_mesg[350];
70
+
71
+ int key_bits;
72
+ char* key_data = StringValuePtr(key);
73
+
74
+ /* since the tables are global there's no need to generate them more than once
75
+ * regardless of how many instances there are of this object
76
+ */
77
+ if( fast_aes_do_gen_tables == 1 )
78
+ {
79
+ fast_aes_gen_tables();
80
+ fast_aes_do_gen_tables = 0;
81
+ }
82
+
83
+ /* the key length selects the cipher strength: 16, 24, or 32 bytes give
84
+ * AES-128, AES-192, or AES-256; any other length is rejected below.
85
+ * RSTRING_LEN is used instead of strlen so that keys containing NUL bytes are measured correctly
86
+ */
87
+ key_bits = (int)RSTRING_LEN(key)*8;
88
+ switch(key_bits)
89
+ {
90
+ case 128:
91
+ case 192:
92
+ case 256:
93
+ fast_aes->key_bits = key_bits;
94
+ memcpy(fast_aes->key, key_data, key_bits/8);
95
+ /*printf("AES key=%s, bits=%d\n", fast_aes->key, fast_aes->key_bits);*/
96
+ break;
97
+ default:
98
+ /* let rb_raise do the formatting: no fixed-size buffer to overflow, and the key is never echoed or used as a format string */
99
+ rb_raise(rb_eArgError, "AES key must be 128, 192, or 256 bits in length (got %d)", key_bits);
100
+ return Qnil;
101
+ }
102
+
103
+ if (fast_aes_initialize_state(fast_aes)) {
104
+ rb_raise(rb_eRuntimeError, "Failed to initialize AES internal state");
105
+ return Qnil;
106
+ }
107
+ return Qtrue;
108
+ }
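The key check in the initializer above boils down to a small lookup. The following is a standalone sketch under assumptions, not the extension's code: `aes_key_bits` and `aes_rounds` are hypothetical helper names, and the round counts (10, 12, 14) are the ones the key schedule in this file assigns for each key size.

```c
#include <stddef.h>

/* Hypothetical helper: return the AES key size in bits for a key of
 * `len` bytes, or 0 if AES does not support that length. */
int aes_key_bits(size_t len)
{
    switch (len) {
    case 16: return 128;
    case 24: return 192;
    case 32: return 256;
    default: return 0;
    }
}

/* Hypothetical helper: number of cipher rounds for a key size in bits,
 * or 0 for an unsupported size. */
int aes_rounds(int key_bits)
{
    switch (key_bits) {
    case 128: return 10;
    case 192: return 12;
    case 256: return 14;
    default:  return 0;
    }
}
```

A caller could treat a zero return as the signal to raise an argument error, so that a 20-byte key, for example, is rejected before any key schedule work happens.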
109
+
110
+ void fast_aes_module_shutdown( fast_aes_t* fast_aes )
111
+ {
112
+ }
113
+
114
+ /* This method **MUST** be present even if it does nothing */
115
+ void fast_aes_mark( fast_aes_t* fast_aes )
116
+ {
117
+ /*rb_gc_mark(??);
118
+ //should we mark each member here? */
119
+ }
120
+
121
+ void fast_aes_free( fast_aes_t* fast_aes )
122
+ {
123
+ fast_aes_module_shutdown(fast_aes);
124
+ free(fast_aes);
125
+ }
126
+
127
+ VALUE fast_aes_encrypt(
128
+ VALUE self,
129
+ VALUE buffer
130
+ )
131
+ {
132
+ /* get our "self" data structure (eg, member vars) */
133
+ fast_aes_t* fast_aes;
134
+ Data_Get_Struct(self, fast_aes_t, fast_aes);
135
+
136
+ char* pDataIn = StringValuePtr(buffer);
137
+ int uiNumBytesIn = RSTRING_LEN(buffer);
138
+ char* pDataOut = malloc((uiNumBytesIn + 15) & -16); /* auto-malloc min size in 16-byte increments */
139
+
140
+ unsigned char *pRead, *pWrite;
141
+ pRead = (unsigned char*)pDataIn;
142
+ pWrite = (unsigned char*)pDataOut;
143
+
144
+ /*//////////////////////////////////////////////////////////////////////////
145
+ ////////////////////////////////////////////////////////////////////////////
146
+ // This routine will encode all input bytes in entirety (AES always "succeeds")
147
+ */
148
+ int puiNumBytesOut = 0;
149
+
150
+ /* set the state back to the start to allow for correct encryption
151
+ * every time we are passed data to encrypt
152
+ */
153
+ if (fast_aes_reinitialize_state(fast_aes)) {
154
+ rb_raise(rb_eRuntimeError, "Failed to reinitialize AES internal state");
155
+ return Qnil;
156
+ }
157
+
158
+ /*//////////////////////////////////////////////////////////////////////////
159
+ ////////////////////////////////////////////////////////////////////////////
160
+ // Perform block encodes 16 bytes at a time while we still have at least
161
+ // 16 bytes of input remaining.
162
+ */
163
+ while( uiNumBytesIn >= 16 )
164
+ {
165
+ fast_aes_encrypt_block(fast_aes, pRead, pWrite);
166
+ pRead += 16; pWrite += 16;
167
+ uiNumBytesIn -= 16;
168
+ puiNumBytesOut += 16;
169
+ }
170
+
171
+ /*//////////////////////////////////////////////////////////////////////////
172
+ ////////////////////////////////////////////////////////////////////////////
173
+ // Have to catch any straggling bytes that are left after encoding the
174
+ // 16-byte blocks. The policy here will be to pad the input with zeros.
175
+ */
176
+ if( uiNumBytesIn > 0 )
177
+ {
178
+ unsigned char temp[16];
179
+ memset(temp, 0, sizeof(temp)); /* pad with 0's */
180
+ memcpy(temp, pRead, uiNumBytesIn);
181
+ fast_aes_encrypt_block(fast_aes, temp, pWrite);
182
+ puiNumBytesOut += 16;
183
+ }
184
+
185
+ /* return the encrypted string */
186
+ VALUE new_str = rb_str_new(pDataOut, puiNumBytesOut);
187
+ free(pDataOut);
188
+ return new_str;
189
+ }
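The shape of the encrypt routine above (full blocks first, then one zero-filled tail block) can be sketched without any cipher at all. This is an illustration only: `zero_pad_blocks` is a hypothetical name, and the blocks are simply copied where the real code would call the block cipher.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy `n` input bytes to `out` in 16-byte blocks,
 * zero-filling the final partial block. `out` must have room for
 * ((n + 15) / 16) * 16 bytes. Returns the number of bytes written.
 * A real cipher would transform each block instead of copying it. */
size_t zero_pad_blocks(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t written = 0;

    while (n >= 16) {               /* full blocks pass straight through */
        memcpy(out + written, in + written, 16);
        written += 16;
        n -= 16;
    }
    if (n > 0) {                    /* trailing partial block */
        uint8_t tail[16] = {0};     /* start from all zeros */
        memcpy(tail, in + written, n);
        memcpy(out + written, tail, 16);
        written += 16;
    }
    return written;
}
```

Because the padding bytes are plain zeros, the output carries no record of how many were added, which is why the decrypt side has to guess by stripping trailing zeros.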
190
+
191
+ VALUE fast_aes_decrypt(
192
+ VALUE self,
193
+ VALUE buffer
194
+ )
195
+ {
196
+ /* get our "self" data structure (eg, member vars) */
197
+ fast_aes_t* fast_aes;
198
+ Data_Get_Struct(self, fast_aes_t, fast_aes);
199
+
200
+ char* pDataIn = StringValuePtr(buffer);
201
+ int uiNumBytesIn = RSTRING_LEN(buffer);
202
+ char* pDataOut = malloc((uiNumBytesIn + 15) & -16); /* auto-malloc min size in 16-byte increments */
203
+
204
+
205
+ unsigned char *pRead, *pWrite;
206
+ pRead = (unsigned char*)pDataIn;
207
+ pWrite = (unsigned char*)pDataOut;
208
+
209
+ /*//////////////////////////////////////////////////////////////////////////
210
+ ////////////////////////////////////////////////////////////////////////////
211
+ // AES does not fail, and this routine will decode all input bytes
212
+ // entirely.
213
+ */
214
+ int puiNumBytesOut = 0;
215
+
216
+ /* set the state back to the start to allow for correct decryption
217
+ * every time we are passed data to decrypt
218
+ */
219
+ if (fast_aes_reinitialize_state(fast_aes)) {
220
+ rb_raise(rb_eRuntimeError, "Failed to reinitialize AES internal state");
221
+ return Qnil;
222
+ }
223
+
224
+ /*//////////////////////////////////////////////////////////////////////////
225
+ ////////////////////////////////////////////////////////////////////////////
226
+ // Perform block decodes 16 bytes at a time while we still have at least
227
+ // 16 bytes of input remaining.
228
+ */
229
+ while( uiNumBytesIn >= 16 )
230
+ {
231
+ fast_aes_decrypt_block(fast_aes, pRead, pWrite);
232
+ pRead += 16; pWrite += 16;
233
+ uiNumBytesIn -= 16;
234
+ puiNumBytesOut += 16;
235
+ }
236
+
237
+ /*//////////////////////////////////////////////////////////////////////////
238
+ ////////////////////////////////////////////////////////////////////////////
239
+ // Have to catch any straggling bytes that are left after decoding the
240
+ // 16-byte blocks. A partial trailing block is zero-filled before
241
+ // decoding; trailing zero bytes are then stripped from the result, so
242
+ // plaintext that itself ends in zero bytes will lose them.
243
+ */
244
+ if( uiNumBytesIn > 0 )
245
+ {
246
+ unsigned char temp[16];
247
+ memset(temp, 0, sizeof(temp)); /* pad with 0's */
248
+ memcpy(temp, pRead, uiNumBytesIn);
249
+ fast_aes_decrypt_block(fast_aes, temp, pWrite);
250
+ puiNumBytesOut += 16;
251
+ }
252
+
253
+ /* Strip the zero padding: walk back from the end of the buffer
254
+ * until the first non-zero byte.
255
+ */
256
+ while (puiNumBytesOut > 0) {
257
+ if (pDataOut[puiNumBytesOut - 1] != 0) break;
258
+ puiNumBytesOut -= 1;
259
+ }
260
+
261
+ /* return the decrypted string */
262
+ VALUE new_str = rb_str_new(pDataOut, puiNumBytesOut);
263
+ free(pDataOut);
264
+ return new_str;
265
+ }
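The zero-stripping step at the end of the decrypt routine above is simple enough to isolate. Here is a small sketch, assuming a hypothetical helper named `strip_trailing_zeros`; it also shows the trade-off of this scheme, namely that data which legitimately ends in zero bytes comes back shorter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the length of `buf` once trailing zero
 * bytes are dropped. Zero bytes in the middle of the data are kept. */
size_t strip_trailing_zeros(const uint8_t *buf, size_t n)
{
    while (n > 0 && buf[n - 1] == 0)
        n--;
    return n;
}
```

For text this is usually harmless, but a binary payload such as `{1, 2, 0}` would be returned with length 2. A padding scheme that records its own length avoids that ambiguity, at the cost of changing the wire format.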
266
+
267
+ /*//////////////////////////////////////////////////////////////////////////////
268
+ //////////////////////////////////////////////////////////////////////////////*/
269
+
270
+ /* uncomment the following line to use pre-computed tables */
271
+ /* otherwise the tables will be generated at the first run */
272
+ //#define FIXED_TABLES
273
+
274
+ #ifndef FIXED_TABLES
275
+
276
+ /* forward S-box & tables */
277
+
278
+ uint32_t FSb[256];
279
+ uint32_t FT0[256];
280
+ uint32_t FT1[256];
281
+ uint32_t FT2[256];
282
+ uint32_t FT3[256];
283
+
284
+ /* reverse S-box & tables */
285
+
286
+ uint32_t RSb[256];
287
+ uint32_t RT0[256];
288
+ uint32_t RT1[256];
289
+ uint32_t RT2[256];
290
+ uint32_t RT3[256];
291
+
292
+ /* round constants */
293
+
294
+ uint32_t RCON[10];
295
+
296
+ /* tables generation flag */
297
+
298
+ /* tables generation routine */
299
+
300
+ #define ROTR8(x) ( ( ( x << 24 ) & 0xFFFFFFFF ) | \
301
+ ( ( x & 0xFFFFFFFF ) >> 8 ) )
302
+
303
+ #define XTIME(x) ( ( x << 1 ) ^ ( ( x & 0x80 ) ? 0x1B : 0x00 ) )
304
+ #define MUL(x,y) ( ( x && y ) ? pow[(log[x] + log[y]) % 255] : 0 )
305
+
306
+ void fast_aes_gen_tables( void )
307
+ {
308
+ int i;
309
+ uint8_t x, y;
310
+ uint8_t pow[256];
311
+ uint8_t log[256];
312
+
313
+ /* compute pow and log tables over GF(2^8) */
314
+
315
+ for( i = 0, x = 1; i < 256; i++, x ^= XTIME( x ) )
316
+ {
317
+ pow[i] = x;
318
+ log[x] = i;
319
+ }
320
+
321
+ /* calculate the round constants */
322
+
323
+ for( i = 0, x = 1; i < 10; i++, x = XTIME( x ) )
324
+ {
325
+ RCON[i] = (uint32_t) x << 24;
326
+ }
327
+
328
+ /* generate the forward and reverse S-boxes */
329
+
330
+ FSb[0x00] = 0x63;
331
+ RSb[0x63] = 0x00;
332
+
333
+ for( i = 1; i < 256; ++i )
334
+ {
335
+ x = pow[255 - log[i]];
336
+
337
+ y = x; y = ( y << 1 ) | ( y >> 7 );
338
+ x ^= y; y = ( y << 1 ) | ( y >> 7 );
339
+ x ^= y; y = ( y << 1 ) | ( y >> 7 );
340
+ x ^= y; y = ( y << 1 ) | ( y >> 7 );
341
+ x ^= y ^ 0x63;
342
+
343
+ FSb[i] = x;
344
+ RSb[x] = i;
345
+ }
346
+
347
+ /* generate the forward and reverse tables */
348
+
349
+ for( i = 0; i < 256; ++i )
350
+ {
351
+ x = (unsigned char) FSb[i]; y = XTIME( x );
352
+
353
+ FT0[i] = (uint32_t) ( x ^ y ) ^
354
+ ( (uint32_t) x << 8 ) ^
355
+ ( (uint32_t) x << 16 ) ^
356
+ ( (uint32_t) y << 24 );
357
+
358
+ FT0[i] &= 0xFFFFFFFF;
359
+
360
+ FT1[i] = ROTR8( FT0[i] );
361
+ FT2[i] = ROTR8( FT1[i] );
362
+ FT3[i] = ROTR8( FT2[i] );
363
+
364
+ y = (unsigned char) RSb[i];
365
+
366
+ RT0[i] = ( (uint32_t) MUL( 0x0B, y ) ) ^
367
+ ( (uint32_t) MUL( 0x0D, y ) << 8 ) ^
368
+ ( (uint32_t) MUL( 0x09, y ) << 16 ) ^
369
+ ( (uint32_t) MUL( 0x0E, y ) << 24 );
370
+
371
+ RT0[i] &= 0xFFFFFFFF;
372
+
373
+ RT1[i] = ROTR8( RT0[i] );
374
+ RT2[i] = ROTR8( RT1[i] );
375
+ RT3[i] = ROTR8( RT2[i] );
376
+ }
377
+ }
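The table generation above leans on two pieces of GF(2^8) arithmetic: multiplying by x (the XTIME macro) and inverting via log and power tables. The sketch below restates just the round-constant and forward S-box steps as standalone functions so they can be checked against the precomputed values listed later in this file. The names `xtime`, `rotl8`, `gen_rcon`, and `gen_sbox` are chosen for this example and do not exist in the extension.

```c
#include <stdint.h>

/* Multiply by x (i.e. 2) in GF(2^8), reducing by the AES polynomial. */
static uint8_t xtime(uint8_t x)
{
    return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1B : 0x00));
}

/* Rotate an 8-bit value left by one bit. */
static uint8_t rotl8(uint8_t y)
{
    return (uint8_t)((y << 1) | (y >> 7));
}

/* Fill rcon[0..9] with the round constants, stored in the top byte. */
void gen_rcon(uint32_t rcon[10])
{
    uint8_t x = 1;
    for (int i = 0; i < 10; i++, x = xtime(x))
        rcon[i] = (uint32_t)x << 24;
}

/* Fill sbox[0..255] with the forward S-box: the multiplicative inverse
 * in GF(2^8) followed by the affine transform. */
void gen_sbox(uint8_t sbox[256])
{
    uint8_t gf_pow[256], gf_log[256];
    uint8_t x = 1;

    /* powers and logs of the generator 3, using x * 3 == x ^ xtime(x) */
    for (int i = 0; i < 256; i++, x ^= xtime(x)) {
        gf_pow[i] = x;
        gf_log[x] = (uint8_t)i;
    }

    sbox[0] = 0x63;  /* 0 has no inverse; it maps to the affine constant */
    for (int i = 1; i < 256; i++) {
        uint8_t inv = gf_pow[255 - gf_log[i]];
        uint8_t y = rotl8(inv);
        uint8_t s = inv ^ y;
        y = rotl8(y); s ^= y;
        y = rotl8(y); s ^= y;
        y = rotl8(y); s ^= y ^ 0x63;
        sbox[i] = s;
    }
}
```

Generating the tables at startup trades a little initialization time for a smaller source file; the FIXED_TABLES branch makes the opposite trade.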
378
+
379
+ #else
380
+
381
+ /* forward S-box */
382
+
383
+ static const uint32_t FSb[256] =
384
+ {
385
+ 0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5,
386
+ 0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76,
387
+ 0xCA, 0x82, 0xC9, 0x7D, 0xFA, 0x59, 0x47, 0xF0,
388
+ 0xAD, 0xD4, 0xA2, 0xAF, 0x9C, 0xA4, 0x72, 0xC0,
389
+ 0xB7, 0xFD, 0x93, 0x26, 0x36, 0x3F, 0xF7, 0xCC,
390
+ 0x34, 0xA5, 0xE5, 0xF1, 0x71, 0xD8, 0x31, 0x15,
391
+ 0x04, 0xC7, 0x23, 0xC3, 0x18, 0x96, 0x05, 0x9A,
392
+ 0x07, 0x12, 0x80, 0xE2, 0xEB, 0x27, 0xB2, 0x75,
393
+ 0x09, 0x83, 0x2C, 0x1A, 0x1B, 0x6E, 0x5A, 0xA0,
394
+ 0x52, 0x3B, 0xD6, 0xB3, 0x29, 0xE3, 0x2F, 0x84,
395
+ 0x53, 0xD1, 0x00, 0xED, 0x20, 0xFC, 0xB1, 0x5B,
396
+ 0x6A, 0xCB, 0xBE, 0x39, 0x4A, 0x4C, 0x58, 0xCF,
397
+ 0xD0, 0xEF, 0xAA, 0xFB, 0x43, 0x4D, 0x33, 0x85,
398
+ 0x45, 0xF9, 0x02, 0x7F, 0x50, 0x3C, 0x9F, 0xA8,
399
+ 0x51, 0xA3, 0x40, 0x8F, 0x92, 0x9D, 0x38, 0xF5,
400
+ 0xBC, 0xB6, 0xDA, 0x21, 0x10, 0xFF, 0xF3, 0xD2,
401
+ 0xCD, 0x0C, 0x13, 0xEC, 0x5F, 0x97, 0x44, 0x17,
402
+ 0xC4, 0xA7, 0x7E, 0x3D, 0x64, 0x5D, 0x19, 0x73,
403
+ 0x60, 0x81, 0x4F, 0xDC, 0x22, 0x2A, 0x90, 0x88,
404
+ 0x46, 0xEE, 0xB8, 0x14, 0xDE, 0x5E, 0x0B, 0xDB,
405
+ 0xE0, 0x32, 0x3A, 0x0A, 0x49, 0x06, 0x24, 0x5C,
406
+ 0xC2, 0xD3, 0xAC, 0x62, 0x91, 0x95, 0xE4, 0x79,
407
+ 0xE7, 0xC8, 0x37, 0x6D, 0x8D, 0xD5, 0x4E, 0xA9,
408
+ 0x6C, 0x56, 0xF4, 0xEA, 0x65, 0x7A, 0xAE, 0x08,
409
+ 0xBA, 0x78, 0x25, 0x2E, 0x1C, 0xA6, 0xB4, 0xC6,
410
+ 0xE8, 0xDD, 0x74, 0x1F, 0x4B, 0xBD, 0x8B, 0x8A,
411
+ 0x70, 0x3E, 0xB5, 0x66, 0x48, 0x03, 0xF6, 0x0E,
412
+ 0x61, 0x35, 0x57, 0xB9, 0x86, 0xC1, 0x1D, 0x9E,
413
+ 0xE1, 0xF8, 0x98, 0x11, 0x69, 0xD9, 0x8E, 0x94,
414
+ 0x9B, 0x1E, 0x87, 0xE9, 0xCE, 0x55, 0x28, 0xDF,
415
+ 0x8C, 0xA1, 0x89, 0x0D, 0xBF, 0xE6, 0x42, 0x68,
416
+ 0x41, 0x99, 0x2D, 0x0F, 0xB0, 0x54, 0xBB, 0x16
417
+ };
418
+
419
+ /* forward tables */
420
+
421
+ #define FT \
422
+ \
423
+ V(C6,63,63,A5), V(F8,7C,7C,84), V(EE,77,77,99), V(F6,7B,7B,8D), \
424
+ V(FF,F2,F2,0D), V(D6,6B,6B,BD), V(DE,6F,6F,B1), V(91,C5,C5,54), \
425
+ V(60,30,30,50), V(02,01,01,03), V(CE,67,67,A9), V(56,2B,2B,7D), \
426
+ V(E7,FE,FE,19), V(B5,D7,D7,62), V(4D,AB,AB,E6), V(EC,76,76,9A), \
427
+ V(8F,CA,CA,45), V(1F,82,82,9D), V(89,C9,C9,40), V(FA,7D,7D,87), \
428
+ V(EF,FA,FA,15), V(B2,59,59,EB), V(8E,47,47,C9), V(FB,F0,F0,0B), \
429
+ V(41,AD,AD,EC), V(B3,D4,D4,67), V(5F,A2,A2,FD), V(45,AF,AF,EA), \
430
+ V(23,9C,9C,BF), V(53,A4,A4,F7), V(E4,72,72,96), V(9B,C0,C0,5B), \
431
+ V(75,B7,B7,C2), V(E1,FD,FD,1C), V(3D,93,93,AE), V(4C,26,26,6A), \
432
+ V(6C,36,36,5A), V(7E,3F,3F,41), V(F5,F7,F7,02), V(83,CC,CC,4F), \
433
+ V(68,34,34,5C), V(51,A5,A5,F4), V(D1,E5,E5,34), V(F9,F1,F1,08), \
434
+ V(E2,71,71,93), V(AB,D8,D8,73), V(62,31,31,53), V(2A,15,15,3F), \
435
+ V(08,04,04,0C), V(95,C7,C7,52), V(46,23,23,65), V(9D,C3,C3,5E), \
436
+ V(30,18,18,28), V(37,96,96,A1), V(0A,05,05,0F), V(2F,9A,9A,B5), \
437
+ V(0E,07,07,09), V(24,12,12,36), V(1B,80,80,9B), V(DF,E2,E2,3D), \
438
+ V(CD,EB,EB,26), V(4E,27,27,69), V(7F,B2,B2,CD), V(EA,75,75,9F), \
439
+ V(12,09,09,1B), V(1D,83,83,9E), V(58,2C,2C,74), V(34,1A,1A,2E), \
440
+ V(36,1B,1B,2D), V(DC,6E,6E,B2), V(B4,5A,5A,EE), V(5B,A0,A0,FB), \
441
+ V(A4,52,52,F6), V(76,3B,3B,4D), V(B7,D6,D6,61), V(7D,B3,B3,CE), \
442
+ V(52,29,29,7B), V(DD,E3,E3,3E), V(5E,2F,2F,71), V(13,84,84,97), \
443
+ V(A6,53,53,F5), V(B9,D1,D1,68), V(00,00,00,00), V(C1,ED,ED,2C), \
444
+ V(40,20,20,60), V(E3,FC,FC,1F), V(79,B1,B1,C8), V(B6,5B,5B,ED), \
445
+ V(D4,6A,6A,BE), V(8D,CB,CB,46), V(67,BE,BE,D9), V(72,39,39,4B), \
446
+ V(94,4A,4A,DE), V(98,4C,4C,D4), V(B0,58,58,E8), V(85,CF,CF,4A), \
447
+ V(BB,D0,D0,6B), V(C5,EF,EF,2A), V(4F,AA,AA,E5), V(ED,FB,FB,16), \
448
+ V(86,43,43,C5), V(9A,4D,4D,D7), V(66,33,33,55), V(11,85,85,94), \
449
+ V(8A,45,45,CF), V(E9,F9,F9,10), V(04,02,02,06), V(FE,7F,7F,81), \
450
+ V(A0,50,50,F0), V(78,3C,3C,44), V(25,9F,9F,BA), V(4B,A8,A8,E3), \
451
+ V(A2,51,51,F3), V(5D,A3,A3,FE), V(80,40,40,C0), V(05,8F,8F,8A), \
452
+ V(3F,92,92,AD), V(21,9D,9D,BC), V(70,38,38,48), V(F1,F5,F5,04), \
453
+ V(63,BC,BC,DF), V(77,B6,B6,C1), V(AF,DA,DA,75), V(42,21,21,63), \
454
+ V(20,10,10,30), V(E5,FF,FF,1A), V(FD,F3,F3,0E), V(BF,D2,D2,6D), \
455
+ V(81,CD,CD,4C), V(18,0C,0C,14), V(26,13,13,35), V(C3,EC,EC,2F), \
456
+ V(BE,5F,5F,E1), V(35,97,97,A2), V(88,44,44,CC), V(2E,17,17,39), \
457
+ V(93,C4,C4,57), V(55,A7,A7,F2), V(FC,7E,7E,82), V(7A,3D,3D,47), \
458
+ V(C8,64,64,AC), V(BA,5D,5D,E7), V(32,19,19,2B), V(E6,73,73,95), \
459
+ V(C0,60,60,A0), V(19,81,81,98), V(9E,4F,4F,D1), V(A3,DC,DC,7F), \
460
+ V(44,22,22,66), V(54,2A,2A,7E), V(3B,90,90,AB), V(0B,88,88,83), \
461
+ V(8C,46,46,CA), V(C7,EE,EE,29), V(6B,B8,B8,D3), V(28,14,14,3C), \
462
+ V(A7,DE,DE,79), V(BC,5E,5E,E2), V(16,0B,0B,1D), V(AD,DB,DB,76), \
463
+ V(DB,E0,E0,3B), V(64,32,32,56), V(74,3A,3A,4E), V(14,0A,0A,1E), \
464
+ V(92,49,49,DB), V(0C,06,06,0A), V(48,24,24,6C), V(B8,5C,5C,E4), \
465
+ V(9F,C2,C2,5D), V(BD,D3,D3,6E), V(43,AC,AC,EF), V(C4,62,62,A6), \
466
+ V(39,91,91,A8), V(31,95,95,A4), V(D3,E4,E4,37), V(F2,79,79,8B), \
467
+ V(D5,E7,E7,32), V(8B,C8,C8,43), V(6E,37,37,59), V(DA,6D,6D,B7), \
468
+ V(01,8D,8D,8C), V(B1,D5,D5,64), V(9C,4E,4E,D2), V(49,A9,A9,E0), \
469
+ V(D8,6C,6C,B4), V(AC,56,56,FA), V(F3,F4,F4,07), V(CF,EA,EA,25), \
470
+ V(CA,65,65,AF), V(F4,7A,7A,8E), V(47,AE,AE,E9), V(10,08,08,18), \
471
+ V(6F,BA,BA,D5), V(F0,78,78,88), V(4A,25,25,6F), V(5C,2E,2E,72), \
472
+ V(38,1C,1C,24), V(57,A6,A6,F1), V(73,B4,B4,C7), V(97,C6,C6,51), \
473
+ V(CB,E8,E8,23), V(A1,DD,DD,7C), V(E8,74,74,9C), V(3E,1F,1F,21), \
474
+ V(96,4B,4B,DD), V(61,BD,BD,DC), V(0D,8B,8B,86), V(0F,8A,8A,85), \
475
+ V(E0,70,70,90), V(7C,3E,3E,42), V(71,B5,B5,C4), V(CC,66,66,AA), \
476
+ V(90,48,48,D8), V(06,03,03,05), V(F7,F6,F6,01), V(1C,0E,0E,12), \
477
+ V(C2,61,61,A3), V(6A,35,35,5F), V(AE,57,57,F9), V(69,B9,B9,D0), \
478
+ V(17,86,86,91), V(99,C1,C1,58), V(3A,1D,1D,27), V(27,9E,9E,B9), \
479
+ V(D9,E1,E1,38), V(EB,F8,F8,13), V(2B,98,98,B3), V(22,11,11,33), \
480
+ V(D2,69,69,BB), V(A9,D9,D9,70), V(07,8E,8E,89), V(33,94,94,A7), \
481
+ V(2D,9B,9B,B6), V(3C,1E,1E,22), V(15,87,87,92), V(C9,E9,E9,20), \
482
+ V(87,CE,CE,49), V(AA,55,55,FF), V(50,28,28,78), V(A5,DF,DF,7A), \
483
+ V(03,8C,8C,8F), V(59,A1,A1,F8), V(09,89,89,80), V(1A,0D,0D,17), \
484
+ V(65,BF,BF,DA), V(D7,E6,E6,31), V(84,42,42,C6), V(D0,68,68,B8), \
485
+ V(82,41,41,C3), V(29,99,99,B0), V(5A,2D,2D,77), V(1E,0F,0F,11), \
486
+ V(7B,B0,B0,CB), V(A8,54,54,FC), V(6D,BB,BB,D6), V(2C,16,16,3A)
487
+
488
+ #define V(a,b,c,d) 0x##a##b##c##d
489
+ static const uint32_t FT0[256] = { FT };
490
+ #undef V
491
+
492
+ #define V(a,b,c,d) 0x##d##a##b##c
493
+ static const uint32_t FT1[256] = { FT };
494
+ #undef V
495
+
496
+ #define V(a,b,c,d) 0x##c##d##a##b
497
+ static const uint32_t FT2[256] = { FT };
498
+ #undef V
499
+
500
+ #define V(a,b,c,d) 0x##b##c##d##a
501
+ static const uint32_t FT3[256] = { FT };
502
+ #undef V
503
+
504
+ #undef FT
505
+
506
+ /* reverse S-box */
507
+
508
+ static const uint32_t RSb[256] =
509
+ {
510
+ 0x52, 0x09, 0x6A, 0xD5, 0x30, 0x36, 0xA5, 0x38,
511
+ 0xBF, 0x40, 0xA3, 0x9E, 0x81, 0xF3, 0xD7, 0xFB,
512
+ 0x7C, 0xE3, 0x39, 0x82, 0x9B, 0x2F, 0xFF, 0x87,
513
+ 0x34, 0x8E, 0x43, 0x44, 0xC4, 0xDE, 0xE9, 0xCB,
514
+ 0x54, 0x7B, 0x94, 0x32, 0xA6, 0xC2, 0x23, 0x3D,
515
+ 0xEE, 0x4C, 0x95, 0x0B, 0x42, 0xFA, 0xC3, 0x4E,
516
+ 0x08, 0x2E, 0xA1, 0x66, 0x28, 0xD9, 0x24, 0xB2,
517
+ 0x76, 0x5B, 0xA2, 0x49, 0x6D, 0x8B, 0xD1, 0x25,
518
+ 0x72, 0xF8, 0xF6, 0x64, 0x86, 0x68, 0x98, 0x16,
519
+ 0xD4, 0xA4, 0x5C, 0xCC, 0x5D, 0x65, 0xB6, 0x92,
520
+ 0x6C, 0x70, 0x48, 0x50, 0xFD, 0xED, 0xB9, 0xDA,
521
+ 0x5E, 0x15, 0x46, 0x57, 0xA7, 0x8D, 0x9D, 0x84,
522
+ 0x90, 0xD8, 0xAB, 0x00, 0x8C, 0xBC, 0xD3, 0x0A,
523
+ 0xF7, 0xE4, 0x58, 0x05, 0xB8, 0xB3, 0x45, 0x06,
524
+ 0xD0, 0x2C, 0x1E, 0x8F, 0xCA, 0x3F, 0x0F, 0x02,
525
+ 0xC1, 0xAF, 0xBD, 0x03, 0x01, 0x13, 0x8A, 0x6B,
526
+ 0x3A, 0x91, 0x11, 0x41, 0x4F, 0x67, 0xDC, 0xEA,
527
+ 0x97, 0xF2, 0xCF, 0xCE, 0xF0, 0xB4, 0xE6, 0x73,
528
+ 0x96, 0xAC, 0x74, 0x22, 0xE7, 0xAD, 0x35, 0x85,
529
+ 0xE2, 0xF9, 0x37, 0xE8, 0x1C, 0x75, 0xDF, 0x6E,
530
+ 0x47, 0xF1, 0x1A, 0x71, 0x1D, 0x29, 0xC5, 0x89,
531
+ 0x6F, 0xB7, 0x62, 0x0E, 0xAA, 0x18, 0xBE, 0x1B,
532
+ 0xFC, 0x56, 0x3E, 0x4B, 0xC6, 0xD2, 0x79, 0x20,
533
+ 0x9A, 0xDB, 0xC0, 0xFE, 0x78, 0xCD, 0x5A, 0xF4,
534
+ 0x1F, 0xDD, 0xA8, 0x33, 0x88, 0x07, 0xC7, 0x31,
535
+ 0xB1, 0x12, 0x10, 0x59, 0x27, 0x80, 0xEC, 0x5F,
536
+ 0x60, 0x51, 0x7F, 0xA9, 0x19, 0xB5, 0x4A, 0x0D,
537
+ 0x2D, 0xE5, 0x7A, 0x9F, 0x93, 0xC9, 0x9C, 0xEF,
538
+ 0xA0, 0xE0, 0x3B, 0x4D, 0xAE, 0x2A, 0xF5, 0xB0,
539
+ 0xC8, 0xEB, 0xBB, 0x3C, 0x83, 0x53, 0x99, 0x61,
540
+ 0x17, 0x2B, 0x04, 0x7E, 0xBA, 0x77, 0xD6, 0x26,
541
+ 0xE1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0C, 0x7D
542
+ };
543
+
544
+ /* reverse tables */
545
+
546
+ #define RT \
547
+ \
548
+ V(51,F4,A7,50), V(7E,41,65,53), V(1A,17,A4,C3), V(3A,27,5E,96), \
549
+ V(3B,AB,6B,CB), V(1F,9D,45,F1), V(AC,FA,58,AB), V(4B,E3,03,93), \
550
+ V(20,30,FA,55), V(AD,76,6D,F6), V(88,CC,76,91), V(F5,02,4C,25), \
551
+ V(4F,E5,D7,FC), V(C5,2A,CB,D7), V(26,35,44,80), V(B5,62,A3,8F), \
552
+ V(DE,B1,5A,49), V(25,BA,1B,67), V(45,EA,0E,98), V(5D,FE,C0,E1), \
553
+ V(C3,2F,75,02), V(81,4C,F0,12), V(8D,46,97,A3), V(6B,D3,F9,C6), \
554
+ V(03,8F,5F,E7), V(15,92,9C,95), V(BF,6D,7A,EB), V(95,52,59,DA), \
555
+ V(D4,BE,83,2D), V(58,74,21,D3), V(49,E0,69,29), V(8E,C9,C8,44), \
556
+ V(75,C2,89,6A), V(F4,8E,79,78), V(99,58,3E,6B), V(27,B9,71,DD), \
557
+ V(BE,E1,4F,B6), V(F0,88,AD,17), V(C9,20,AC,66), V(7D,CE,3A,B4), \
558
+ V(63,DF,4A,18), V(E5,1A,31,82), V(97,51,33,60), V(62,53,7F,45), \
559
+ V(B1,64,77,E0), V(BB,6B,AE,84), V(FE,81,A0,1C), V(F9,08,2B,94), \
560
+ V(70,48,68,58), V(8F,45,FD,19), V(94,DE,6C,87), V(52,7B,F8,B7), \
561
+ V(AB,73,D3,23), V(72,4B,02,E2), V(E3,1F,8F,57), V(66,55,AB,2A), \
562
+ V(B2,EB,28,07), V(2F,B5,C2,03), V(86,C5,7B,9A), V(D3,37,08,A5), \
563
+ V(30,28,87,F2), V(23,BF,A5,B2), V(02,03,6A,BA), V(ED,16,82,5C), \
564
+ V(8A,CF,1C,2B), V(A7,79,B4,92), V(F3,07,F2,F0), V(4E,69,E2,A1), \
565
+ V(65,DA,F4,CD), V(06,05,BE,D5), V(D1,34,62,1F), V(C4,A6,FE,8A), \
566
+ V(34,2E,53,9D), V(A2,F3,55,A0), V(05,8A,E1,32), V(A4,F6,EB,75), \
567
+ V(0B,83,EC,39), V(40,60,EF,AA), V(5E,71,9F,06), V(BD,6E,10,51), \
568
+ V(3E,21,8A,F9), V(96,DD,06,3D), V(DD,3E,05,AE), V(4D,E6,BD,46), \
569
+ V(91,54,8D,B5), V(71,C4,5D,05), V(04,06,D4,6F), V(60,50,15,FF), \
570
+ V(19,98,FB,24), V(D6,BD,E9,97), V(89,40,43,CC), V(67,D9,9E,77), \
571
+ V(B0,E8,42,BD), V(07,89,8B,88), V(E7,19,5B,38), V(79,C8,EE,DB), \
572
+ V(A1,7C,0A,47), V(7C,42,0F,E9), V(F8,84,1E,C9), V(00,00,00,00), \
573
+ V(09,80,86,83), V(32,2B,ED,48), V(1E,11,70,AC), V(6C,5A,72,4E), \
574
+ V(FD,0E,FF,FB), V(0F,85,38,56), V(3D,AE,D5,1E), V(36,2D,39,27), \
575
+ V(0A,0F,D9,64), V(68,5C,A6,21), V(9B,5B,54,D1), V(24,36,2E,3A), \
576
+ V(0C,0A,67,B1), V(93,57,E7,0F), V(B4,EE,96,D2), V(1B,9B,91,9E), \
577
+ V(80,C0,C5,4F), V(61,DC,20,A2), V(5A,77,4B,69), V(1C,12,1A,16), \
578
+ V(E2,93,BA,0A), V(C0,A0,2A,E5), V(3C,22,E0,43), V(12,1B,17,1D), \
579
+ V(0E,09,0D,0B), V(F2,8B,C7,AD), V(2D,B6,A8,B9), V(14,1E,A9,C8), \
580
+ V(57,F1,19,85), V(AF,75,07,4C), V(EE,99,DD,BB), V(A3,7F,60,FD), \
581
+ V(F7,01,26,9F), V(5C,72,F5,BC), V(44,66,3B,C5), V(5B,FB,7E,34), \
582
+ V(8B,43,29,76), V(CB,23,C6,DC), V(B6,ED,FC,68), V(B8,E4,F1,63), \
583
+ V(D7,31,DC,CA), V(42,63,85,10), V(13,97,22,40), V(84,C6,11,20), \
584
+ V(85,4A,24,7D), V(D2,BB,3D,F8), V(AE,F9,32,11), V(C7,29,A1,6D), \
585
+ V(1D,9E,2F,4B), V(DC,B2,30,F3), V(0D,86,52,EC), V(77,C1,E3,D0), \
586
+ V(2B,B3,16,6C), V(A9,70,B9,99), V(11,94,48,FA), V(47,E9,64,22), \
587
+ V(A8,FC,8C,C4), V(A0,F0,3F,1A), V(56,7D,2C,D8), V(22,33,90,EF), \
588
+ V(87,49,4E,C7), V(D9,38,D1,C1), V(8C,CA,A2,FE), V(98,D4,0B,36), \
589
+ V(A6,F5,81,CF), V(A5,7A,DE,28), V(DA,B7,8E,26), V(3F,AD,BF,A4), \
590
+ V(2C,3A,9D,E4), V(50,78,92,0D), V(6A,5F,CC,9B), V(54,7E,46,62), \
591
+ V(F6,8D,13,C2), V(90,D8,B8,E8), V(2E,39,F7,5E), V(82,C3,AF,F5), \
592
+ V(9F,5D,80,BE), V(69,D0,93,7C), V(6F,D5,2D,A9), V(CF,25,12,B3), \
593
+ V(C8,AC,99,3B), V(10,18,7D,A7), V(E8,9C,63,6E), V(DB,3B,BB,7B), \
594
+ V(CD,26,78,09), V(6E,59,18,F4), V(EC,9A,B7,01), V(83,4F,9A,A8), \
595
+ V(E6,95,6E,65), V(AA,FF,E6,7E), V(21,BC,CF,08), V(EF,15,E8,E6), \
596
+ V(BA,E7,9B,D9), V(4A,6F,36,CE), V(EA,9F,09,D4), V(29,B0,7C,D6), \
597
+ V(31,A4,B2,AF), V(2A,3F,23,31), V(C6,A5,94,30), V(35,A2,66,C0), \
598
+ V(74,4E,BC,37), V(FC,82,CA,A6), V(E0,90,D0,B0), V(33,A7,D8,15), \
599
+ V(F1,04,98,4A), V(41,EC,DA,F7), V(7F,CD,50,0E), V(17,91,F6,2F), \
600
+ V(76,4D,D6,8D), V(43,EF,B0,4D), V(CC,AA,4D,54), V(E4,96,04,DF), \
601
+ V(9E,D1,B5,E3), V(4C,6A,88,1B), V(C1,2C,1F,B8), V(46,65,51,7F), \
602
+ V(9D,5E,EA,04), V(01,8C,35,5D), V(FA,87,74,73), V(FB,0B,41,2E), \
603
+ V(B3,67,1D,5A), V(92,DB,D2,52), V(E9,10,56,33), V(6D,D6,47,13), \
604
+ V(9A,D7,61,8C), V(37,A1,0C,7A), V(59,F8,14,8E), V(EB,13,3C,89), \
605
+ V(CE,A9,27,EE), V(B7,61,C9,35), V(E1,1C,E5,ED), V(7A,47,B1,3C), \
606
+ V(9C,D2,DF,59), V(55,F2,73,3F), V(18,14,CE,79), V(73,C7,37,BF), \
607
+ V(53,F7,CD,EA), V(5F,FD,AA,5B), V(DF,3D,6F,14), V(78,44,DB,86), \
608
+ V(CA,AF,F3,81), V(B9,68,C4,3E), V(38,24,34,2C), V(C2,A3,40,5F), \
609
+ V(16,1D,C3,72), V(BC,E2,25,0C), V(28,3C,49,8B), V(FF,0D,95,41), \
610
+ V(39,A8,01,71), V(08,0C,B3,DE), V(D8,B4,E4,9C), V(64,56,C1,90), \
611
+ V(7B,CB,84,61), V(D5,32,B6,70), V(48,6C,5C,74), V(D0,B8,57,42)
612
+
613
+ #define V(a,b,c,d) 0x##a##b##c##d
614
+ static const uint32_t RT0[256] = { RT };
615
+ #undef V
616
+
617
+ #define V(a,b,c,d) 0x##d##a##b##c
618
+ static const uint32_t RT1[256] = { RT };
619
+ #undef V
620
+
621
+ #define V(a,b,c,d) 0x##c##d##a##b
622
+ static const uint32_t RT2[256] = { RT };
623
+ #undef V
624
+
625
+ #define V(a,b,c,d) 0x##b##c##d##a
626
+ static const uint32_t RT3[256] = { RT };
627
+ #undef V
628
+
629
+ #undef RT
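The four `V(a,b,c,d)` definitions above paste the same four bytes together in different orders, so each of RT1, RT2 and RT3 appears to be the previous table with every entry rotated right by 8 bits. The following is a minimal sketch of that relation on a single entry. The names `rotr8`, `V0`..`V3` and `check_rt_rotation` are illustrative assumptions and are not part of the library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not in the library): rotate a word right by 8 bits. */
static uint32_t rotr8(uint32_t w)
{
    return (w >> 8) | (w << 24);
}

/* The four byte orders used for RT0..RT3, renamed so they can coexist. */
#define V0(a,b,c,d) 0x##a##b##c##d
#define V1(a,b,c,d) 0x##d##a##b##c
#define V2(a,b,c,d) 0x##c##d##a##b
#define V3(a,b,c,d) 0x##b##c##d##a

/* Checks the rotation relation on the final RT entry, V(D0,B8,57,42). */
void check_rt_rotation(void)
{
    uint32_t t0 = V0(D0,B8,57,42);
    uint32_t t1 = V1(D0,B8,57,42);
    uint32_t t2 = V2(D0,B8,57,42);
    uint32_t t3 = V3(D0,B8,57,42);

    assert(t1 == rotr8(t0));
    assert(t2 == rotr8(t1));
    assert(t3 == rotr8(t2));
    assert(t0 == rotr8(t3));   /* four rotations return to the start */
}
```

Storing four pre-rotated tables trades memory (4 KB per direction) for the rotate instructions that a single-table version would need in every round.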
630
+
631
+ /* round constants */
632
+
633
+ static const uint32_t RCON[10] =
634
+ {
635
+ 0x01000000, 0x02000000, 0x04000000, 0x08000000,
636
+ 0x10000000, 0x20000000, 0x40000000, 0x80000000,
637
+ 0x1B000000, 0x36000000
638
+ };
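The ten round constants in `RCON` follow a simple rule: start at 0x01 and keep doubling in GF(2^8), reducing with 0x1B whenever the high bit falls off, then place the byte in the top of the word. A minimal sketch that regenerates them is shown here. `gf_double`, `gen_rcon` and `check_rcon` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply by 2 in GF(2^8) with the AES reduction polynomial (0x11B). */
static uint8_t gf_double(uint8_t x)
{
    return (uint8_t) ((x << 1) ^ ((x & 0x80) ? 0x1B : 0x00));
}

/* Fill out[0..9] with the round constants, byte placed in bits 31..24. */
static void gen_rcon(uint32_t out[10])
{
    uint8_t r = 0x01;
    int i;

    for (i = 0; i < 10; ++i) {
        out[i] = (uint32_t) r << 24;
        r = gf_double(r);
    }
}

/* Compares a few generated values against the table in the source. */
void check_rcon(void)
{
    uint32_t rc[10];

    gen_rcon(rc);
    assert(rc[0] == 0x01000000u);
    assert(rc[7] == 0x80000000u);
    assert(rc[8] == 0x1B000000u);
    assert(rc[9] == 0x36000000u);
}
```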
639
+
640
+ void aes_gen_tables( void )
641
+ {
642
+ }
643
+
644
+ #endif
645
+
646
+ /* platform-independent 32-bit integer manipulation macros */
647
+
648
+ #define GET_UINT32(n,b,i) \
649
+ { \
650
+ (n) = ( (uint32_t) (b)[(i) ] << 24 ) \
651
+ | ( (uint32_t) (b)[(i) + 1] << 16 ) \
652
+ | ( (uint32_t) (b)[(i) + 2] << 8 ) \
653
+ | ( (uint32_t) (b)[(i) + 3] ); \
654
+ }
655
+
656
+ #define PUT_UINT32(n,b,i) \
657
+ { \
658
+ (b)[(i) ] = (uint8_t) ( (n) >> 24 ); \
659
+ (b)[(i) + 1] = (uint8_t) ( (n) >> 16 ); \
660
+ (b)[(i) + 2] = (uint8_t) ( (n) >> 8 ); \
661
+ (b)[(i) + 3] = (uint8_t) ( (n) ); \
662
+ }
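`GET_UINT32` and `PUT_UINT32` read and write a 32-bit word in big-endian byte order regardless of the host's endianness, because they work on individual bytes and shifts. The same idea written as functions might look like the sketch below. `load_be32`, `store_be32` and `check_be32_roundtrip` are assumed names for illustration, not library API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read four bytes starting at b[i] as a big-endian 32-bit word. */
static uint32_t load_be32(const uint8_t *b, size_t i)
{
    return ((uint32_t) b[i    ] << 24)
         | ((uint32_t) b[i + 1] << 16)
         | ((uint32_t) b[i + 2] <<  8)
         | ((uint32_t) b[i + 3]      );
}

/* Write n to b[i..i+3], most significant byte first. */
static void store_be32(uint32_t n, uint8_t *b, size_t i)
{
    b[i    ] = (uint8_t) (n >> 24);
    b[i + 1] = (uint8_t) (n >> 16);
    b[i + 2] = (uint8_t) (n >>  8);
    b[i + 3] = (uint8_t) (n      );
}

/* Round trip: bytes -> word -> bytes should give back the same bytes. */
void check_be32_roundtrip(void)
{
    const uint8_t in[4] = { 0x01, 0x23, 0x45, 0x67 };
    uint8_t out[4] = { 0, 0, 0, 0 };
    uint32_t w = load_be32(in, 0);

    assert(w == 0x01234567u);
    store_be32(w, out, 0);
    assert(out[0] == 0x01 && out[1] == 0x23 && out[2] == 0x45 && out[3] == 0x67);
}
```

Functions avoid the multiple-evaluation pitfalls of the macro form, at the cost of relying on the compiler to inline them.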
663
+
664
+ /* decryption key schedule tables */
665
+
666
+ int KT_init = 1;
667
+
668
+ uint32_t KT0[256];
669
+ uint32_t KT1[256];
670
+ uint32_t KT2[256];
671
+ uint32_t KT3[256];
672
+
673
+ uint32_t initial_KT0[256];
674
+ uint32_t initial_KT1[256];
675
+ uint32_t initial_KT2[256];
676
+ uint32_t initial_KT3[256];
677
+
678
+ /* AES key scheduling routine */
679
+
680
+ int
681
+ fast_aes_initialize_state(fast_aes_t* fast_aes)
682
+ {
683
+ int i;
684
+ uint32_t *RK, *SK;
685
+
686
+ switch( fast_aes->key_bits )
687
+ {
688
+ case 128: fast_aes->nr = 10; break;
689
+ case 192: fast_aes->nr = 12; break;
690
+ case 256: fast_aes->nr = 14; break;
691
+ default : return( 1 );
692
+ }
693
+
694
+ RK = fast_aes->erk;
695
+
696
+ for( i = 0; i < (fast_aes->key_bits >> 5); ++i )
697
+ {
698
+ GET_UINT32(
699
+ fast_aes->erk[i],
700
+ ((unsigned char*)fast_aes->key),
701
+ i * 4
702
+ );
703
+ }
704
+
705
+ /* set up encryption round keys */
706
+
707
+ switch( fast_aes->key_bits )
708
+ {
709
+ case 128:
710
+
711
+ for( i = 0; i < 10; ++i, RK += 4 )
712
+ {
713
+ RK[4] = RK[0] ^ RCON[i] ^
714
+ ( FSb[ (uint8_t) ( RK[3] >> 16 ) ] << 24 ) ^
715
+ ( FSb[ (uint8_t) ( RK[3] >> 8 ) ] << 16 ) ^
716
+ ( FSb[ (uint8_t) ( RK[3] ) ] << 8 ) ^
717
+ ( FSb[ (uint8_t) ( RK[3] >> 24 ) ] );
718
+
719
+ RK[5] = RK[1] ^ RK[4];
720
+ RK[6] = RK[2] ^ RK[5];
721
+ RK[7] = RK[3] ^ RK[6];
722
+ }
723
+ break;
724
+
725
+ case 192:
726
+
727
+ for( i = 0; i < 8; ++i, RK += 6 )
728
+ {
729
+ RK[6] = RK[0] ^ RCON[i] ^
730
+ ( FSb[ (uint8_t) ( RK[5] >> 16 ) ] << 24 ) ^
731
+ ( FSb[ (uint8_t) ( RK[5] >> 8 ) ] << 16 ) ^
732
+ ( FSb[ (uint8_t) ( RK[5] ) ] << 8 ) ^
733
+ ( FSb[ (uint8_t) ( RK[5] >> 24 ) ] );
734
+
735
+ RK[7] = RK[1] ^ RK[6];
736
+ RK[8] = RK[2] ^ RK[7];
737
+ RK[9] = RK[3] ^ RK[8];
738
+ RK[10] = RK[4] ^ RK[9];
739
+ RK[11] = RK[5] ^ RK[10];
740
+ }
741
+ break;
742
+
743
+ case 256:
744
+
745
+ for( i = 0; i < 7; ++i, RK += 8 )
746
+ {
747
+ RK[8] = RK[0] ^ RCON[i] ^
748
+ ( FSb[ (uint8_t) ( RK[7] >> 16 ) ] << 24 ) ^
749
+ ( FSb[ (uint8_t) ( RK[7] >> 8 ) ] << 16 ) ^
750
+ ( FSb[ (uint8_t) ( RK[7] ) ] << 8 ) ^
751
+ ( FSb[ (uint8_t) ( RK[7] >> 24 ) ] );
752
+
753
+ RK[9] = RK[1] ^ RK[8];
754
+ RK[10] = RK[2] ^ RK[9];
755
+ RK[11] = RK[3] ^ RK[10];
756
+
757
+ RK[12] = RK[4] ^
758
+ ( FSb[ (uint8_t) ( RK[11] >> 24 ) ] << 24 ) ^
759
+ ( FSb[ (uint8_t) ( RK[11] >> 16 ) ] << 16 ) ^
760
+ ( FSb[ (uint8_t) ( RK[11] >> 8 ) ] << 8 ) ^
761
+ ( FSb[ (uint8_t) ( RK[11] ) ] );
762
+
763
+ RK[13] = RK[5] ^ RK[12];
764
+ RK[14] = RK[6] ^ RK[13];
765
+ RK[15] = RK[7] ^ RK[14];
766
+ }
767
+ break;
768
+ }
769
+
770
+ /* set up decryption round keys */
771
+
772
+ if( KT_init )
773
+ {
774
+ for( i = 0; i < 256; ++i )
775
+ {
776
+ KT0[i] = RT0[ FSb[i] ];
777
+ KT1[i] = RT1[ FSb[i] ];
778
+ KT2[i] = RT2[ FSb[i] ];
779
+ KT3[i] = RT3[ FSb[i] ];
780
+ }
781
+
782
+ KT_init = 0;
783
+ }
784
+
785
+ SK = fast_aes->drk;
786
+
787
+ *SK++ = *RK++;
788
+ *SK++ = *RK++;
789
+ *SK++ = *RK++;
790
+ *SK++ = *RK++;
791
+
792
+ for( i = 1; i < fast_aes->nr; ++i )
793
+ {
794
+ RK -= 8; /* step back to the start of the previous encryption round key */
795
+
796
+ *SK++ = KT0[ (uint8_t) ( *RK >> 24 ) ] ^
797
+ KT1[ (uint8_t) ( *RK >> 16 ) ] ^
798
+ KT2[ (uint8_t) ( *RK >> 8 ) ] ^
799
+ KT3[ (uint8_t) ( *RK ) ]; RK++;
800
+
801
+ *SK++ = KT0[ (uint8_t) ( *RK >> 24 ) ] ^
802
+ KT1[ (uint8_t) ( *RK >> 16 ) ] ^
803
+ KT2[ (uint8_t) ( *RK >> 8 ) ] ^
804
+ KT3[ (uint8_t) ( *RK ) ]; RK++;
805
+
806
+ *SK++ = KT0[ (uint8_t) ( *RK >> 24 ) ] ^
807
+ KT1[ (uint8_t) ( *RK >> 16 ) ] ^
808
+ KT2[ (uint8_t) ( *RK >> 8 ) ] ^
809
+ KT3[ (uint8_t) ( *RK ) ]; RK++;
810
+
811
+ *SK++ = KT0[ (uint8_t) ( *RK >> 24 ) ] ^
812
+ KT1[ (uint8_t) ( *RK >> 16 ) ] ^
813
+ KT2[ (uint8_t) ( *RK >> 8 ) ] ^
814
+ KT3[ (uint8_t) ( *RK ) ]; RK++;
815
+ }
816
+
817
+ RK -= 8;
818
+
819
+ *SK++ = *RK++;
820
+ *SK++ = *RK++;
821
+ *SK++ = *RK++;
822
+ *SK++ = *RK++;
823
+
824
+ /* save values for fast re-initialization */
825
+ memcpy(fast_aes->initial_erk, fast_aes->erk, sizeof(fast_aes->initial_erk));
826
+ memcpy(fast_aes->initial_drk, fast_aes->drk, sizeof(fast_aes->initial_drk));
827
+ return 0;
828
+ }
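The key schedule above maps the key size to a round count (10, 12 or 14) and then expands the key in fixed-size steps. A cipher with `nr` rounds needs `4 * (nr + 1)` round-key words, but the 192-bit and 256-bit loops write slightly past that, so the `erk` buffer has to be sized for what the loops write. The sketch below works out both numbers from the loop bounds in the source. The function names are hypothetical, and the actual size of `erk` in `fast_aes_t` is not shown in this section, so it is not verified here.

```c
#include <assert.h>

/* Round count for a key size, as in the first switch; 0 if unsupported. */
static int rounds_for_key_bits(int key_bits)
{
    switch (key_bits) {
    case 128: return 10;
    case 192: return 12;
    case 256: return 14;
    default:  return 0;
    }
}

/* Words the cipher actually consumes: one 4-word key per round, plus one. */
static int words_needed(int nr)
{
    return 4 * (nr + 1);
}

/* Words the expansion loops write, counting the initial key words. */
static int words_written(int key_bits)
{
    int nk = key_bits >> 5;   /* key length in 32-bit words: 4, 6 or 8 */
    int iterations;

    switch (key_bits) {
    case 128: iterations = 10; break;
    case 192: iterations = 8;  break;
    case 256: iterations = 7;  break;
    default:  return 0;
    }
    return nk + iterations * nk;
}

/* The written range must always cover the needed range. */
void check_key_schedule_sizes(void)
{
    assert(words_needed(rounds_for_key_bits(128)) == 44 && words_written(128) == 44);
    assert(words_needed(rounds_for_key_bits(192)) == 52 && words_written(192) == 54);
    assert(words_needed(rounds_for_key_bits(256)) == 60 && words_written(256) == 64);
}
```

Under these loop bounds, a buffer of at least 64 words is required to hold the largest (256-bit) expansion.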
829
+
830
+ int
831
+ fast_aes_reinitialize_state(fast_aes_t* fast_aes)
832
+ {
833
+ /* Restore the encryption and decryption round keys to their initial
834
+ * state so that new items can be encrypted and decrypted properly.
835
+ */
836
+ memcpy(fast_aes->erk, fast_aes->initial_erk, sizeof(fast_aes->initial_erk));
837
+ memcpy(fast_aes->drk, fast_aes->initial_drk, sizeof(fast_aes->initial_drk));
838
+
839
+ return 0;
840
+ }
841
+
842
+ /* AES 128-bit block encryption routine */
843
+
844
+ void
845
+ fast_aes_encrypt_block(fast_aes_t* fast_aes, uint8_t input[16], uint8_t output[16])
846
+ {
847
+ uint32_t *RK, X0, X1, X2, X3, Y0, Y1, Y2, Y3;
848
+
849
+ RK = fast_aes->erk;
850
+
851
+ GET_UINT32( X0, input, 0 ); X0 ^= RK[0];
852
+ GET_UINT32( X1, input, 4 ); X1 ^= RK[1];
853
+ GET_UINT32( X2, input, 8 ); X2 ^= RK[2];
854
+ GET_UINT32( X3, input, 12 ); X3 ^= RK[3];
855
+
856
+ #define AES_FROUND(X0,X1,X2,X3,Y0,Y1,Y2,Y3) \
857
+ { \
858
+ RK += 4; \
859
+ \
860
+ X0 = RK[0] ^ FT0[ (uint8_t) ( Y0 >> 24 ) ] ^ \
861
+ FT1[ (uint8_t) ( Y1 >> 16 ) ] ^ \
862
+ FT2[ (uint8_t) ( Y2 >> 8 ) ] ^ \
863
+ FT3[ (uint8_t) ( Y3 ) ]; \
864
+ \
865
+ X1 = RK[1] ^ FT0[ (uint8_t) ( Y1 >> 24 ) ] ^ \
866
+ FT1[ (uint8_t) ( Y2 >> 16 ) ] ^ \
867
+ FT2[ (uint8_t) ( Y3 >> 8 ) ] ^ \
868
+ FT3[ (uint8_t) ( Y0 ) ]; \
869
+ \
870
+ X2 = RK[2] ^ FT0[ (uint8_t) ( Y2 >> 24 ) ] ^ \
871
+ FT1[ (uint8_t) ( Y3 >> 16 ) ] ^ \
872
+ FT2[ (uint8_t) ( Y0 >> 8 ) ] ^ \
873
+ FT3[ (uint8_t) ( Y1 ) ]; \
874
+ \
875
+ X3 = RK[3] ^ FT0[ (uint8_t) ( Y3 >> 24 ) ] ^ \
876
+ FT1[ (uint8_t) ( Y0 >> 16 ) ] ^ \
877
+ FT2[ (uint8_t) ( Y1 >> 8 ) ] ^ \
878
+ FT3[ (uint8_t) ( Y2 ) ]; \
879
+ }
880
+
881
+ AES_FROUND( Y0, Y1, Y2, Y3, X0, X1, X2, X3 ); /* round 1 */
882
+ AES_FROUND( X0, X1, X2, X3, Y0, Y1, Y2, Y3 ); /* round 2 */
883
+ AES_FROUND( Y0, Y1, Y2, Y3, X0, X1, X2, X3 ); /* round 3 */
884
+ AES_FROUND( X0, X1, X2, X3, Y0, Y1, Y2, Y3 ); /* round 4 */
885
+ AES_FROUND( Y0, Y1, Y2, Y3, X0, X1, X2, X3 ); /* round 5 */
886
+ AES_FROUND( X0, X1, X2, X3, Y0, Y1, Y2, Y3 ); /* round 6 */
887
+ AES_FROUND( Y0, Y1, Y2, Y3, X0, X1, X2, X3 ); /* round 7 */
888
+ AES_FROUND( X0, X1, X2, X3, Y0, Y1, Y2, Y3 ); /* round 8 */
889
+ AES_FROUND( Y0, Y1, Y2, Y3, X0, X1, X2, X3 ); /* round 9 */
890
+
891
+ if( fast_aes->nr > 10 )
892
+ {
893
+ AES_FROUND( X0, X1, X2, X3, Y0, Y1, Y2, Y3 ); /* round 10 */
894
+ AES_FROUND( Y0, Y1, Y2, Y3, X0, X1, X2, X3 ); /* round 11 */
895
+ }
896
+
897
+ if( fast_aes->nr > 12 )
898
+ {
899
+ AES_FROUND( X0, X1, X2, X3, Y0, Y1, Y2, Y3 ); /* round 12 */
900
+ AES_FROUND( Y0, Y1, Y2, Y3, X0, X1, X2, X3 ); /* round 13 */
901
+ }
902
+
903
+ /* last round */
904
+
905
+ RK += 4;
906
+
907
+ X0 = RK[0] ^ ( FSb[ (uint8_t) ( Y0 >> 24 ) ] << 24 ) ^
908
+ ( FSb[ (uint8_t) ( Y1 >> 16 ) ] << 16 ) ^
909
+ ( FSb[ (uint8_t) ( Y2 >> 8 ) ] << 8 ) ^
910
+ ( FSb[ (uint8_t) ( Y3 ) ] );
911
+
912
+ X1 = RK[1] ^ ( FSb[ (uint8_t) ( Y1 >> 24 ) ] << 24 ) ^
913
+ ( FSb[ (uint8_t) ( Y2 >> 16 ) ] << 16 ) ^
914
+ ( FSb[ (uint8_t) ( Y3 >> 8 ) ] << 8 ) ^
915
+ ( FSb[ (uint8_t) ( Y0 ) ] );
916
+
917
+ X2 = RK[2] ^ ( FSb[ (uint8_t) ( Y2 >> 24 ) ] << 24 ) ^
918
+ ( FSb[ (uint8_t) ( Y3 >> 16 ) ] << 16 ) ^
919
+ ( FSb[ (uint8_t) ( Y0 >> 8 ) ] << 8 ) ^
920
+ ( FSb[ (uint8_t) ( Y1 ) ] );
921
+
922
+ X3 = RK[3] ^ ( FSb[ (uint8_t) ( Y3 >> 24 ) ] << 24 ) ^
923
+ ( FSb[ (uint8_t) ( Y0 >> 16 ) ] << 16 ) ^
924
+ ( FSb[ (uint8_t) ( Y1 >> 8 ) ] << 8 ) ^
925
+ ( FSb[ (uint8_t) ( Y2 ) ] );
926
+
927
+ PUT_UINT32( X0, output, 0 );
928
+ PUT_UINT32( X1, output, 4 );
929
+ PUT_UINT32( X2, output, 8 );
930
+ PUT_UINT32( X3, output, 12 );
931
+ }
932
+
933
+ /* AES 128-bit block decryption routine */
934
+
935
+ void
936
+ fast_aes_decrypt_block(fast_aes_t* fast_aes, uint8_t input[16], uint8_t output[16])
937
+ {
938
+ uint32_t *RK, X0, X1, X2, X3, Y0, Y1, Y2, Y3;
939
+
940
+ RK = fast_aes->drk;
941
+
942
+ GET_UINT32( X0, input, 0 ); X0 ^= RK[0];
943
+ GET_UINT32( X1, input, 4 ); X1 ^= RK[1];
944
+ GET_UINT32( X2, input, 8 ); X2 ^= RK[2];
945
+ GET_UINT32( X3, input, 12 ); X3 ^= RK[3];
946
+
947
+ #define AES_RROUND(X0,X1,X2,X3,Y0,Y1,Y2,Y3) \
948
+ { \
949
+ RK += 4; \
950
+ \
951
+ X0 = RK[0] ^ RT0[ (uint8_t) ( Y0 >> 24 ) ] ^ \
952
+ RT1[ (uint8_t) ( Y3 >> 16 ) ] ^ \
953
+ RT2[ (uint8_t) ( Y2 >> 8 ) ] ^ \
954
+ RT3[ (uint8_t) ( Y1 ) ]; \
955
+ \
956
+ X1 = RK[1] ^ RT0[ (uint8_t) ( Y1 >> 24 ) ] ^ \
957
+ RT1[ (uint8_t) ( Y0 >> 16 ) ] ^ \
958
+ RT2[ (uint8_t) ( Y3 >> 8 ) ] ^ \
959
+ RT3[ (uint8_t) ( Y2 ) ]; \
960
+ \
961
+ X2 = RK[2] ^ RT0[ (uint8_t) ( Y2 >> 24 ) ] ^ \
962
+ RT1[ (uint8_t) ( Y1 >> 16 ) ] ^ \
963
+ RT2[ (uint8_t) ( Y0 >> 8 ) ] ^ \
964
+ RT3[ (uint8_t) ( Y3 ) ]; \
965
+ \
966
+ X3 = RK[3] ^ RT0[ (uint8_t) ( Y3 >> 24 ) ] ^ \
967
+ RT1[ (uint8_t) ( Y2 >> 16 ) ] ^ \
968
+ RT2[ (uint8_t) ( Y1 >> 8 ) ] ^ \
969
+ RT3[ (uint8_t) ( Y0 ) ]; \
970
+ }
971
+
972
+ AES_RROUND( Y0, Y1, Y2, Y3, X0, X1, X2, X3 ); /* round 1 */
973
+ AES_RROUND( X0, X1, X2, X3, Y0, Y1, Y2, Y3 ); /* round 2 */
974
+ AES_RROUND( Y0, Y1, Y2, Y3, X0, X1, X2, X3 ); /* round 3 */
975
+ AES_RROUND( X0, X1, X2, X3, Y0, Y1, Y2, Y3 ); /* round 4 */
976
+ AES_RROUND( Y0, Y1, Y2, Y3, X0, X1, X2, X3 ); /* round 5 */
977
+ AES_RROUND( X0, X1, X2, X3, Y0, Y1, Y2, Y3 ); /* round 6 */
978
+ AES_RROUND( Y0, Y1, Y2, Y3, X0, X1, X2, X3 ); /* round 7 */
979
+ AES_RROUND( X0, X1, X2, X3, Y0, Y1, Y2, Y3 ); /* round 8 */
980
+ AES_RROUND( Y0, Y1, Y2, Y3, X0, X1, X2, X3 ); /* round 9 */
981
+
982
+ if( fast_aes->nr > 10 )
983
+ {
984
+ AES_RROUND( X0, X1, X2, X3, Y0, Y1, Y2, Y3 ); /* round 10 */
985
+ AES_RROUND( Y0, Y1, Y2, Y3, X0, X1, X2, X3 ); /* round 11 */
986
+ }
987
+
988
+ if( fast_aes->nr > 12 )
989
+ {
990
+ AES_RROUND( X0, X1, X2, X3, Y0, Y1, Y2, Y3 ); /* round 12 */
991
+ AES_RROUND( Y0, Y1, Y2, Y3, X0, X1, X2, X3 ); /* round 13 */
992
+ }
993
+
994
+ /* last round */
995
+
996
+ RK += 4;
997
+
998
+ X0 = RK[0] ^ ( RSb[ (uint8_t) ( Y0 >> 24 ) ] << 24 ) ^
999
+ ( RSb[ (uint8_t) ( Y3 >> 16 ) ] << 16 ) ^
1000
+ ( RSb[ (uint8_t) ( Y2 >> 8 ) ] << 8 ) ^
1001
+ ( RSb[ (uint8_t) ( Y1 ) ] );
1002
+
1003
+ X1 = RK[1] ^ ( RSb[ (uint8_t) ( Y1 >> 24 ) ] << 24 ) ^
1004
+ ( RSb[ (uint8_t) ( Y0 >> 16 ) ] << 16 ) ^
1005
+ ( RSb[ (uint8_t) ( Y3 >> 8 ) ] << 8 ) ^
1006
+ ( RSb[ (uint8_t) ( Y2 ) ] );
1007
+
1008
+ X2 = RK[2] ^ ( RSb[ (uint8_t) ( Y2 >> 24 ) ] << 24 ) ^
1009
+ ( RSb[ (uint8_t) ( Y1 >> 16 ) ] << 16 ) ^
1010
+ ( RSb[ (uint8_t) ( Y0 >> 8 ) ] << 8 ) ^
1011
+ ( RSb[ (uint8_t) ( Y3 ) ] );
1012
+
1013
+ X3 = RK[3] ^ ( RSb[ (uint8_t) ( Y3 >> 24 ) ] << 24 ) ^
1014
+ ( RSb[ (uint8_t) ( Y2 >> 16 ) ] << 16 ) ^
1015
+ ( RSb[ (uint8_t) ( Y1 >> 8 ) ] << 8 ) ^
1016
+ ( RSb[ (uint8_t) ( Y0 ) ] );
1017
+
1018
+ PUT_UINT32( X0, output, 0 );
1019
+ PUT_UINT32( X1, output, 4 );
1020
+ PUT_UINT32( X2, output, 8 );
1021
+ PUT_UINT32( X3, output, 12 );
1022
+ }
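The `AES_FROUND` and `AES_RROUND` macros fold ShiftRows (and its inverse) into the choice of which state word feeds each table lookup. Reading the macros, output column `col` takes its byte at position `row` (0 is the most significant byte) from input column `col + row` when encrypting and `col - row` when decrypting, both modulo 4. A small sketch of that index pattern follows. `enc_src_col`, `dec_src_col` and `check_shift_pattern` are illustrative names, not part of the library.

```c
#include <assert.h>

/* Encryption: the table for byte `row` reads from column (col + row) mod 4. */
static int enc_src_col(int col, int row)
{
    return (col + row) & 3;
}

/* Decryption: the table for byte `row` reads from column (col - row) mod 4. */
static int dec_src_col(int col, int row)
{
    return (col + 4 - row) & 3;
}

/* Spot checks against the macro bodies in the source. */
void check_shift_pattern(void)
{
    /* AES_FROUND, X1: FT0 <- Y1, FT1 <- Y2, FT2 <- Y3, FT3 <- Y0 */
    assert(enc_src_col(1, 0) == 1);
    assert(enc_src_col(1, 1) == 2);
    assert(enc_src_col(1, 2) == 3);
    assert(enc_src_col(1, 3) == 0);

    /* AES_RROUND, X0: RT0 <- Y0, RT1 <- Y3, RT2 <- Y2, RT3 <- Y1 */
    assert(dec_src_col(0, 0) == 0);
    assert(dec_src_col(0, 1) == 3);
    assert(dec_src_col(0, 2) == 2);
    assert(dec_src_col(0, 3) == 1);
}
```

The two patterns are inverses of each other, which is why the decryption routine can mirror the encryption routine with only the tables and word order changed.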
1023
+