This directory contains a source file and shell scripts

	tstaes.c
	makegenarm.sh
	makegenx86.sh
	makeoptx86.sh

that can be used to build executables. These executables are used to validate the implementation and to benchmark the performance of the aes functions in the kernel. This directory also serves as a development environment for porting the aes functions to new architectures.

On xnu-1699.20.6 (the base for this work), the generic aes source code sits at bsd/crypto/aes/gen. The x86_64 and i386 architectural optimization is given in bsd/crypto/aes/i386.

With some code corrections (to aes.h and most of the assembly code in i386), you can now build a test executable that is functionally equivalent to aes in the kernel code.

To generate a test executable for the aes in the x86_64/i386 kernel,

	$ makeoptx86.sh

This will build a test executable tstaesoptx86 (x86_64/i386). The executable automatically detects the CPU clock rate. You specify the number of iterations and the number of 16-byte blocks for simulation. The executable generates random test data, calls aes_encrypt_cbc to encrypt the plain data into cipher data, and then calls aes_decrypt_cbc to decrypt the cipher data into decrypted data. Afterwards, it compares the decrypted data against the plain data. Should there be a mismatch, the program stops and exits. Otherwise, it measures the time the system spends in the two functions under test and prints out the performance profiling data.

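The validation step described above (random data, CBC encrypt, CBC decrypt, compare) can be sketched as follows. This is a minimal illustration and not the code in tstaes.c: toy_encrypt_cbc and toy_decrypt_cbc are hypothetical stand-ins that XOR each block with a 16-byte key so that the example builds without the kernel aes sources, and roundtrip_ok is likewise a name made up for this sketch. A real test would call aes_encrypt_cbc and aes_decrypt_cbc in their place.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK 16  /* bytes per block, as in the 16-byte blocks above */

/* Hypothetical stand-in for aes_encrypt_cbc: NOT AES, only CBC chaining
 * around a trivial XOR "cipher" so the sketch is self-contained. */
void toy_encrypt_cbc(const uint8_t *in, const uint8_t *iv, size_t nblocks,
                     uint8_t *out, const uint8_t *key)
{
    const uint8_t *prev = iv;
    for (size_t b = 0; b < nblocks; b++) {
        for (size_t j = 0; j < BLOCK; j++)
            out[j] = (uint8_t)((in[j] ^ prev[j]) ^ key[j]);
        prev = out;          /* chain on the ciphertext block just written */
        in += BLOCK;
        out += BLOCK;
    }
}

/* Hypothetical stand-in for aes_decrypt_cbc (inverse of the above). */
void toy_decrypt_cbc(const uint8_t *in, const uint8_t *iv, size_t nblocks,
                     uint8_t *out, const uint8_t *key)
{
    const uint8_t *prev = iv;
    for (size_t b = 0; b < nblocks; b++) {
        for (size_t j = 0; j < BLOCK; j++)
            out[j] = (uint8_t)((in[j] ^ key[j]) ^ prev[j]);
        prev = in;           /* chain on the previous ciphertext block */
        in += BLOCK;
        out += BLOCK;
    }
}

/* Encrypt random data, decrypt it again, and report whether the result
 * matches the original. Returns 1 on match, 0 on mismatch or on
 * allocation failure. */
int roundtrip_ok(size_t nblocks, unsigned seed)
{
    size_t len = nblocks * BLOCK;
    uint8_t key[BLOCK], iv[BLOCK];
    uint8_t *plain, *cipher, *decrypted;
    int ok = 0;

    assert(nblocks > 0);
    plain = malloc(len);
    cipher = malloc(len);
    decrypted = malloc(len);

    if (plain && cipher && decrypted) {
        srand(seed);
        for (size_t j = 0; j < BLOCK; j++) {
            key[j] = (uint8_t)(rand() & 0xff);
            iv[j] = (uint8_t)(rand() & 0xff);
        }
        for (size_t i = 0; i < len; i++)
            plain[i] = (uint8_t)(rand() & 0xff);

        toy_encrypt_cbc(plain, iv, nblocks, cipher, key);
        toy_decrypt_cbc(cipher, iv, nblocks, decrypted, key);
        ok = (memcmp(plain, decrypted, len) == 0);
    }

    free(plain);
    free(cipher);
    free(decrypted);
    return ok;
}
```

Calling roundtrip_ok(2560, 1) exercises the same 40960-byte buffer size used in the sample runs. The timing of the two calls is left out here to keep the sketch short.
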
On K5,

	$ tstaesoptx86 1000 2560
	device max CPU clock rate = 2659.00 MHz
	40960 bytes per cbc call
	 aes_encrypt_cbc : time elapsed =   220.24 usecs,  177.37 MBytes/sec,    14.30 cycles/byte
	  best iteration : time elapsed =   218.30 usecs,  178.94 MBytes/sec,    14.17 cycles/byte
	 worst iteration : time elapsed =   286.14 usecs,  136.51 MBytes/sec,    18.58 cycles/byte

	 aes_decrypt_cbc : time elapsed =   199.85 usecs,  195.46 MBytes/sec,    12.97 cycles/byte
	  best iteration : time elapsed =   198.17 usecs,  197.12 MBytes/sec,    12.86 cycles/byte
	 worst iteration : time elapsed =   228.12 usecs,  171.23 MBytes/sec,    14.81 cycles/byte

On K5B (with aesni)

	$ tstaesoptx86 1000 256
	device max CPU clock rate = 2400.00 MHz
	4096 bytes per cbc call
	 aes_encrypt_cbc : time elapsed =     6.69 usecs,  583.67 MBytes/sec,     3.92 cycles/byte
	  best iteration : time elapsed =     6.38 usecs,  612.46 MBytes/sec,     3.74 cycles/byte
	 worst iteration : time elapsed =     9.72 usecs,  401.96 MBytes/sec,     5.69 cycles/byte

	 aes_decrypt_cbc : time elapsed =     2.05 usecs, 1902.65 MBytes/sec,     1.20 cycles/byte
	  best iteration : time elapsed =     1.96 usecs, 1997.06 MBytes/sec,     1.15 cycles/byte
	 worst iteration : time elapsed =     4.60 usecs,  849.00 MBytes/sec,     2.70 cycles/byte

You can also build a test executable using the generic source code for the i386/x86_64 architecture.

	$ makegenx86.sh

When run on K5,

	$ tstaesgenx86 1000 2560
	device max CPU clock rate = 2659.00 MHz
	40960 bytes per cbc call
	 aes_encrypt_cbc : time elapsed =   278.05 usecs,  140.49 MBytes/sec,    18.05 cycles/byte
	  best iteration : time elapsed =   274.63 usecs,  142.24 MBytes/sec,    17.83 cycles/byte
	 worst iteration : time elapsed =   309.70 usecs,  126.13 MBytes/sec,    20.10 cycles/byte

	 aes_decrypt_cbc : time elapsed =   265.43 usecs,  147.17 MBytes/sec,    17.23 cycles/byte
	  best iteration : time elapsed =   262.20 usecs,  148.98 MBytes/sec,    17.02 cycles/byte
	 worst iteration : time elapsed =   296.19 usecs,  131.88 MBytes/sec,    19.23 cycles/byte

Comparing the best iterations, the current AES implementation in the x86_64 kernel has been improved from 17.83/17.02 down to 14.17/12.86 cycles/byte for aes_encrypt_cbc and aes_decrypt_cbc, respectively.


 --------- iOS ---------

Similarly, you can build a test executable for the aes in the armv7 kernel (which uses the generic source code)

	$ makegenarm.sh

Note that you need the iOS SDK installed. You can then copy this executable to iOS devices for simulation.

On N88,

	iPhone:~ root# ./tstaesgenarm 1000 2560
	device max CPU clock rate = 600.00 MHz
	40960 bytes per cbc call
	 aes_encrypt_cbc : time elapsed =  2890.18 usecs,   13.52 MBytes/sec,    42.34 cycles/byte
	  best iteration : time elapsed =  2692.00 usecs,   14.51 MBytes/sec,    39.43 cycles/byte
	 worst iteration : time elapsed = 18248.33 usecs,    2.14 MBytes/sec,   267.31 cycles/byte

	 aes_decrypt_cbc : time elapsed =  3078.20 usecs,   12.69 MBytes/sec,    45.09 cycles/byte
	  best iteration : time elapsed =  2873.33 usecs,   13.59 MBytes/sec,    42.09 cycles/byte
	 worst iteration : time elapsed =  9664.79 usecs,    4.04 MBytes/sec,   141.57 cycles/byte
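
The MBytes/sec and cycles/byte columns in the listings above appear to follow directly from the elapsed time, the buffer size, and the reported clock rate. The sketch below shows that arithmetic. It is inferred from the sample numbers rather than taken from tstaes.c: the names struct perf, perf_from_timing, and perf_print are made up for this example, and treating 1 MByte as 2^20 bytes is an assumption that happens to reproduce the printed figures.

```c
#include <assert.h>
#include <stdio.h>

/* Derived figures for one timed cbc call. */
struct perf {
    double mbytes_per_sec;   /* assumes 1 MByte = 2^20 bytes */
    double cycles_per_byte;
};

/* usecs: elapsed time in microseconds
 * nbytes: bytes processed per cbc call
 * clock_mhz: CPU clock rate in MHz */
struct perf perf_from_timing(double usecs, double nbytes, double clock_mhz)
{
    struct perf p;

    assert(usecs > 0.0 && nbytes > 0.0 && clock_mhz > 0.0);

    /* bytes per second, scaled to MBytes */
    p.mbytes_per_sec = (nbytes / (usecs * 1e-6)) / (1024.0 * 1024.0);

    /* MHz times microseconds gives elapsed cycles */
    p.cycles_per_byte = (clock_mhz * usecs) / nbytes;
    return p;
}

/* Print one row in a layout similar to the listings above. */
void perf_print(const char *label, double usecs, double nbytes,
                double clock_mhz)
{
    struct perf p = perf_from_timing(usecs, nbytes, clock_mhz);

    printf("%16s : time elapsed = %8.2f usecs, %7.2f MBytes/sec, %8.2f cycles/byte\n",
           label, usecs, p.mbytes_per_sec, p.cycles_per_byte);
}
```

For the K5 encrypt row, perf_from_timing(220.24, 40960, 2659) gives about 177.36 MBytes/sec and 14.30 cycles/byte. The small difference from the printed 177.37 is presumably because the listing rounds the elapsed time to two decimals.
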