Thursday 29 March 2018

SPO600 - Phase 2 - Compiling Optimizations And Benchmarking

This post is going to be pretty long, as I will be going through the following stages:
  • Setting up md5deep on x86_64 and running multiple tests of md5deep
  • Benchmarking md5deep on x86_64 without optimization and with optimization
  • Benchmarking md5deep on AArch64 with optimization, since I already benchmarked md5deep on AArch64 without optimization in my SPO600 Stage 1 blog post
Setting up md5deep on x86_64 is the same as setting it up on AArch64, but I will go through it quickly:
  • Download the md5deep package (tar.gz) from https://github.com/jessek/hashdeep/releases/tag/release-4.4
  • Transfer the tar.gz file to the x86_64 machine using scp or sftp
  • Unpack the package with the tar -xvzf <filename> command
  • Enter the unpacked folder and run the following commands:
  • sh bootstrap.sh
  • ./configure
  • make install
  • Generate multiple files of random data for testing with the dd command below. Run it first with count=1000 (about 1.0 GB), then change the file name and the count to 10 and 100 to get files of about 10 MB and 105 MB, giving three files in total (see the combined sketch after this list).    dd if=/dev/urandom of=hello2.txt bs=1048576 count=1000
  • time for i in {1..100}; do ./md5deep hello.txt; done — this command runs md5deep on the file 100 times and reports the total execution time; dividing that total by 100 gives the average time for a single run.
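Putting those steps together, this is roughly the sequence I ran. The tarball and directory names below (hashdeep-4.4) and the extra test file names are assumptions for illustration; substitute whatever names you actually downloaded and created:

tar -xvzf hashdeep-4.4.tar.gz    # unpack the release tarball
cd hashdeep-4.4
sh bootstrap.sh                  # generate the configure script
./configure
make install

# three test files of random data: roughly 10 MB, 105 MB and 1.0 GB
dd if=/dev/urandom of=hello.txt  bs=1048576 count=10
dd if=/dev/urandom of=hello1.txt bs=1048576 count=100
dd if=/dev/urandom of=hello2.txt bs=1048576 count=1000

# run md5deep 100 times on one file; divide the reported real time by 100
time for i in {1..100}; do ./md5deep hello.txt; done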
(After taking the average)

(10MB)
real    0m0.033s
user    0m0.028s
sys     0m0.007s

(105MB)
real    0m0.309s
user    0m0.284s
sys     0m0.037s

(1.0GB)
real    0m3.727s
user    0m3.556s
sys     0m0.458s

Before taking the average

(10MB)
real    0m3.303s
user    0m2.875s
sys     0m0.742s

(105MB)
real    0m30.971s
user    0m28.934s
sys     0m3.921s

(1.0GB)
real    5m13.540s
user    4m55.446s
sys     0m46.945s
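As a quick sanity check on the averaging: for the 10 MB file the total real time over 100 runs was 0m3.303s, and 3.303 s / 100 ≈ 0.033 s, which is the averaged figure shown above; likewise for the 105 MB file, 30.971 s / 100 ≈ 0.309 s.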

The tests above were run without optimization, at the default -O0. The following command changes the optimization flag in every Makefile:

find -name Makefile | xargs sed -i 's/-O0/-O2/'

Just to make sure your change has been applied, check any one of the Makefiles: you should see -O2 wherever -O0 used to be.
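A quicker way to verify is to grep the Makefiles instead of opening them, and since the flag only takes effect at compile time, you need to rebuild before timing again. The commands below are my assumption of a minimal check-and-rebuild sequence using the standard autotools targets:

grep -R --include=Makefile -- '-O0' . || echo "no -O0 left in any Makefile"
grep -R --include=Makefile -- '-O2' . | head    # confirm -O2 is now present
make clean && make                              # rebuild with the new flag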

-O2 flag
        10 MB file    105 MB file    1 GB file
real    0m0.041s      0m0.387s       0m4.48s
user    0m0.011s      0m0.250s       0m3.73s
sys     0m0.005s      0m0.015s       0m0.84s

-O3 flag
        10 MB file    105 MB file    1 GB file
real    0m0.035s      0m0.158s       0m2.071s
user    0m0.010s      0m0.97s        0m1.756s
sys     0m0.003s      0m0.014s       0m0.77s

Comparing the real times, you can see that the -O3 flag makes the program faster, which is a pretty nice optimization, so I would recommend replacing the -O2 flag with -O3.

Now I will benchmark and apply the compile-time optimizations on AArch64.

-O2 flag
        10.5 MB file    105 MB file    1.5 GB file
real    0m0.032s        0m0.355s       0m4.892s
user    0m0.025s        0m0.333s       0m4.651s
sys     0m0.001s        0m0.028s       0m0.500s

-O3 flag
        10.5 MB file    105 MB file    1.5 GB file
real    0m0.035s        0m0.333s       0m4.668s
user    0m0.025s        0m0.325s       0m4.399s
sys     0m0.001s        0m0.027s       0m0.326s

Finally, I'm done with the optimization of md5deep on AArch64: with -O2 and -O3 it didn't give the expected results. From the benchmarking and optimizations applied on AArch64 and x86_64, we can say that x86_64 gave exactly the expected results, responding well to optimization flags like -O2 and -O3. I would certainly recommend using the -O3 flag on x86_64 when building and running md5deep. Profiling with gprof really helped in finding the functions that make calls to other functions, for example multihash_update(unsigned char const*, unsigned int) and void hash_final_sha1(void *ctx, unsigned char *sum), but it seemed pretty hard to make any changes to them directly. Changing the optimization flags was an alternate option that worked overall, so Phase 3 of my project will be based on upstreaming my optimization and benchmarking work.
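For anyone who wants to reproduce the gprof profile, this is roughly how I would collect it. The configure flags below are my assumption of how to enable profiling for this build, not the exact commands from my original run:

./configure CFLAGS="-O2 -pg" CXXFLAGS="-O2 -pg"   # build with profiling instrumentation
make clean && make
./md5deep hello2.txt              # a normal run now writes gmon.out in the current directory
gprof ./md5deep gmon.out | less   # flat profile plus call graph, showing callers such as multihash_update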
