Thursday, 1 March 2018

SPO600 Project Stage 1

For this project I will be working on md5deep, a software package used in the computer security, system administration and computer forensics communities to run large numbers of files through several cryptographic digests. Basically, this software computes cryptographic hashes of files. Hashing is not encryption: a digest cannot be turned back into the file, but it lets you check whether the file data has been changed. Hashing is a method for reducing an input of any size to a smaller fixed-size output.
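To make the "fixed-size output" idea concrete, here is a minimal sketch using Python's standard hashlib module. It is not md5deep's own code, and the helper name md5_hex is only made up for this illustration.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()

# Inputs of very different sizes give digests of the same length.
small_digest = md5_hex(b"abc")
large_digest = md5_hex(b"abc" * 1_000_000)

print(len(small_digest))  # 32 hex characters (128 bits)
print(len(large_digest))  # 32 hex characters (128 bits)
```

Whatever the input size, the MD5 digest is always 128 bits, which prints as 32 hexadecimal characters.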

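Besides MD5, the md5deep package also computes SHA-1 and SHA-256 digests as well as Tiger and Whirlpool, which are separate hash functions rather than SHA variants. It has many more features, as described below.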

Recursive operation - md5deep is able to recursively examine an entire directory tree. That is, it computes the MD5 for every file in a directory and for every file in every subdirectory.

Comparison mode - md5deep can accept a list of known hashes and compare them to a set of input files. The program can display either those input files that match the list of known hashes or those that do not match. Hash sets can be drawn from EnCase, the National Software Reference Library, iLook Investigator, Hashkeeper, md5sum, BSD md5, and other generic hash generating programs. Users are welcome to add functionality to read other formats too!

Time estimation - md5deep can produce a time estimate when it's processing very large files.

Piecewise hashing - Hash input files in arbitrary sized blocks

File type mode - md5deep can process only files of a certain type, such as regular files, block devices, etc.

Credits: http://md5deep.sourceforge.net/
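To get a feel for three of the features listed above (recursive operation, comparison mode and piecewise hashing), here is a rough sketch in Python with the standard hashlib and os modules. This is only my own illustration of the ideas, not how md5deep implements them, and all function names (md5_file, hash_tree, split_by_known, piecewise_md5) are made up for the example.

```python
import hashlib
import os

def md5_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of one file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def hash_tree(root):
    """Recursive operation: map every regular file under root to its MD5."""
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                results[path] = md5_file(path)
    return results

def split_by_known(hashes, known):
    """Comparison mode: separate paths whose hash is in `known` from the rest."""
    matched = sorted(p for p, h in hashes.items() if h in known)
    unmatched = sorted(p for p, h in hashes.items() if h not in known)
    return matched, unmatched

def piecewise_md5(path, block_size):
    """Piecewise hashing: one independent MD5 per block of block_size bytes."""
    digests = []
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digests.append(hashlib.md5(block).hexdigest())
    return digests
```

For example, split_by_known(hash_tree("some_directory"), known_hashes) returns the files whose digests appear in known_hashes and those that do not, where "some_directory" and known_hashes stand for whatever directory and set of hex digests you want to check.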

So my approach will be to explore md5deep, understand its concepts, explain its methods, and try to optimize its code for better performance. I downloaded md5deep from https://github.com/jessek/hashdeep/releases. Just for information, md5deep can be platform or architecture dependent, so there is a chance you will come across platform- or architecture-specific issues. Luckily I didn't come across any such issue while downloading and installing it on Linux. The repository that holds the hashdeep source was last updated in 2014, which is pretty old, and it offers hashdeep version 4.4, which as far as I could find on Google is the latest version.


After downloading it to your local machine, go to the directory where the downloaded file is located. There are a few commands we need to run in order to install this software.

-> Extracting the tar.gz

tar xvzf nameofthefile.tar.gz

After extraction you will see a folder with the extracted files. Go to that folder and type the following commands:

->sh bootstrap.sh

->./configure

->make

->make install

After the installation is done, make a text file with some text in it in the same directory where you installed md5deep, and test it with the command md5deep nameofthefile.txt or hashdeep nameofthefile.txt. In the same way you can test the SHA variants, and you can benchmark the run time by simply adding time in front of md5deep, for example:

time md5deep nameofthefile.txt //would result into
5444783fea966d71ed28da359a3cae9 /home/location/nameofthefile.txt
real    0m0.030s
user    0m0.000s
sys     0m0.016s

My next step is to repeat this test on aarch64 and x86_64 systems and benchmark it.

I have now set up md5deep on an AArch64 system the same way I did on my local system, but there is one more step to it: first copy the archive to the remote machine.

->scp C:/path/directory/...tar.gz server@domain.com:/path/directory/

The rest of the steps are the same as before.

-> Extracting the tar.gz

tar xvzf nameofthefile.tar.gz

After extraction you will see a folder with the extracted files. Go to that folder and type the following commands:

->sh bootstrap.sh
->./configure
->make
->make install DESTDIR=/home/path/directory

Here we have added a path to make install instead of installing into the default location, because I ran into a PERMISSION DENIED error when I tried that. You can still try installing into the default directory and you might succeed. 'make install' complains because it tries to install into system directories that are only writable by the system administrator. This is actually a good thing, because it prevents you from overwriting system files with your test files.

Our next step is to create 3 files of different sizes. This can be done with the command dd if=/dev/urandom of=file.txt bs=1048576 count=10, which creates a file of size count*bs filled with randomly generated content. Just for information, the content will not be readable. In the above case the file will be 10 MB (10485760 bytes), and in the same way, by changing count and bs, we can create the remaining 2 files. Here is the output of the command for each of the three files:

(10mb file)
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.125207 s, 83.7 MB/s

(105mb file)
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 13.4593 s, 7.8 MB/s

(1gb file)
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 12.4558 s, 84.2 MB/s
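The size arithmetic behind these dd runs can be checked with a small Python sketch. The helper names dd_size and make_random_file are made up for this example, and make_random_file is only a rough stand-in for dd if=/dev/urandom, not a replacement for it.

```python
import os

def dd_size(bs, count):
    """Size in bytes of a file written by dd with the given bs and count."""
    return bs * count

def make_random_file(path, bs, count):
    """Write `count` blocks of `bs` random bytes, similar to dd if=/dev/urandom."""
    with open(path, "wb") as handle:
        for _ in range(count):
            handle.write(os.urandom(bs))
    return os.path.getsize(path)

for count in (10, 100, 1000):
    size = dd_size(1048576, count)
    # dd reports decimal megabytes (10**6 bytes), not mebibytes (2**20 bytes).
    print(f"count={count}: {size} bytes = {size / 2**20:.0f} MiB = {size / 1e6:.1f} MB")
```

This also shows why dd prints "105 MB" for the second file: it is exactly 100 MiB, which is about 104.9 MB in decimal units.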

Checking the number of lines (newline characters) in the file
wc -l file.txt
41031 file.txt
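Since wc -l counts newline bytes, a file of random bytes should contain roughly one newline per 256 bytes. Here is a quick sketch of that estimate, assuming the bytes are uniformly random; the function names are made up for the example.

```python
import os

def count_newlines(data):
    """Count newline bytes, which is what wc -l reports."""
    return data.count(b"\n")

def expected_newlines(size_in_bytes):
    """Expected newline count if each of the 256 byte values is equally likely."""
    return size_in_bytes / 256

sample = os.urandom(1 << 20)  # 1 MiB of random bytes
print(count_newlines(sample), expected_newlines(len(sample)))
print(expected_newlines(10485760))  # estimate for the 10 MB file
```

For the 10485760-byte file the estimate is 40960, which is close to the 41031 reported by wc -l.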

Next, I ran md5deep on each of the three files 100 times with the command time for i in {1..100};do md5deep file.txt;done. The times below are the totals for 100 runs, so we have to divide them by 100 to get the average time of a single run.

BEFORE AVERAGE
(10 mb file)
real 0m4.260s
user 0m3.586s
sys 0m0.786s

(100mb file)
real 0m39.260s
user 0m34.493s
sys 0m5.738s

(1gb file)
real 6m28.003s
user 5m45.722s
sys 0m51.928s

AFTER AVERAGE
(10 mb file)
real 0m0.04260s
user 0m0.03586s
sys 0m0.00786s

(100mb file)
real 0m0.39260s
user 0m0.34493s
sys 0m0.05738s

(1gb file)
real 0m3.88003s
user 0m3.45722s
sys 0m0.51928s
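The averaging step is just each total divided by the number of runs. Here is a small sketch that converts the "real" totals from the time output into seconds and divides by 100; parse_time and average_per_run are names I made up for this example.

```python
import re

def parse_time(text):
    """Convert a bash `time` value such as '6m28.003s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

def average_per_run(total, runs=100):
    """Average time of one run, given the total time for `runs` runs."""
    return parse_time(total) / runs

# "real" totals for 100 runs, taken from the measurements above.
totals = {"10 MB": "0m4.260s", "100 MB": "0m39.260s", "1 GB": "6m28.003s"}
for label, total in totals.items():
    print(f"{label}: {average_per_run(total):.5f} s per run")
```

The per-run time grows roughly tenfold each time the file size grows tenfold, which is what you would expect when hashing time is proportional to the amount of data.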

So my upcoming blog post for the project will include some comparison of md5deep with hashdeep and sha256. But the main purpose of phase 2 will be to try altered build options for md5deep and, if I can, to make some changes in the md5deep code that permit better optimization by the compiler. I will make sure that such optimizations and changes don't negatively affect AArch64 systems. Luckily I found md5deep on GitHub, and it has some files to look into, md5.c and md5.h, so I am still working on understanding its coding pattern and will start working on it soon.
