Hashing Times
I ran a timed test against md5, sha256, and sha512, and here are the results. These tests were run against local storage; when I ran them against a file on an offsite network share, the results varied significantly. Below are the commands I used for each hash and the average of three runs. The file was around 90 MB in size.
time sha256sum /tmp/file.zip
0fa377f4f178d814d73cec67026118972ddfad53730248348768224a8c214f01 /tmp/file.zip
Average: 321.66ms
time sha512sum /tmp/file.zip
1765f536bf65e88ef2fc94e39f7dc7cc9915cac00e4231402fca9ac1c9b2bdeba5094cef8215f303688d0343d2af64f0cfb32a3638f378be4dc2a87c63391564 /tmp/file.zip
Average: 245.33ms
time md5sum /tmp/Klingeledev-2023.02.18-00.00.zip
f7500263adb14cced2f2605767096678 /tmp/Klingeledev-2023.02.18-00.00.zip
Average: 117.66ms
I did another test using a 1.1 GB file of random gibberish; the results follow below.
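If you want to create a similar test file, dd works fine; the block size and count below are just an example that lands at roughly 1.1 GB, not necessarily what I used.
# Generate roughly 1.1 GB of random data for hashing tests (sizes are illustrative)
dd if=/dev/urandom of=/tmp/file.dd bs=1M count=1100 status=progress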
time sha256sum /tmp/file.dd
9024a3169f5df60d94ab77a7df312e152d84be5f169d9aa952c4d2a8cacdcef1 /tmp/file.dd
Average: 6.75 seconds
time sha512sum /tmp/file.dd
79be3bab9b3681a00999291890f0046f82db07f03e4d258ebc5f672296474d7cbf7c76c7f2ac056bf928b06aaf622b535c283718a147cbf3546af7e577768f14 /tmp/file.dd
Average: 4.89 seconds
time md5sum /tmp/file.dd
959c19b06d8cd4331da77ef07843aeea /tmp/file.dd
Average: 2.81 seconds
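To take the manual averaging out of the picture, a small loop can time each tool a few times and print the mean. This is a minimal sketch that assumes GNU date (for nanosecond timestamps) and the same test file as above:
# Time three runs of each hash tool and print the average in milliseconds
file=/tmp/file.dd
for tool in md5sum sha256sum sha512sum; do
    total=0
    for i in 1 2 3; do
        start=$(date +%s%N)                            # nanoseconds before the run
        "$tool" "$file" > /dev/null
        end=$(date +%s%N)                              # nanoseconds after the run
        total=$(( total + (end - start) / 1000000 ))   # accumulate milliseconds
    done
    echo "$tool average: $(( total / 3 )) ms"
done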
As you can see, MD5 is the fastest, but collisions can be generated even with a very weak processor, so it is no longer a good choice for verifying files, especially if you care about security best practices. Link to Wikipedia discussing security flaws in MD5.
I need to point out that these results are from a cloud server, running on one core of an EPYC 75 processor at 2.0 GHz. When I ran the same test on my laptop the results changed: sha256 came out ahead, running about twice as fast, which is what I expected.
Backup Script
Here is an example of some backup code I have written. It zips a folder, moves it offsite, and verifies that the checksums match. I compared the hashing speeds above to see whether the choice makes a marked difference when working with potentially large backup files, and for files that end up being substantial in size, hashing with sha512 may take significantly longer than sha256. Since the goal is to make sure the files are not compromised after they are created, time is a factor. You can hash a file locally very quickly, but once the file is offsite, computing the hash can take substantially longer due to network or bandwidth constraints. To keep the hashing step fast, I settled on sha256: it is currently strong enough and can easily be swapped out for a better hashing tool in the future.
date=$(date '+%Y.%m.%d-%H.%M')
user=notarealuser
pass="notmyrealpasssilly"
backup_root=/Backups
# TODO #
# Test that the remote backup folder is online.
# If not, throw error, but still run a local backup.
# If it is online, set a variable to 1 so the later rsync runs will go ahead.
# Need to implement sha256 hash checking on the files. I'm lazy and haven't done it yet.
# Also need to implement something to clean up backups that are older than X days, to be determined by the user.
# Backup examplesite.com
# Variables and folder creation
backup_folder=$backup_root/examplesite.com/$date
file=Folderofstuff-$date.zip
mkdir -p "$backup_folder"
# Copy the WordPress folder into the backup folder (the zip is created under /tmp below)
rsync -avPhu --no-i-r /var/www/html/examplesite.com/ "$backup_folder"
mariabackup --backup --databases='sitedatabase' --target-dir="$backup_folder/db/" --user="$user" --password="$pass"
cd /tmp || exit 1
zip -r9 "$file" "$backup_folder"
# Copy the zip file to the offsite backup location and show transfer progress to confirm it's working.
rsync --info=progress2 "./$file" /mnt/Backups/examplesite.com/
curl -u notarealuser:"NoTaReAlPaS$" -H "Title: WP Backup" -H "Priority:Low" -H "Tags:info" -d "Site backups complete." https://ntfy.sh/<your topic here>
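The sha256 check from the TODO list isn't written yet, but a minimal sketch could look like the following. It continues from the script above, reusing $file and the /mnt/Backups mount path; nothing here is final.
# Sketch of the planned sha256 verification: hash the local zip and the offsite copy,
# then compare the two digests.
local_hash=$(sha256sum "/tmp/$file" | awk '{print $1}')
remote_hash=$(sha256sum "/mnt/Backups/examplesite.com/$file" | awk '{print $1}')
if [ "$local_hash" = "$remote_hash" ]; then
    echo "Checksums match for $file"
else
    echo "Checksum mismatch for $file" >&2
fi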
The code above is only part of the script; I still need to work through the TODO list and hammer out those details. That said, it's a good reference for a script that versions backups, so if something gets torched and you don't catch it until after a backup has been made, you can go back, snag a previous day's backup, and restore the affected files.
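For the retention item on the TODO list, something along these lines would work; the 14-day cutoff is just a placeholder for whatever the user decides on.
# Sketch of the planned cleanup: remove dated backup folders older than N days.
retention_days=14
find "$backup_root/examplesite.com/" -mindepth 1 -maxdepth 1 -type d -mtime +"$retention_days" -exec rm -rf {} +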