Is there any hash-based comparison backup software?

Hash-based comparison backup software uses cryptographic hashes to efficiently identify changed data. This article outlines how it works and the benefits.



Introduction

Data backups are essential for protecting against loss from system crashes, cyber attacks, human error and hardware failures. Several backup technologies are available, each with its own pros and cons. One category is hash-based comparison backup software.

This article provides an overview of hash-based comparison backup, its benefits, examples of popular software, considerations for selection, limitations to be aware of, and recommendations for choosing the right backup software.

What is hash-based comparison backup software?

Hash-based comparison relies on cryptographic hash functions to detect changes in files. A hash value acts like a fingerprint for data - any changes to the file will result in a different hash.

Backup software generates and stores the hash values of the original files. In subsequent backups, the files are hashed again and compared. Only files with differing hashes need to be backed up, since identical hashes indicate the files are unchanged.

This avoids backing up unmodified data again and saves time and storage space. Hash-based backup also doubles as an integrity check: a file whose current hash no longer matches its stored hash has been altered or corrupted.
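The compare-by-hash cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product is implemented: the function names (`file_sha256`, `build_manifest`, `diff_manifests`) and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_manifests(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify files as added, changed, or deleted by comparing digests."""
    return {
        "added": sorted(p for p in new if p not in old),
        "changed": sorted(p for p in new if p in old and new[p] != old[p]),
        "deleted": sorted(p for p in old if p not in new),
    }
```

A backup run would call `build_manifest` on the source folder, compare the result with the manifest saved by the previous run, copy only the "added" and "changed" files, and then save the new manifest for next time.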

Benefits of using hash-based comparison backup software

Some key advantages of using hash-based comparison backup software:

  • Efficient storage usage - After the initial full backup, only modified files get backed up again. Tools that hash at block level go further and store only the modified portions of each file. This significantly reduces storage space requirements.

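  • Faster backup and restore - Hashing minimizes the amount of data that has to be transferred and written during incremental backups, although every file still has to be read in order to be hashed. Restores from deduplicated storage can also be quicker because less data has to be fetched.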

  • Change detection - Hash mismatches reliably indicate any changes in files down to the bit level. Admins can easily identify altered or corrupted files.

  • Auditability - The hash-based approach lets you verify the integrity and recoverability of backed-up data, which can help satisfy auditing and compliance requirements.
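The storage and integrity benefits listed above both follow from storing data under its own hash. The sketch below is a toy, in-memory illustration under stated assumptions: the `BlockStore` class, its method names, and the deliberately tiny 4-byte block size are all invented for the example, and real tools use much larger blocks and persistent storage.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose so the example is easy to follow


class BlockStore:
    """In-memory content-addressed store: each unique block is kept once, keyed by its SHA-256."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def backup(self, data: bytes) -> list[str]:
        """Split data into fixed-size blocks, store unseen ones, return the list of block hashes."""
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            key = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(key, block)  # already-known blocks cost no extra space
            recipe.append(key)
        return recipe

    def restore(self, recipe: list[str]) -> bytes:
        """Rebuild the data, re-hashing every block to catch corruption in the store."""
        parts = []
        for key in recipe:
            block = self.blocks[key]
            if hashlib.sha256(block).hexdigest() != key:
                raise ValueError(f"block {key[:12]} failed its integrity check")
            parts.append(block)
        return b"".join(parts)


store = BlockStore()
v1 = store.backup(b"AAAABBBBCCCC")
v2 = store.backup(b"AAAAXXXXCCCC")  # only the middle block changed
print(len(store.blocks))            # 4 unique blocks are stored, not 6
```

Both versions remain restorable from their hash lists, and the second backup added only one new block to the store.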

Examples of hash-based comparison backup software

Many backup solutions offer hash-based incremental backups:

  • Duplicati uses hashes to identify changed data segments and also encrypts backups.

  • HashBackup is a command-line backup tool for Linux, macOS and BSD built around hash-based deduplication. It is free to use but, unlike Duplicati, not open source.

  • CloudBerry Backup (now MSP360 Backup) can perform incremental backups to cloud storage using hash verification of files.

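  • Rsync is a popular file synchronization utility. By default it selects files to transfer by comparing size and modification time (the --checksum option compares whole-file checksums instead), and its delta-transfer algorithm uses block checksums to send only the changed parts of a file over the network.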
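To show how block checksums make it possible to send only the changed parts of a file, here is a simplified sketch in the style of rsync's delta-transfer idea. It is not rsync's actual code: the modulus, the function names, and the use of SHA-256 as the confirming hash are assumptions for the example, and the scan always slides one byte at a time instead of skipping ahead after a match.

```python
import hashlib

MOD = 1 << 16  # modulus for both running sums (an illustrative choice)


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the two running sums (a, b) for a block from scratch."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b


def roll(a: int, b: int, outgoing: int, incoming: int, block_len: int) -> tuple[int, int]:
    """Slide the window one byte to the right in O(1) instead of re-summing the block."""
    a = (a - outgoing + incoming) % MOD
    b = (b - block_len * outgoing + a) % MOD
    return a, b


def find_matching_blocks(old: bytes, new: bytes, block_len: int) -> list[tuple[int, int]]:
    """Return (offset_in_new, block_index_in_old) for old blocks found anywhere in new."""
    # Index the old data's full blocks by weak checksum; keep a strong hash to confirm hits.
    table = {}
    for start in range(0, len(old) - block_len + 1, block_len):
        block = old[start:start + block_len]
        entry = (hashlib.sha256(block).digest(), start // block_len)
        table.setdefault(weak_checksum(block), []).append(entry)

    matches = []
    if len(new) < block_len:
        return matches
    a, b = weak_checksum(new[:block_len])
    offset = 0
    while True:
        for strong, block_index in table.get((a, b), []):
            # The weak checksum only nominates candidates; the strong hash decides.
            if hashlib.sha256(new[offset:offset + block_len]).digest() == strong:
                matches.append((offset, block_index))
                break
        if offset + block_len >= len(new):
            break
        a, b = roll(a, b, new[offset], new[offset + block_len], block_len)
        offset += 1
    return matches


# Blocks "abcd" and "efgh" are found again even though bytes were inserted around them.
print(find_matching_blocks(b"abcdefgh", b"XXabcdYefgh", 4))  # [(2, 0), (7, 1)]
```

Everything in the new data that is not covered by a match is what would actually have to be sent; the matched regions can be referenced by block index instead.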

Considerations when choosing hash-based backup software

When evaluating hash-based backup tools, consider these aspects:

  • Compatibility - Support for your operating systems, hypervisors, databases and applications.

  • Scalability - Ability to handle large and growing volumes of data seamlessly.

  • Configurability - Granular control over backup scope, scheduling and retention policies.

  • Security - Encryption of backups, access controls, and cybersecurity protections.

  • Usability - Simple interface and workflows, even for non-technical staff, plus automation capabilities.

  • Cost - Upfront and ongoing licensing, storage and infrastructure costs.

Limitations of hash-based comparison backup

While very useful, hash-based backup also has some limitations to consider:

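  • File size impact - Very large files take longer to hash, reducing backup speed. With file-level hashing, even a small change to a large file causes the whole file to be backed up again, so frequently changing large files make incremental backups expensive.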

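  • Files not tracked - Files outside the configured backup scope are never hashed, so they are never backed up incrementally. Files that appear inside the scope after the baseline have no stored hash and are backed up in full the first time they are seen.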

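  • Hash collisions - Two different inputs can in principle produce the same hash, in which case a changed file would be treated as unchanged. With a modern cryptographic hash such as SHA-256, accidental collisions are so unlikely that they can be ignored in practice; older hashes such as MD5 and SHA-1 are vulnerable to deliberately constructed collisions.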

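  • Versioning is not automatic - A tool that keeps only the latest hash and the latest copy of each file cannot recover deleted or overwritten files. Tools such as Duplicati keep multiple backup versions, so check how versions are retained before relying on a tool for point-in-time recovery.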
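To put the hash collision risk mentioned above into numbers, the standard birthday-bound approximation can be evaluated directly. The sketch assumes an ideal hash whose outputs are uniformly distributed, which says nothing about deliberate attacks on weak hashes; the function name `collision_probability` is made up for the example.

```python
import math


def collision_probability(num_items: int, hash_bits: int) -> float:
    """Approximate chance that any two of num_items distinct inputs share a hash.

    Uses the birthday bound p = 1 - exp(-n(n-1) / (2 * 2^bits)) for an ideal hash.
    """
    exponent = -(num_items * (num_items - 1)) / (2.0 * 2.0 ** hash_bits)
    return -math.expm1(exponent)  # 1 - e^x, computed accurately for tiny x


print(collision_probability(10**9, 256))  # a billion blocks under a 256-bit hash
print(collision_probability(10**5, 32))   # 100,000 items under a 32-bit checksum
```

Under that idealized assumption, a billion distinct blocks under a 256-bit hash give a probability far below 1e-50, while 100,000 items under a 32-bit checksum are more likely than not to contain a collision. This is why backup tools rely on long cryptographic hashes rather than short checksums for change detection.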

Conclusion

Hash-based incremental backup enables efficient storage usage, faster backups, reliable change tracking and integrity verification. Software like Duplicati and CloudBerry Backup provides easy-to-use solutions.

However, hash-based comparison backup may not suit rapidly changing unstructured data. Other technologies like snapshot-based backups can be considered for such use cases. Organizations should evaluate their specific backup and recovery objectives, data profiles and infrastructure when choosing the right software.