Excerpt
Learn how to write a SHA1 hash function from scratch and understand the process behind it. This blog post explains each step in detail and provides examples for better understanding.
Introduction
SHA1 or SHA-1 is a popular one-way cryptographic hash function used widely in security applications. Though the algorithm is complex, it is educational and insightful to understand the core steps involved in implementing a basic SHA1 hash function from scratch. In this post, we will walk through the key stages of developing a SHA1 hashing program in a simplified manner.
Understanding SHA1
SHA1 takes an arbitrary length input message and produces a 160-bit hash value. Key properties:
- Small changes in input lead to significant changes in output
- Infeasible to generate original input from hash
- Low chance of collisions for different inputs
The SHA1 algorithm operates on binary data and consists of 6 main steps:
- Convert input to binary
- Pad the input
- Break into 512-bit blocks
- Initialize variables
- Perform hash computation
- Output final 160-bit hash
Let’s look at each step:
Step 1: Converting Input to Binary
The input string is first converted to a binary bit string. This can be done by:
- ASCII encoding - converting characters to 8-bit ASCII values
- Unicode encoding - converting characters to 16-bit Unicode code points
For example, the string “Hello” becomes
10100100001100101011011000110110001101111
in ASCII binary form.
Step 2: Padding the Message
The binary message is padded to ensure its length is a multiple of 512 bits. Padding involves:
- Appending a single 1 bit to message
- Adding 0 bits until message is 64 bits shy of next 512 multiple
- Appending a 64-bit message length value at the end
So “Hello” with its ASCII binary is padded to become
101001000011001010110110001101100011011110000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001010000001010100100000011001010
Step 3: Breaking the Message into Blocks
The padded message is then broken into 512-bit blocks. If the padded length is 1048 bits, we get two 512-bit blocks:
1Block 1: 0100100001100101011011000110110001101111000000100000000000000000000000000000000000000000000000000000000
2
3Block 2: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001010000001010100100000011001010
Step 4: Initializing Variables and Constants
SHA1 uses five 32-bit variables initialized as:
1var h0 := 0x67452301
2var h1 := 0xEFCDAB89
3var h2 := 0x98BADCFE
4var h3 := 0x10325476
5var h4 := 0xC3D2E1F0
It also uses constants generated from the fractional parts of the cube root of primes from 2 to 79.
Step 5: Hash Computation
This involves four rounds of processing the message blocks:
- Message Scheduling - Creating 80 expanded message words
- Compression Function - Conducting logical operations usingExpanded message, constants and hash values.
- Updating Hash Value - Adding output of compressionfunction to initial hash values.
- Output - Converting final hash values to hexadecimal to generate 160-bit hash.
The compression and update steps are repeated for each 512-bit block.
Step 6: Generating the Final Hash
After all blocks are processed, the final 160-bit hash value is produced by concatenating the 5 updated 32-bit hash variables h0 to h4.
For “Hello”, the final SHA1 hash is
1f7ff9e8b7bb2e09b70935a5d785e0cc5d9d0abf0
An free online tool to quickly verify your answers
Conclusion
Implementing the SHA1 algorithm from scratch provides valuable insights into its internal workings. By following the essential steps of preprocessing, initialization, hash computation and output generation, we can produce the final SHA1 digest for any input. Learning these cryptographic concepts and methods prepares us for tackling more advanced hash functions down the road.