The Avalanche effect
In the realm of integer hashing, the Avalanche effect is the notion that if the input of a hash function is changed slightly, its output must change significantly. i.e., flipping one bit in the input must randomly flip about half the output bits.
This criterion can be hardened. Strict (full) avalanche says that whenever a single input bit is flipped, each of the output bits must change with a 50% probability.
In simpler terms this means that a good hash function is expected to break the correlation between the input and the output as much as possible. Avalanche is one possible measure of how well this property is achieved.
Implementing avalanche
Let us focus on u32->u32
hash functions.
Implementing avalanche is fairly simple. All you need to do is pass each possible u:u32
value through your hash function, and then flip each bit in u
one by one, seeing what bits in hash(u)
are affected. For each flip in the output you ++
the (c,r)
entry in a 32x32
counters array, where (c,r)
corresponds to a flip in the c-th input bit and a flip in the r-th output bit.
An exact estimation of avalanche requires that you try each of the 2^32
possible values for u
once, which may take longer than reasonable. But it is generally enough to just try a large bunch of (uniformly distributed) values to get a good estimate of what avalanche will converge to.
A good hash function should accumulate to a close-to-perfectly-even counters array. i.e., if the array is normalized by the total amount of values we tested, the array will be flat 50% gray. On the other hand, poor hash functions quickly produce an evident pattern in the resulting array (see down below).
Avalanche matrix for MurMur3’s finalizer
I wrote a Shadertoy for this
Over the years I have implemented plenty of stress-tests for PRNGs, QRNs, hash functions, etc… as part of Maverick Render’s MK_api Unit Testing. This time I went ahead and wrote a Shadertoy instead as an exercise:
The code allows you to try out different popular u32->u32
hash functions. You can also type in your own. In a few seconds you will see what the 32x32
avalanche array and the corresponding numerical estimation converge to.
The code can be edited to display the avalanche matrix, or the bias matrix, which is nothing but bias=abs(avalanche-.5)
. i.e., How much avalanche deviates from 50%. Naturally, the bias is meant to converge to 0 for a high-quality hash function.
Some results with other hash functions:
Avalanche matrix for iquint1
Avalanche matrix for Schechter and Bridson hash
Avalanche matrix for Wang’s hash
Valuable links
There is a methodology to create your own hash functions by combining certain reversible operations. I won’t get into the details here, but I will give a couple of fantastic links instead:
- The pretty amazing Hash Function prospector project, by Chris Wellons (a.k.a., skeeto).
skeeto found some extraordinary 32-bit and 64-bit direct/reverse hash functions himself. In particular the lowbias32
one, which comes implemented in my Shadertoy above.
- A dissertion on the search for perfect hashing, by Bob Jenkins.
Takeaway
Many CG use cases where what matters the most is performance can get away with cheap hash/RNG functions. On the other hand, industrial-strength rendering applications such as Maverick Render just can’t.
But there is a sweet spot between high-performance and low-bias (as proven by, for example, the Murmur3 finalizer or skeeto’s low-bias hash) where you can have a negligible bias with just a handful of arithmetic instructions.