Which hashing algorithm is best for uniqueness and speed? Example (good) uses include hash dictionaries. I know there are things like SHA-256 and such, but these algorithms are designed to be secure, which usually means they are slower than algorithms that are less unique.
I tested some different algorithms, measuring speed and number of collisions.
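As a rough illustration of this kind of measurement, here is a minimal Python sketch of counting collisions for a candidate hash function over a set of keys. It is not the code used for the tests described here: the helper name `count_collisions`, the 32-bit truncation, and the choice of djb2 as the example hash are all assumptions made for illustration.

```python
from typing import Callable, Iterable


def djb2_32(key: str) -> int:
    """djb2 string hash (h = h * 33 + byte), truncated to 32 bits (assumed width)."""
    h = 5381
    for byte in key.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def count_collisions(keys: Iterable[str], hash_fn: Callable[[str], int]) -> int:
    """Count keys whose hash value was already produced by an earlier, different key."""
    seen_hashes = set()
    seen_keys = set()
    collisions = 0
    for key in keys:
        if key in seen_keys:
            continue  # a repeated key is not a collision
        seen_keys.add(key)
        value = hash_fn(key)
        if value in seen_hashes:
            collisions += 1
        else:
            seen_hashes.add(value)
    return collisions


if __name__ == "__main__":
    # Hypothetical stand-in for the numeric key set: the strings "1" to "216553".
    numbers = [str(n) for n in range(1, 216554)]
    print("djb2 collisions on numbers:", count_collisions(numbers, djb2_32))
```

Timing could be added around the loop with `time.perf_counter()`, although a pure-Python loop mostly measures interpreter overhead rather than the hash function itself.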
I used three different key sets:
A list of 216,553 English words (in lowercase).
The numbers «1» to «216553» (think ZIP codes, and how a poor hash took down msn.com).
216,553 «random» (i.e. type 4 uuid) GUIDs.

Best answer · 2259 votes

If you are wanting to create a hash map from an unchanging dictionary, you might want to consider perfect hashing https://en.wikipedia.org/wiki/Perfect_hash_function – during the construction of the hash function and hash table, you can guarantee, for a given dataset, that there will be no collisions.

50 votes

Here is a list of hash functions, but the short version is:
If you just want to have a good hash function, and cannot wait, djb2 is one of the best string hash functions I know. It has excellent distribution and speed on many different sets of keys and table sizes.

33 votes

CityHash by Google is the algorithm you are looking for. It is not good for cryptography but is good for generating unique hashes.
Read the blog for more details and the code is available here.
CityHash is written in C++. There also is a plain C port.
About 32-bit support:
All the CityHash functions are tuned for 64-bit processors.

26 votes

The SHA algorithms (including SHA-256) are designed to be fast.
In fact, their speed can be a problem sometimes. In particular, a common technique for storing a password-derived token is to run a standard fast hash algorithm 10,000 times (storing the hash of the hash of the hash of the hash of the password).

18 votes

I’ve plotted a short speed comparison of different hashing algorithms when hashing files.
The individual plots only differ slightly in the reading method and can be ignored here, since all files were stored in a tmpfs. Therefore the benchmark was not IO-bound if you are wondering.
Logarithmic scale.

16 votes

“I know there are things like SHA-256 and such, but these algorithms are designed to be secure, which usually means they are slower than algorithms that are less unique.”
The assumption that cryptographic hash functions are more unique is wrong, and in fact it can be shown to be often backwards in practice. In truth:

10 votes

It depends on the data you are hashing. Some hashing works better with specific data like text. Some hashing algorithms were specifically designed to be good for specific data.
Paul Hsieh once wrote a fast hash function. He lists source code and explanations. But it has already been beaten.

9 votes

Use SipHash. It has many desirable properties:
Fast. An optimized implementation takes around 1 cycle per byte.
Secure. SipHash is a strong PRF (pseudorandom function). This means that it is indistinguishable from a random function (unless you know the 128-bit secret key). Hence:

8 votes

Java uses this simple multiply-and-add algorithm:
The hash code for a String object is computed as

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

using int arithmetic, where s[i] is the i-th character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)

5 votes
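The multiply-and-add formula quoted above can be sketched in Python using Horner's rule, which evaluates the same polynomial one character at a time. This is an illustration rather than Java's actual source: the function name `java_string_hash` is hypothetical, and the code emulates Java's wrapping 32-bit `int` arithmetic and UTF-16 code units by hand.

```python
def java_string_hash(s: str) -> int:
    """Emulate the documented Java String hash: s[0]*31^(n-1) + ... + s[n-1]."""
    # Java strings are sequences of UTF-16 code units, so hash those rather
    # than Python code points (they differ only outside the BMP).
    data = s.encode("utf-16-be", "surrogatepass")
    h = 0
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        # Horner's rule step, wrapped to 32 bits like Java int arithmetic.
        h = (31 * h + unit) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed Java int.
    return h - (1 << 32) if h >= (1 << 31) else h


if __name__ == "__main__":
    print(java_string_hash("hello"))  # 99162322
    print(java_string_hash("Aa"), java_string_hash("BB"))  # both 2112
```

The second line of output shows how easily this function collides on short strings: "Aa" and "BB" both hash to 2112, because 65*31 + 97 = 66*31 + 66.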
Hash algorithm benchmarks aren’t testing the way they’ll be used for in-memory hash tables. In benchmarks the hash algorithm code stays in the instruction cache and nothing else is pushed out. In a real setting hash algorithms with many instructions keep your own code’s instructions out of the cache.
Top responses

MurmurHash author here. Unless you’re hashing gigabytes of stuff, your hash function isn’t going to be a bottleneck. I’d generally recommend Murmur3 if …

889 votes

A good blogger could have strung this out into a 5-part series, with tons of ads. Yet this person gave it away for free. Da real MVP.

573 votes

The data is a little old. Newer hash algorithms like xxhash and cityhash are pretty good. This by Peter Kankowski is a pretty decent reference. It is more …

140 votes

CRC32 codding collides with gnu