1024programmer Java C#, C# source program of Google CityHash64 and CityHash128 hashing algorithms

C#, C# source program of Google CityHash64 and CityHash128 hashing algorithms

1. A brief history of CityHash

Google released the CityHash series of string hashing algorithms in 2011. There are two algorithms released today: CityHash64 and CityHash128. They calculate 64- and 128-bit hashes respectively from the string. These algorithms are not suitable for encryption, but are suitable for use in hash tables, etc.

Google has been optimizing the algorithm based on the CPUs commonly used in its data centers and found that it is equally effective for most personal computers and laptops. Especially in terms of 64-bit registers, instruction set-level parallelism, and fast non-paired memory accesses.

The development of this algorithm was greatly inspired by previous work on hashing algorithms, especially Austin Appleby’s MurmurHash. But the main advantage of CityHash is that most steps contain at least two separate mathematical operations. Modern CPUs usually get the best performance from this kind of code.

But CityHash also has its disadvantages: the code is more complex than similar popular algorithms. Google wants to optimize for speed rather than simplicity, so it doesn’t take care of the special case of shorter inputs.

In general, CityHash64 and CityHash128 are new algorithms that solve classic problems. In practical applications, Google expects CityHash64 to be at least 30% faster, and potentially up to twice as fast. In addition, the statistical properties of these algorithms are also complete.

2. Introduction to CityHash (family of hash functions for strings)

CityHash provides hash functions for strings. These functions are mixed with inputting bits thoroughly, but are not suitable for cryptography. See “Hash Quality” below for details on how CityHash is tested and more.
We provide a reference implementation in C++ with a friendly MIT license.
CityHash32() returns a 32-bit hash.
CityHash64() and similar functions return a 64-bit hash.
CityHash128() and similar functions return 128-bit hashes and target strings of at least a few hundred bytes. Depending on the compiler it may be faster than CityHash64() over long enough strings. On shorter strings it is slower than necessary, but we expect this to be relatively unimportant.
CityHashCrc128() and a similar variant of CityHash128() on _mm_crc32_u64(), an intrinsic function compiled to the crc32 instruction on some CPUs. However, none of the functions we provide are CRC.
CityHashCrc256() is a variant of CityHashCrc128() on _mm_crc32_u64(). It returns a 256-bit hash.
All members of the CityHash family relied heavily on the previous work of Austin Appleby, Bob Jenkins and others when designing.
For example, CityHash32 has many similarities with Murmur3a.

Long string performance: 64-bit CPU

CityHash64() and its variants work on short strings, but long strings are also interesting.
CityHash aims to be fast, but its hashes are very good. For CPUs with CRC32 instructions, the CRC is fast, but the CRC is not designed to be a hash function and should not be used as one. CityHashCrc128() is not a CRC, but it uses the CRC32 mechanism.
On a single core of a 2.67GHz Intel Xeon X5550, CityHashCrc256 peaks at about 5 to 5.5 bytes/cycle. The other CityHashCrc function is a wrapper around CityHashCrc256 and should have similar performance on long strings. (CityHashCrc256 in v1.0.3 is even faster, but we don’t think it’s thorough enough. CityHash128 peaks at about 4.3 bytes/cycle. The fastest Murmur variant on this hardware, Murmur3F, peaks at about 2.4 bytes/cycle.
We expect the peak speed of CityHash128 to dominate over CityHash64 which is more preferred for short strings or for use in hash tables.
For long strings, a new function SpookyHash by Bob Jenkins is only available on Intel x86-64 CPUs Slightly slower than CityHash128, but noticeably faster on AMD x86-64 CPUs. For hashing long strings on AMD CPUs and/or CPUs without CRC instructions, SpookyHash may work better or better than any CityHash variant Good.

Short string performance: 64-bit CPU

For short strings, such as most hash table keys, CityHash64 is better than CityHash128, depending on the string length A mix. Here are some results on the same hardware we (impractically) had again:

Hash results

CityHash64 v1.0.3 7ns means 1 byte, or 6ns means 8 bytes, or 9ns means 64 bytes Murmur2 (64 bits) 6ns means 1 byte, or 6ns means 8 bytes, or 15ns means 64 bytes 1 byte is 14ns, 8 bytes is 15ns, 64 bytes We don’t have the CityHash64 benchmark results for version 1.1, but we expect the numbers to be similar.

Performance: 32-bit CPU

CityHash32 is the latest variant of CityHash. Its purpose is generally 32-bit hardware, but primarily tested on x86. Our benchmarks suggest that Murmur3 is the closest competitor to CityHash32 on x86.
We are not aware of any faster products that can match the quality. Speed ​​rankings are in our In the test: CityHash32>Murmur3f>Murmur3a (for long strings), to CityHash32>Murmur3a>Murmur3f (for short strings).


3. The source code of CityHash64 and CityHash128.��

Adapted from:

Java version Implementation of cityHash64 and cityHash128 algorithmsicon-default.png?t=M85Bhttps://blog.csdn.net/aA518189/article/details/107867147Referenced :

Clickhouse cityHash64 Java implementationicon-default.png?t=M85Bhttps://blog.csdn.net/lvxueshi/article/details/122505870

using System;namespace Legalsoft.Truffer .Encryption
{///

/// Google CityHash 64, 128-bit algorithm /// https://blog.csdn.net/lvxueshi/article/details/122505870/// https://blog .csdn.net/aA518189/article/details/107867147///

public static class CityHash{private const long k0 = unchecked((long)0xc3a5c85c97cb3127L); private const long k1 = unchecked((long)0xb492b66fbe98f273L); private const long k2 = unchecked((long)0x9ae16a3b2f90404fL);private const long k3 = unchecked((long)0xc949d7c7509e6557L);///

/// ///

///

///

/// private static long toLongLE(byte[] b, int i){return (((long)b[i + 7] << 56) +((long)(b[i + 6] & 255) <<48) +((long)(b[i + 5] & 255) <<40) +((long)(b[i + 4] & 255) <<32) +((long)(b[i + 3] & 255) <<24) +((b[i + 2] & 255) <<16) +((b[i + 1] & 255) <<8) +((b[i + 0] & 255) <<0));}///

/// ///

///

///

/// private static long toIntLE(byte[] b, int i){return (((b[i + 3] & 255L) <<24) +((b[i + 2] & 255L) <<16) +((b[i + 1] & 255L) <<8) +((b[i + 0] & 255L) <<0));}///

/// ///

///

///

/// private static long fetch64(byte[] s, int pos){return toLongLE(s, pos);}///

/// ///

///

///

/// private static long fetch32(byte[] s, int pos){return toIntLE(s, pos);}///

/// ///

///

/// private static int staticCastToInt(byte b){return b & 0xFF;}///

/// ///

///

///

/// private static long rotate(long val, int shift){// Avoid shifting by 64: doing so yields an undefined result.return shift == 0 ? val : ((long)((ulong)val >> shift)) | (val <<(64 - shift) );}///

/// On x86-64, and probably others, it's possible for /// this to compileEquivalent to Rotate(), but requires the second arg to be non-zero. /// to a single instruction if both args are already in registers. ///

///

///

/// private static long rotateByAtLeast1(long val, int shift){return ((long)((ulong)val >> shift)) | (val <<(64 - shift));}///

// / ///

///

/// private static long shiftMix(long val){//return val ^ (val >>> 47); return val ^ ((long)((ulong)val >> 47));}///

/// ///

private const long kMul = unchecked((long)0x9ddfea08eb382d69L);// /

/// ///

///

///

/// private static long hash128to64(long u, long v){long a = (u ^ v) * kMul;a ^= ((long)((ulong)a >> 47));// long b = (u ^ a) * kMul;long b = (v ^ a) * kMul;b ^= ((long)((ulong)b >> 47));b *= kMul;return b;}///

/// ///

///

///

/// private static long hashLen16(long u, long v){return hash128to64(u, v );}private static long hashLen16(long u, long v, long kmul){long a = (u ^ v) * kmul;a ^= (a >> 47);long b = (v ^ a) * kmul; b ^= (b >> 47);b *= kmul;return b;}///

/// ///

///

///

///

/// private static long hashLen0to16(byte[] s, int pos, int len){if (len >= 8){ /*long a = fetch64(s, pos);long b = fetch64(s, pos + len - 8);return hashLen16(a, rotateByAtLeast1(b + len, len)) ^ b;*/long kmul = k2 + len * 2;long a = fetch64(s, pos) + k2;long b = fetch64(s, pos + len - 8);long c = rotate(b, 37) * kmul + a;long d = (rotate( a, 25) + b) * kmul;return hashLen16(c, d, kmul);}if (len >= 4){/*long a = fetch32(s, pos);return hashLen16(len + (a << 3), fetch32(s, pos + len - 4));*/long kmul = k2 + len * 2;long a = fetch32(s, pos + 0);return hashLen16((a < 0){byte a = s[pos];byte b = s[pos + ((int)((uint)len >> 1) )];byte c = s[pos + len - 1];int y = staticCastToInt(a) + (staticCastToInt(b) <<8);int z = len + (staticCastToInt(c) <<2);return shiftMix (y * k2 ^ z * k3) * k2;}return k2;}///

/// This probably works well for 16-byte strings as well, /// but it may be overkill in that case. ///

///

///

///

/// private static long hashLen17to32(byte[] s, int pos, int len){long a = fetch64(s, pos) * k1;long b = fetch64(s, pos + 8);long c = fetch64(s, pos + len - 8 ) * k2;long d = fetch64(s, pos + len - 16) * k0;return hashLen16(rotate(a - b, 43) + rotate(c, 30) + d, a + rotate(b ^ k3, 20 ) - c + len);}public static long reversalByte(long l){/*ByteBuffer buffer = ByteBuffer.allocate(8);sbyte[] array = buffer.putLong(0, l).array();sbyte[] newArr = new sbyte[array.Length];for (int i = array.Length - 1; i >= 0; i--){newArr[array.Length - i - 1] = array[i];}ByteBuffer buffer2 = ByteBuffer.wrap(newArr, 0, 8);//if(littleEndian){// ByteBuffer.order(ByteOrder) method specifies the byte order, that is, big and small endian mode (BIG_ENDIAN/LITTLE_ENDIAN) // ByteBuffer defaults to big endian ( BIG_ENDIAN) mode //buffer.order(ByteOrder.LITTLE_ENDIAN); //}buffer.order(ByteOrder.LITTLE_ENDIAN); return buffer.getLong();*/byte[] buf = BitConverter.GetBytes(l);byte[] newary = new byte[buf.Length];for (int i = buf.Length - 1; i >= 0; i--) newary[i] = buf[i];byte[] buf2 = (byte[]) newary.Clone();return BitConverter.ToInt64(buf2, 0);}/*///

/// ///

///

///

///

/// private static long hashLen33to64(byte[] s, int pos, int len){long z = fetch64(s, pos + 24);long a = fetch64(s, pos) + (len + fetch64(s, pos + len - 16)) * k0;long b = rotate(a + z, 52);long c = rotate(a, 37);a += fetch64(s, pos + 8);c += rotate(a, 7);a += fetch64(s, pos + 16);long vf = a + z;long vs = b + rotate (a, 31) + c;a = fetch64(s, pos + 16) + fetch64(s, len - 32);z = fetch64(s, len - 8);b = rotate(a + z, 52); c = rotate(a, 37);a += fetch64(s, len - 24);c += rotate(a, 7);a += fetch64(s, len - 16);long wf = a + z; long ws = b + rotate(a, 31) + c;long r = shiftMix((vf + ws) * k2 + (wf + vs) * k0);return shiftMix(r * k0 + vs) * k2;}* /private static long hashLen33to64(byte[] s, int pos, int len){long mul = k2 + len * 2;long a = fetch64(s, pos) * k2;long b = fetch64(s, pos + 8) ;long c = fetch64(s, pos + len - 24);long d = fetch64(s, pos + len - 32);long e = fetch64(s, pos + 16) * k2;long f = fetch64(s, pos + 24) * 9;long g = fetch64(s, pos + len - 8);long h = fetch64(s, pos + len - 16) * mul;long u = rotate(a + g, 43) + ( rotate(b, 30) + c) * 9;long v = ((a + g) ^ d) + f + 1;long w = reversalByte((u + v) * mul) + h;long x = rotate( e + f, 42) + c;long y = (reversalByte((v + w) * mul) + g) * mul;long z = e + f + c;a = reversalByte((x + z) * mul + y) + b;b = shiftMix((z + a) * mul + d + h) * mul;return b + x;}///

/// ///

///

///

///

/// public static long cityHash64(byte[] s, int pos, int len){if (len <= 32){if (len <= 16){return hashLen0to16(s, pos, len);}else{return hashLen17to32(s, pos, len);}}else if (len <= 64){return hashLen33to64(s, pos, len);}// For strings over 64 bytes we hash the end first, and then as we// loop we keep 56 bytes of state: v, w, x, y, and z.long x = fetch64(s, pos);long y = fetch64(s, pos + len - 16) ^ k1;long z = fetch64(s, pos + len - 56) ^ k0;long[] v = weakHashLen32WithSeeds (s, pos + len - 64, len, y);long[] w = weakHashLen32WithSeeds(s, pos + len - 32, len * k1, k0);z += shiftMix(v[1]) * k1;x = rotate(z + x, 39) * k1;y = rotate(y, 33) * k1;// Decrease len to the nearest multiple of 64, and operate on 64-byte chunks.len = (len - 1) & ~staticCastToInt((byte)63);do{x = rotate(x + y + v[0] + fetch64(s, pos + 16), 37) * k1;y = rotate(y + v[1] + fetch64 (s, pos + 48), 42) * k1;x ^= w[1];y ^= v[0];z = rotate(z ^ w[0], 33);v = weakHashLen32WithSeeds(s, pos , v[1] * k1, x + w[0]);w = weakHashLen32WithSeeds(s, pos + 32, z + w[1], y);long tmp = x;x = z;z = tmp;pos += 64;len -= 64;} while (len != 0);return hashLen16(hashLen16(v[0], w[0]) + shiftMix(y) * k1 + z, hashLen16(v[1], w[1]) + x);}///

/// ///

///

///

///

///

///

///

/// private static long[ ] weakHashLen32WithSeeds(long w, long x, long y, long z, long a, long b){a += w;b = rotate(b + a + z, 21);long c = a;a += x; a += y;b += rotate(a, 44);return new long[] { a + z, b + c };}///

/// Return a 16-byte hash for s[0 ] ... s[31], a, and b./// Quick and dirty.///

///

///

// /

///

/// private static long[] weakHashLen32WithSeeds(byte[] s, int pos, long a, long b){ return weakHashLen32WithSeeds(fetch64(s, pos + 0),fetch64(s, pos + 8),fetch64(s, pos + 16),fetch64(s, pos + 24),a, b);}private static long[] cityMurmur(byte[] s, int pos, int len, long seed0, long seed1){long a = seed0;long b = seed1;long c = 0;long d = 0;int l = len - 16;if (l = 8 ? fetch64(s, pos + 0 ) : c));}else{c = hashLen16(fetch64(s, pos + len - 8) + k1, a);d = hashLen16(b + len, c + fetch64(s, pos + len - 16)) ;a += d;do{a ^= shiftMix(fetch64(s, pos + 0) * k1) * k1;a *= k1;b ^= a;civate static long hashLen17to32(byte[] s, int pos, int len){long a = fetch64(s, pos) * k1;long b = fetch64(s, pos + 8);long c = fetch64(s, pos + len - 8) * k2;long d = fetch64(s, pos + len - 16) * k0;return hashLen16(rotate(a - b, 43) + rotate(c, 30) + d, a + rotate(b ^ k3, 20) - c + len);}public static long reversalByte(long l){/*ByteBuffer buffer = ByteBuffer.allocate(8);sbyte[] array = buffer.putLong(0, l).array(); sbyte[] newArr = new sbyte[array.Length];for (int i = array.Length - 1; i >= 0; i--){newArr[array.Length - i - 1] = array[i]; }ByteBuffer buffer2 = ByteBuffer.wrap(newArr, 0, 8);//if(littleEndian){// ByteBuffer.order(ByteOrder) method specifies the byte order, that is, big and small endian mode (BIG_ENDIAN/LITTLE_ENDIAN) // ByteBuffer defaults to Big Endian (BIG_ENDIAN) mode //buffer.order(ByteOrder.LITTLE_ENDIAN); //}buffer.order(ByteOrder.LITTLE_ENDIAN); return buffer.getLong();*/byte[] buf = BitConverter.GetBytes(l); byte[] newary = new byte[buf.Length];for (int i = buf.Length - 1; i >= 0; i--) newary[i] = buf[i];byte[] buf2 = (byte [])newary.Clone();return BitConverter.ToInt64(buf2, 0);}/*///

/// ///

///

// /

///

/// private static long hashLen33to64(byte[] s, int pos, int len){long z = fetch64( s, pos + 24);long a = fetch64(s, pos) + (len + fetch64(s, pos + len - 16)) * k0;long b = rotate(a + z, 52);long c = rotate (a, 37);a += fetch64(s, pos + 8);c += rotate(a, 7);a += fetch64(s, pos + 16);long vf = a + z;long vs = b + rotate(a, 31) + c;a = fetch64(s, pos + 16) + fetch64(s, len - 32);z = fetch64(s, len - 8);b = rotate(a + z, 52);c = rotate(a, 37);a += fetch64(s, len - 24);c += rotate(a, 7);a += fetch64(s, len - 16);long wf = a + z;long ws = b + rotate(a, 31) + c;long r = shiftMix((vf + ws) * k2 + (wf + vs) * k0);return shiftMix(r * k0 + vs) * k2 ;}*/private static long hashLen33to64(byte[] s, int pos, int len){long mul = k2 + len * 2;long a = fetch64(s, pos) * k2;long b = fetch64(s, pos + 8);long c = fetch64(s, pos + len - 24);long d = fetch64(s, pos + len - 32);long e = fetch64(s, pos + 16) * k2;long f = fetch64 (s, pos + 24) * 9;long g = fetch64(s, pos + len - 8);long h = fetch64(s, pos + len - 16) * mul;long u = rotate(a + g, 43 ) + (rotate(b, 30) + c) * 9;long v = ((a + g) ^ d) + f + 1;long w = reversalByte((u + v) * mul) + h;long x = rotate(e + f, 42) + c;long y = (reversalByte((v + w) * mul) + g) * mul;long z = e + f + c;a = reversalByte((x + z) * mul + y) + b;b = shiftMix((z + a) * mul + d + h) * mul;return b + x;}///

/// ///

/ //

///

///

/// public static long cityHash64(byte[] s, int pos, int len){if (len <= 32){if (len <= 16){return hashLen0to16(s, pos, len);}else{return hashLen17to32(s, pos, len);}}else if ( len <= 64){return hashLen33to64(s, pos, len);}// For strings over 64 bytes we hash the end first, and then as we// loop we keep 56 bytes of state: v, w, x, y, and z.long x = fetch64(s, pos);long y = fetch64(s, pos + len - 16) ^ k1;long z = fetch64(s, pos + len - 56) ^ k0;long[] v = weakHashLen32WithSeeds(s, pos + len - 64, len, y);long[] w = weakHashLen32WithSeeds(s, pos + len - 32, len * k1, k0);z += shiftMix(v[1]) * k1;x = rotate(z + x, 39) * k1;y = rotate(y, 33) * k1;// Decrease len to the nearest multiple of 64, and operate on 64-byte chunks.len = (len - 1) & ~staticCastToInt((byte)63);do{x = rotate(x + y + v[0] + fetch64(s, pos + 16), 37) * k1;y = rotate(y + v[1 ] + fetch64(s, pos + 48), 42) * k1;x ^= w[1];y ^= v[0];z = rotate(z ^ w[0], 33);v = weakHashLen32WithSeeds( s, pos, v[1] * k1, x + w[0]);w = weakHashLen32WithSeeds(s, pos + 32, z + w[1], y);long tmp = x;x = z;z = tmp;pos += 64;len -= 64;} while (len != 0);return hashLen16(hashLen16(v[0], w[0]) + shiftMix(y) * k1 + z, hashLen16(v[ 1], w[1]) + x);}///

/// ///

///

///

/ //

///

///

///

/// private static long[] weakHashLen32WithSeeds(long w, long x, long y, long z, long a, long b){a += w;b = rotate(b + a + z, 21);long c = a;a + = x;a += y;b += rotate(a, 44);return new long[] { a + z, b + c };}///

/// Return a 16-byte hash for s[0] ... s[31], a, and b./// Quick and dirty.///

///

///

///

///

/// private static long[] weakHashLen32WithSeeds(byte[] s, int pos, long a, long b){return weakHashLen32WithSeeds(fetch64(s, pos + 0),fetch64(s, pos + 8),fetch64(s, pos + 16),fetch64(s, pos + 24),a, b);}private static long[] cityMurmur(byte[] s, int pos, int len, long seed0, long seed1){long a = seed0;long b = seed1;long c = 0;long d = 0;int l = len - 16; if (l = 8 ? fetch64(s, pos + 0) : c));}else{c = hashLen16(fetch64(s, pos + len - 8) + k1, a);d = hashLen16(b + len, c + fetch64(s, pos + len - 16));a += d;do{a ^= shiftMix(fetch64(s, pos + 0) * k1) * k1;a *= k1;b ^= a;c ^= shiftMix(fetch64(s, pos + 8) * k1) * k1;c *= k1;d ^= c;pos += 16;l -= 16;} while (l > 0);}a = hashLen16(a, c);b = hashLen16 (d, b);return new long[] { a ^ b, hashLen16(b, a) };}private static long[] cityHash128WithSeed(byte[] s, int pos, int len, long seed0, long seed1){ if (len = 128);y += rotate(w[0], 37) * k0 + z;x += rotate(v[0] + z, 49 ) * k0;// If 0 <len <128, hash up to 4 chunks of 32 bytes each from the end of s.for (int tail_dOne= 0; tail_done = 16){return cityHash128WithSeed(s, pos + 16, len - 16, fetch64(s, pos) ^ k3, fetch64(s, pos + 8));}else if (len >= 8){return cityHash128WithSeed(new byte[0], 0, 0, fetch64(s, pos) ^ (len * k0), fetch64(s, pos + len - 8) ^ k1);}else{return cityHash128WithSeed(s, pos, len, k0, k1);}}}
}

4. Demo code for calling method

private void button1_Click(object sender, EventArgs e)
{string str = "Marxism-Leninism";byte[] buf = Encoding .Default.GetBytes(str);long hash64 = CityHash.cityHash64(buf, 0, buf.Length);long[] hash128 = CityHash.cityHash128(buf, 0, buf.Length);label16.Text = hash64 + " " + hash128[0] + "." + hash128[1];
}

^= shiftMix(fetch64(s, pos + 8) * k1) * k1;c *= k1;d ^= c;pos += 16;l -= 16;} while (l > 0);}a = hashLen16 (a, c);b = hashLen16(d, b);return new long[] { a ^ b, hashLen16(b, a) };}private static long[] cityHash128WithSeed(byte[] s, int pos, int len, long seed0, long seed1){if (len = 128);y += rotate(w[0], 37) * k0 + z;x += rotate (v[0] + z, 49) * k0;// If 0 <len <128, hash up to 4 chunks of 32 bytes each from the end of s.for (int tail_dOne= 0; tail_done = 16){return cityHash128WithSeed(s, pos + 16, len – 16, fetch64(s, pos) ^ k3, fetch64(s, pos + 8));}else if (len >= 8){return cityHash128WithSeed(new byte[0], 0, 0, fetch64(s, pos) ^ ( len * k0), fetch64(s, pos + len – 8) ^ k1);}else{return cityHash128WithSeed(s, pos, len, k0, k1);}}}
}

4. Demo code for calling method

private void button1_Click(object sender, EventArgs e)
{string str = "Marxist-Leninist ism";byte[] buf = Encoding.Default.GetBytes(str);long hash64 = CityHash.cityHash64(buf, 0, buf.Length);long[] hash128 = CityHash.cityHash128(buf, 0, buf.Length) ;label16.Text = hash64 + " " + hash128[0] + "." + hash128[1];
}

This article is from the internet and does not represent1024programmerPosition, please indicate the source when reprinting:https://www.1024programmer.com/c-c-source-program-of-google-cityhash64-and-cityhash128-hashing-algorithms/

author: admin

Previous article
Next article

Leave a Reply

Your email address will not be published. Required fields are marked *

Contact Us

Contact us

181-3619-1160

Online consultation: QQ交谈

E-mail: 34331943@QQ.com

Working hours: Monday to Friday, 9:00-17:30, holidays off

Follow wechat
Scan wechat and follow us

Scan wechat and follow us

Follow Weibo
Back to top
首页
微信
电话
搜索