wsm2110 / faster.map
Collection of incredibly fast hashmaps
License: MIT License
I was looking into the code and I am not sure you are handling hash collisions.
Inspiration can be found here: https://stackoverflow.com/questions/32027271/generate-two-different-strings-with-the-same-hashcode
It would be good to expose a method
public ref TValue GetOrAddValueRef(TKey key)
(like in DictionarySlim).
That would help when TValue is a big struct, and it would also avoid double lookups in quite common scenarios like:
if (map.Get(key, out var oldValue)) // 1st
map.Update(key, CalculateNewValue(oldValue)); //2nd
else
map.Emplace(key, default); //2nd
vs.
ref var value = ref map.GetOrAddValueRef(key);
value = CalculateNewValue(value);
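For what it's worth, .NET 6 ships this exact single-lookup pattern for the built-in Dictionary as CollectionsMarshal.GetValueRefOrAddDefault; a GetOrAddValueRef on the map could follow the same contract. A small self-contained sketch of the usage pattern:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

class GetOrAddRefDemo
{
    static void Main()
    {
        var counts = new Dictionary<string, int>();
        foreach (var word in new[] { "a", "b", "a" })
        {
            // One lookup: returns a ref to the existing value, or adds
            // default(TValue) and returns a ref to the freshly added slot.
            ref int count = ref CollectionsMarshal.GetValueRefOrAddDefault(counts, word, out _);
            count++;
        }
        Console.WriteLine(counts["a"]); // 2
        Console.WriteLine(counts["b"]); // 1
    }
}
```

The out bool exists parameter (discarded above) tells the caller whether the key was already present, which covers the Get/Update/Emplace branching from the example with a single probe.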
Minimum working example:
var map = new MultiMap<int, int>();
for (int i = 0; i < 1000; i++)
{
// Console.WriteLine(i);
map.Emplace(2, i); // line 86
}
The piece of code above throws the following error on my machine on the 914th iteration:
Error message:
System.IndexOutOfRangeException : Index was outside the bounds of the array.
Stacktrace:
at <snip>.Tests.MultimapTest() in <snip>\Tests.cs:line 86
Similar problems also happen if you replace the 2 with other numbers, new object(), etc.
System information:
.NET 6.0.400
Target Framework: net6.0 (bin), netstandard2.0 (lib)
System: Windows 11 22H2 x64
These changes would be a great addition:
Hey,
I have noticed that we can easily address this issue. Currently we only use two x86 SSE2 intrinsics: Sse2.CompareEqual and Sse2.MoveMask. Those two can easily be replaced with platform-independent equivalents: the static method Vector128.Equals and the extension method Vector128.ExtractMostSignificantBits.
At run time the JIT will lower them to SSE2 instructions, or to their ARM analogues, depending on which architecture the app is running on.
So basically code like this
Faster.Map/src/DenseMapSIMD.cs
Lines 231 to 235 in 7e1fb1e
int result = (int)Vector128.Equals(left, right).ExtractMostSignificantBits();
I've tried replacing the SSE2 calls with Vector128.Equals and Vector128.ExtractMostSignificantBits, and it started working on my brother's MacBook Air with an M1 processor, without x86 emulation.
To make it work, we also need to change this check
Faster.Map/src/DenseMapSIMD.cs
Lines 153 to 156 in 7e1fb1e
if (!Vector128.IsHardwareAccelerated)
{
throw new NotSupportedException("Simd is not supported");
}
Isn't that cool or what?
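For anyone curious, here is a minimal self-contained sketch of the platform-independent pattern on .NET 7+ (not the library's exact code; the helper name MatchMask is mine):

```csharp
using System;
using System.Runtime.Intrinsics;

class PortableSimdMatch
{
    // Compares 16 metadata bytes against a target and returns a bitmask
    // with one bit per equal lane. On x86 the JIT lowers this to
    // pcmpeqb + pmovmskb (i.e. Sse2.CompareEqual / Sse2.MoveMask);
    // on ARM it emits the AdvSimd equivalents.
    static uint MatchMask(sbyte[] metadata, sbyte target)
    {
        var left = Vector128.Create(target);
        var right = Vector128.LoadUnsafe(ref metadata[0]);
        return Vector128.Equals(left, right).ExtractMostSignificantBits();
    }

    static void Main()
    {
        var group = new sbyte[16];
        group[3] = 42;
        group[10] = 42;
        Console.WriteLine(MatchMask(group, 42)); // bits 3 and 10 set: 1032
    }
}
```

The Vector128 APIs also have scalar software fallbacks, so the NotSupportedException guard can be relaxed to a performance warning rather than a hard failure.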
Not sure why this is happening.
Minimum working example:
[TestMethod]
public void AnItemHasBeenLost()
{
List<long> keysList = new List<long>()
{
3053810, 6107620, 9161430, 12215240, 39699530, 36645720, 33591910, 30538100,
27484290, 109937160, 106883350, 103829540, 100775730, 97721920, 82452870, 79399060, 76345250, 73291440, 70237630,
198497650, 195443840, 192390030, 189336220, 186282410, 171013360, 167959550, 164905740, 161851930, 158798120, 128260020,
125206210, 122152400, 119098590, 116044780, 94668110, 91614300, 88560490, 296219570, 293165760, 290111950, 287058140,
284004330, 268735280, 265681470, 262627660, 259573850, 256520040, 225981940, 222928130, 219874320, 216820510, 213766700,
207659080, 204605270, 201551460, 180174790, 177120980, 174067170, 137421450, 134367640, 131313830, 433641020, 430587210,
427533400, 424479590, 421425780, 406156730, 403102920, 400049110, 396995300, 393941490, 363403390, 360349580, 357295770,
354241960, 351188150, 345080530, 342026720, 338972910, 335919100, 332865290, 329811480, 326757670, 323703860, 317596240,
314542430, 311488620, 308434810, 305381000, 302327190, 299273380, 274842900, 271789090, 253466230, 247358610, 244304800,
241250990, 238197180, 235143370, 232089560, 229035750, 210712890, 149636690, 146582880, 143529070, 140475260, 601600570,
598546760, 595492950, 592439140, 589385330, 574116280, 571062470, 568008660, 564954850, 561901040, 531362940, 528309130,
525255320, 522201510, 519147700, 513040080, 509986270, 506932460, 503878650, 500824840, 497771030, 494717220, 491663410,
485555790, 482501980, 479448170, 476394360, 473340550, 470286740, 467232930, 464179120, 461125310, 458071500, 455017690,
451963880, 442802450, 439748640, 436694830, 415318160, 412264350, 409210540, 390887680, 387833870, 384780060, 381726250,
378672440, 375618630, 372564820, 369511010, 366457200, 348134340, 277896710, 250412420, 781775360, 778721550, 775667740,
772613930, 769560120, 754291070, 751237260, 748183450, 745129640, 742075830, 711537730, 708483920, 705430110, 702376300,
699322490, 693214870, 690161060, 687107250, 684053440, 680999630, 677945820, 674892010, 671838200, 665730580, 662676770,
659622960, 656569150, 653515340, 650461530, 647407720, 644353910, 641300100, 638246290, 635192480, 632138670, 622977240,
619923430, 616869620, 613815810, 610762000, 607708190, 604654380, 586331520, 583277710, 580223900, 577170090, 558847230,
555793420, 552739610, 549685800, 546631990, 543578180, 540524370, 537470560, 534416750, 516093890, 488609600, 448910070,
445856260, 418371970, 320650050, 280950520, 183228600, 992488250, 989434440, 986380630, 983326820, 980273010, 965003960,
961950150, 958896340, 955842530
};
long missingKey = 439748640L;
Assert.IsTrue(keysList.Contains(missingKey), "Sanity check failed");
var keysSet = new DenseMapSIMD<long, long>(190);
keysSet.Emplace(0L, 0L);
foreach (var key in keysList)
{
if (!keysSet.Contains(key))
keysSet.Emplace(key, key);
if (key == missingKey)
Assert.IsTrue(keysSet.Contains(missingKey), "Sanity check failed");
}
Assert.IsTrue(keysSet.Contains(missingKey), "Entry has been lost");
}
DenseMapSIMD throws IndexOutOfRangeException inside Contains when a small length is passed into the constructor.
Minimum working example:
var fmap = new DenseMapSIMD<long, long>(1);
fmap.Emplace(0L, 0L);
var r = fmap.Contains(1);
The piece of code above throws IndexOutOfRangeException inside GetArrayEntryRef. Most likely the reason is length instead of _length here:
Faster.Map/src/DenseMapSIMD.cs
Lines 174 to 177 in 27ede7a
Hi,
Do you have any rough idea how the Contains method of these collections compares to those of Dictionary and DictionarySlim in terms of performance? When you get some free time, could you add benchmarks for the cases when the key is present and when it is not present in the collection?
Thank you for the great work!
Maybe we should consider changing this line inside Emplace()
Faster.Map/src/DenseMapSIMD.cs
Line 244 in 7e1fb1e
ref readonly var entry = ref GetArrayValRef(_entries, indexAndOffset);
and this line inside Get()
Faster.Map/src/DenseMapSIMD.cs
Line 351 in 7e1fb1e
ref readonly var entry = ref GetArrayValRef(_entries, index + Unsafe.As<int, uint>(ref offset));
This way we can avoid copying large entries (when we have a large TKey and/or TValue) out of the _entries array. What do you think?
In the Copy method below:
Faster.Map/src/DenseMapSIMD.cs
Lines 608 to 620 in 99a1119
shouldn't this if statement check for denseMap._metadata[i] < 0 rather than _metadata[i] < 0, as it does now?
Faster.Map/src/DenseMapSIMD.cs
Lines 612 to 615 in 99a1119
Hi,
I noticed that jump_distances is read-only and never modified. Is it safe to make it static, so that it is not allocated per instance?
Faster.Map/src/DenseMapSIMD.cs
Lines 131 to 141 in 99a1119
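To illustrate the suggestion, a minimal sketch (the table values here are illustrative, not the library's actual jump distances): a static readonly array is allocated once per type rather than once per instance:

```csharp
using System;

class MapInstance
{
    // static readonly: one shared array for all map instances,
    // instead of one allocation per constructed map.
    // (Values are illustrative; the real table lives in DenseMapSIMD.)
    private static readonly uint[] JumpDistances = { 0, 1, 2, 4, 8, 16 };

    public uint[] Table => JumpDistances;
}

class Program
{
    static void Main()
    {
        var a = new MapInstance();
        var b = new MapInstance();
        // Both instances see the very same array object.
        Console.WriteLine(ReferenceEquals(a.Table, b.Table)); // True
    }
}
```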
Are you sure this code is pipelining friendly? It looks quite the opposite to me. Every line relies on the result from the previous line.
Faster.Map/src/DenseMapSIMD.cs
Lines 812 to 815 in 10ed34a
The generated asm code also doesn't suggest pipelining friendliness.
8BC1 mov eax, ecx
C1E810 shr eax, 16
33C1 xor eax, ecx
69C86BCAEB85 imul ecx, eax, 0xFFFFFFFF85EBCA6B
8BC1 mov eax, ecx ;in order to start the calculations on the second line we need the result from ecx (previous line)
C1E80D shr eax, 13
33C1 xor eax, ecx
69C835AEB2C2 imul ecx, eax, 0xFFFFFFFFC2B2AE35
8BC1 mov eax, ecx ;here again we need the result from the previous line in order to calculate the return value
C1E810 shr eax, 16
33C1 xor eax, ecx
Am I missing something?
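For context, the assembly above is the classic MurmurHash3 32-bit finalizer (the same shift/multiply constants); a C# rendering makes the serial dependency chain obvious, since each xor-shift feeds the next multiply:

```csharp
using System;

class HashFinalizer
{
    // MurmurHash3 fmix32: every step consumes the previous step's result,
    // so the pipeline cannot overlap these operations. The trade-off is
    // a strong avalanche effect: each input bit affects every output bit.
    static uint Fmix32(uint h)
    {
        h ^= h >> 16;
        h *= 0x85EBCA6B;
        h ^= h >> 13;
        h *= 0xC2B2AE35;
        h ^= h >> 16;
        return h;
    }

    static void Main()
    {
        Console.WriteLine(Fmix32(0));                  // 0: all-zero input stays zero
        Console.WriteLine(Fmix32(123) == Fmix32(123)); // True: deterministic
    }
}
```

So the serial chain is inherent to this mixer, not an artifact of the C# code; a "pipelining friendly" variant would need a different hash function, not a rearrangement of these lines.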
According to your documentation, the Emplace method should return false if the key already exists:
Faster.Map/src/DenseMapSIMD.cs
Lines 211 to 218 in 7c07656
Faster.Map/src/DenseMapSIMD.cs
Lines 255 to 258 in 7c07656
As you may know, unlike C++, in C# there is by default a runtime check whenever you access an element at an index that might be outside the bounds of an array. In some cases the JIT compiler is "smart" enough to recognize patterns that guarantee the index can never be out of bounds.
One such pattern is
for (var i=0; i<array.Length; i++)
{
}
In that case the JIT does not emit the native code that does the checks.
In our code, though, there are dynamically calculated indexes that can't hint the JIT to remove the checks. If we are 100% sure that we will never read elements outside of the array bounds, we can remove the checks manually.
Here's what the generated native code looks like when you have checks
private int GetArrayValTestWithChecks(int[] array, int index)
{
return array[index];
}
Test.GetArrayValTestWithChecks(Int32[], Int32)
L0000: sub rsp, 0x28
L0004: cmp r8d, [rdx+8]
L0008: jae short L0016 ;when out of bounds we branch here to the throw line
L000a: mov eax, r8d
L000d: mov eax, [rdx+rax*4+0x10]
L0011: add rsp, 0x28
L0015: ret
L0016: call 0x00007fff38258b30 ;throws the exception
L001b: int3
Here's the case when the checks are eliminated
private int GetArrayValTestWithNoChecks(int[] array, int index)
{
ref var arr0 = ref MemoryMarshal.GetArrayDataReference(array);
return Unsafe.Add(ref arr0, index);
}
Test.GetArrayValTestWithNoChecks(Int32[], Int32)
L0000: cmp [rdx], dl
L0002: movsxd rax, r8d
L0005: mov eax, [rdx+rax*4+0x10]
L0009: ret
Fewer instructions to execute and no branches. The fewer branches you have in the code, the fewer wrong branch predictions and the more consistent the execution time.
So I created these two helper methods
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static T GetArrayVal<T>(T[] array, uint index)
{
#if DEBUG
return array[index];
#else
ref var arr0 = ref MemoryMarshal.GetArrayDataReference(array);
return Unsafe.Add(ref arr0, index);
#endif
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static ref T GetArrayValRef<T>(T[] array, uint index)
{
#if DEBUG
return ref array[index];
#else
ref var arr0 = ref MemoryMarshal.GetArrayDataReference(array);
return ref Unsafe.Add(ref arr0, index);
#endif
}
and amended these Get() method lines
from this
var right = Vector128.LoadUnsafe(ref _metadata[index], jumpDistance);
to this
var right = Vector128.LoadUnsafe(ref GetArrayValRef(_metadata, index), jumpDistance);
from this
var entry = _entries[index + jumpDistance + offset];
to this
var entry = GetArrayVal(_entries, index + jumpDistance + Unsafe.As<int, uint>(ref offset));
and finally, from this
jumpDistance = jump_distances[jumpDistanceIndex];
to this
jumpDistance = GetArrayVal(jump_distances, jumpDistanceIndex);
On my machine the final result for the GetBenchmark is 5-7% better. It's not significant, but it is some gain. Please note that if there's something wrong with the algorithm in the future (if we introduce a bug) and we try to access elements outside of the bounds of the array, it will throw an exception in DEBUG but will crash in RELEASE.
The downside of this optimization is that you won't know what went wrong in RELEASE if there's a bug. It's up to you whether you want to use it or not.
P.S.
A kind reminder that the current implementation of IndexOf can give you a wrong result, because of the allocation of an uninitialized array with GC.AllocateUninitializedArray. We should either remove it or amend it to check the metadata first before returning the index.
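To illustrate the point, here is a hypothetical, simplified IndexOf over parallel metadata/entries arrays (not the library's actual code) that consults the metadata before ever touching a possibly-uninitialized entry:

```csharp
using System;

class IndexOfSketch
{
    const sbyte EmptyBucket = -127; // same sentinel value as DenseMapSIMD

    // Returns the index of key, or -1. Slots whose metadata is negative
    // (empty/tombstone) are skipped, so uninitialized garbage in the
    // entries array is never read.
    static int IndexOf(sbyte[] metadata, long[] keys, long key)
    {
        for (int i = 0; i < metadata.Length; i++)
        {
            if (metadata[i] >= 0 && keys[i] == key)
                return i;
        }
        return -1;
    }

    static void Main()
    {
        var metadata = new sbyte[] { EmptyBucket, 12, EmptyBucket, 7 };
        var keys = GC.AllocateUninitializedArray<long>(4);
        keys[1] = 100; keys[3] = 200; // slots 0 and 2 keep uninitialized garbage
        Console.WriteLine(IndexOf(metadata, keys, 200)); // 3
        Console.WriteLine(IndexOf(metadata, keys, 999)); // -1
    }
}
```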
I spotted something suspicious while looking at the code.
Faster.Map/src/DenseMapSIMD.cs
Lines 175 to 180 in 80e97b7
I've tested your last commit and it returns wrong results for Get()
Please note that the original code was calling LoadUnsafe with two parameters: a source (ref _metadata[index]) and an offset (jumpDistance):
var right = Vector128.LoadUnsafe(ref _metadata[index], jumpDistance);
In the new code we call it with only one parameter, the source (ref GetArrayValRef(_metadata, index)):
var right = Vector128.LoadUnsafe(ref GetArrayValRef(_metadata, index));
which is the same as
var right = Vector128.LoadUnsafe(ref _metadata[index]);
but without the bounds check.
This is because we add the offset to the index variable later on in the code:
//Calculate jumpDistance
index += GetArrayVal(jump_distances, jumpDistanceIndex);
However, it is important to subtract the previous jumpDistance before adding the new one, as it was in my code:
//subtract last jump distance from the index
index -= jumpDistance;
//set the new jump distance
jumpDistance = GetArrayVal(jump_distances, distanceIndex);
//recalculate index, taking into account the new jump distance
index += jumpDistance;
otherwise on the next loop iteration the code would look something like this (pseudo code):
var right = Vector128.LoadUnsafe(ref _metadata[originalIndex + oldJumpDistance + newJumpDistance]);
And this is not what we want. We want
var right = Vector128.LoadUnsafe(ref _metadata[originalIndex + newJumpDistance]);
Also, can you please explain this part:
Faster.Map/src/DenseMapSIMD.cs
Lines 385 to 390 in cbd10bd
if (index > _length)
{
return false;
}
I read it as "if index > length, start all over again", which looks like an endless loop to me.
Generally, I am interested in your implementation, because I tried to implement a concurrent open-addressing map in the past (and failed).
As I understand it, these are not thread-safe variants?
Btw, how does it compare to DictionarySlim?
Subject says it all ;-)
The method gets caught in an endless while (true) loop ... adding a Resize() call, similar to Emplace etc., solves it ...
[TestMethod]
public void AssertGetOrAddMultipleValues()
{
//assign
var map = new DenseMap<int, int>();
for (int i = 0; i < 20; i++)
{
map.GetOrAdd(i); // caught in endless while (true) loop at index 18
}
}
Aside from Sse2 for x86 platforms, please also put Arm SIMD on your roadmap.
For example AdvSimd.CompareEqual(), ...
Hey,
I have a question about this if block in EmplaceInternal():
Faster.Map/src/DenseMapSIMD.cs
Lines 732 to 736 in 082023c
Currently, the only place EmplaceInternal() is called from is Resize(), where we first resize (double) the arrays and then re-add the items from the old arrays (skipping the tombstones). So I wonder whether, with the arrays doubled in size and the tombstones skipped, it is possible at all to reach a point where we would need to resize again. This Resize() <--> EmplaceInternal() "recursion" kind of bothers me.
I have tested the code with the AddAndResizeBenchmark and, with the data we have, we never get into that if block.
This is a bit bold, but it works nicely for your algorithm. It is up to you whether you want to amend it or not.
I have noticed that after the allocation of the new metadata array, you immediately initialize it with the emptyBucket value (-127). So basically the array is initialized twice: once with zeros (using new) and then with -127. Also, the entries array is never used in the code without first checking the metadata array (well, except for the IndexOf() method, which is for testing purposes, but it can also be amended to do metadata lookups first).
Faster.Map/src/DenseMapSIMD.cs
Lines 762 to 765 in 9c60000
So basically we can safely allocate both arrays without initializing them to zero (you don't rely on those zeros in the code anyway) using GC.AllocateUninitializedArray. I have tested it, and allocating an array of 1_000_000 ints without initialization is about 11 times faster.
_metadata = GC.AllocateUninitializedArray<sbyte>((int)_length + 16);
_entries = GC.AllocateUninitializedArray<Entry<TKey, TValue>>((int)_length + 16);
This also brings a slight improvement to the final AddAndResize benchmark. Let me know what you think!
P.S.
If you don't like that the following line starts with new:
new Span<sbyte>(_metadata).Fill(_emptyBucket);
the following is equivalent:
_metadata.AsSpan().Fill(_emptyBucket);
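A tiny self-contained sketch of the proposed allocation pattern; skipping the implicit zero-fill is safe here precisely because every slot is overwritten with the sentinel right away:

```csharp
using System;

class UninitializedAllocDemo
{
    const sbyte EmptyBucket = -127; // DenseMapSIMD's empty-bucket sentinel

    static void Main()
    {
        // Allocate without the CLR's default zero-initialization,
        // then write the only value the algorithm actually relies on.
        sbyte[] metadata = GC.AllocateUninitializedArray<sbyte>(16);
        metadata.AsSpan().Fill(EmptyBucket);

        Console.WriteLine(metadata[0]);  // -127
        Console.WriteLine(metadata[15]); // -127
    }
}
```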
I noticed that you amended the code to assign the loadFactor to the field
Faster.Map/src/DenseMapSIMD.cs
Lines 180 to 183 in 1de97cb
however, this line still uses the parameter and not the field:
Faster.Map/src/DenseMapSIMD.cs
Line 194 in 1de97cb
Thanks for taking the time to fix it!
As the algorithm is currently implemented, we have two special metadata states: tombstone (-126) and emptyBucket (-127). Both of those are negative numbers. If we don't plan to add other special cases, we can safely remove some SIMD calls.
These lines in EmplaceInternal()
Faster.Map/src/DenseMapSIMD.cs
Lines 717 to 720 in 082023c
//check for empty entries
int result = Sse2.MoveMask(right);
and these in Emplace()
Faster.Map/src/DenseMapSIMD.cs
Lines 264 to 268 in 082023c
//check for tombstones - deleted and empty entries
result = Sse2.MoveMask(right);
The reason for this is that all the sbyte hashes in the metadata are positive numbers (most significant bit cleared) and all the tombstones and empty buckets are negative (most significant bit set). So MoveMask will give us the same result as what the current Sse2.CompareGreaterThan(_emplaceBucketVector, right); and Sse2.CompareEqual(_emptyBucketVector, right); return.
Let me know if you disagree
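A small self-contained demonstration of the claim, using the portable Vector128 API instead of Sse2 (the metadata layout here is illustrative): the sign-bit mask of the raw metadata vector already flags exactly the tombstone and empty slots:

```csharp
using System;
using System.Runtime.Intrinsics;

class SignBitMaskDemo
{
    const sbyte Tombstone = -126;
    const sbyte EmptyBucket = -127;

    static void Main()
    {
        var metadata = new sbyte[16]; // zeros act as stored (non-negative) hashes
        metadata[2] = Tombstone;
        metadata[5] = EmptyBucket;
        metadata[9] = 63; // a stored hash: always non-negative

        var v = Vector128.LoadUnsafe(ref metadata[0]);

        // Sign bits of the raw vector (what Sse2.MoveMask(right) returns):
        // set exactly where a slot is a tombstone or an empty bucket,
        // no CompareEqual/CompareGreaterThan needed.
        uint specialSlots = v.ExtractMostSignificantBits();

        Console.WriteLine(specialSlots); // bits 2 and 5 set: 36
    }
}
```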
I have a suggestion for the code below in the Resize() method:
Faster.Map/src/DenseMapSIMD.cs
Lines 758 to 762 in 883c9fe
var oldEntries = (Entry<TKey, TValue>[])_entries.Clone();
var oldMetadata = (sbyte[])_metadata.Clone();
It looks simpler and is slightly faster. I tried it with an array of 1_000_000 ints, and it was about 30% faster.
Both Array.Copy and Array.Clone make a shallow copy.
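A quick self-contained check of the shallow-copy behavior (with a cast, since Array.Clone returns object):

```csharp
using System;

class CloneDemo
{
    static void Main()
    {
        var entries = new long[] { 1, 2, 3 };

        // Array.Clone allocates a new array and copies the elements
        // (a shallow copy); it returns object, so a cast is required.
        var old = (long[])entries.Clone();

        entries[0] = 99; // mutate the live array, as Resize() would

        Console.WriteLine(old[0]);     // 1: the snapshot is unaffected
        Console.WriteLine(entries[0]); // 99
    }
}
```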
Hi, impressed by the performance of the SIMD map specifically.
Regarding the benchmarks, is it possible to add the allocation columns?
I noticed that the code started to run a bit slower after the _maxDistance optimization was added. I have run two tests, one with the old while (true) loop and one with the new while (jumpDistanceIndex <= _maxDistance) loop. The tests differ from each other only in the ratio of existent to non-existent keys.
900_000 : 100_000 (existent to non-existent keys) ratio
Method | Mean | Error | StdDev | Ratio | RatioSD |
---|---|---|---|---|---|
DenseMapSIMD2_Get_WithMaxDistanceCheck | 10.65 ms | 0.030 ms | 0.060 ms | 1.00 | 0.00 |
DenseMapSIMD2_Get_WithoutMaxDistanceCheck | 10.04 ms | 0.167 ms | 0.336 ms | 0.94 | 0.03 |
500_000 : 500_000 (existent to non-existent keys) ratio
Method | Mean | Error | StdDev | Ratio | RatioSD |
---|---|---|---|---|---|
DenseMapSIMD2_Get_WithMaxDistanceCheck | 6.707 ms | 0.0984 ms | 0.1987 ms | 1.00 | 0.00 |
DenseMapSIMD2_Get_WithoutMaxDistanceCheck | 6.268 ms | 0.0449 ms | 0.0906 ms | 0.94 | 0.02 |
As you can see, in both cases the old while (true) code works better. I started to wonder why...
After some examination of the code, I realized that this check is not necessary at all, and I think that is partially my fault. I asked you in this issue whether the performance would get worse as the number of non-existent keys grew, and kind of pushed you in the wrong direction.
It turns out that the code was already taking good care of non-existent keys, and the added code complexity was not necessary.
Here in this code snippet
Faster.Map/src/DenseMapSIMD.cs
Lines 374 to 380 in cbd10bd
Am I correct or am I missing something obvious here?