
faster.map's Issues

GetOrAddValueRef

It would be good to expose a method

public ref TValue GetOrAddValueRef(TKey key)

(like in DictionarySlim).
That would help when TValue is a big struct, and it also avoids double lookups in quite common scenarios like:

if (map.Get(key, out var oldValue)) // 1st
  map.Update(key, CalculateNewValue(oldValue)); //2nd
else
  map.Emplace(key, default); //2nd

vs.

ref var value = ref map.GetOrAddValueRef(key);
value = CalculateNewValue(value);
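
A minimal sketch of the shape such a method could take, assuming hypothetical internal helpers FindIndex (slot index, or -1 on a miss) and EmplaceDefault (inserts default(TValue) and returns the slot); DictionarySlim uses the same ref-returning pattern:

public ref TValue GetOrAddValueRef(TKey key)
{
    var index = FindIndex(key);      // hypothetical lookup helper
    if (index < 0)
    {
        index = EmplaceDefault(key); // hypothetical insert helper
    }

    return ref _entries[index].Value;
}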

Throws `IndexOutOfRangeException` when a large number of items (with the same hash?) is inserted

Minimal working example:

var map = new MultiMap<int, int>();
for (int i = 0; i < 1000; i++)
{
    // Console.WriteLine(i);
    map.Emplace(2, i); 			// line 86
}

The piece of code above throws the following error on my machine on the 914th iteration:

Error message:
   System.IndexOutOfRangeException : Index was outside the bounds of the array.
Stacktrace:
     at <snip>.Tests.MultimapTest() in <snip>\Tests.cs:line 86

Similar problems also happen if you replace the 2 with other numbers or new object(), etc.


System information:

.NET 6.0.400
Target Framework: net6.0 (bin), netstandard2.0 (lib)
System: Windows 11 22H2 x64

More methods

These changes would be a great addition:

  • ContainsKey - Should check whether a key exists
  • GetOrThrow - Should throw if the requested key does not exist
  • Indexers (see the sketch after this list)
    • var value = VarName[key] (Should also throw if the requested key does not exist)
    • VarName[key] = value (Should call Emplace)
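
A hedged sketch of what these members could look like on top of the existing Get and Emplace methods; the wiring below is an assumption for illustration, not the library's actual API:

public bool ContainsKey(TKey key) => Get(key, out _);

public TValue GetOrThrow(TKey key) =>
    Get(key, out var value)
        ? value
        : throw new KeyNotFoundException($"Key not found: {key}"); // System.Collections.Generic

public TValue this[TKey key]
{
    get => GetOrThrow(key);     // throws when the key is absent
    set => Emplace(key, value); // inserts via Emplace, per the proposal
}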

Easy multiplatform support

Hey,
I have noticed that we can easily address this issue. Currently we only use two x86 SSE2 instructions: Sse2.CompareEqual and Sse2.MoveMask. Those two can easily be replaced with platform-independent equivalents: the static method Vector128.Equals and the extension method Vector128.ExtractMostSignificantBits.
At run time the JIT will lower them to SSE2 instructions or their ARM analogues, depending on which architecture the app is running on.

So basically code like this

//compare vectors
var comparison = Sse2.CompareEqual(left, right);
//convert to int bitarray
int result = Sse2.MoveMask(comparison);

can be changed to

int result = (int)Vector128.Equals(left, right).ExtractMostSignificantBits();

I've tried replacing the SSE2 calls with Vector128.Equals and Vector128.ExtractMostSignificantBits, and it started working on my brother's MacBook Air with an M1 processor, without x86 emulation.

To make it work, we also need to change this check

if (!Sse2.IsSupported)
{
    throw new NotSupportedException("Simd SSe2 is not supported");
}

to

if (!Vector128.IsHardwareAccelerated)
{
    throw new NotSupportedException("Simd is not supported");
}

Isn't that cool or what?

Sometimes DenseMapSIMD Resize loses entries

Not sure why this is happening.

Minimal working example:

[TestMethod]
public void AnItemHasBeenLost()
{
	List<long> keysList = new List<long>()
	{
		3053810, 6107620, 9161430, 12215240, 39699530, 36645720, 33591910, 30538100,
		27484290, 109937160, 106883350, 103829540, 100775730, 97721920, 82452870, 79399060, 76345250, 73291440, 70237630,
		198497650, 195443840, 192390030, 189336220, 186282410, 171013360, 167959550, 164905740, 161851930, 158798120, 128260020,
		125206210, 122152400, 119098590, 116044780, 94668110, 91614300, 88560490, 296219570, 293165760, 290111950, 287058140,
		284004330, 268735280, 265681470, 262627660, 259573850, 256520040, 225981940, 222928130, 219874320, 216820510, 213766700,
		207659080, 204605270, 201551460, 180174790, 177120980, 174067170, 137421450, 134367640, 131313830, 433641020, 430587210,
		427533400, 424479590, 421425780, 406156730, 403102920, 400049110, 396995300, 393941490, 363403390, 360349580, 357295770,
		354241960, 351188150, 345080530, 342026720, 338972910, 335919100, 332865290, 329811480, 326757670, 323703860, 317596240,
		314542430, 311488620, 308434810, 305381000, 302327190, 299273380, 274842900, 271789090, 253466230, 247358610, 244304800,
		241250990, 238197180, 235143370, 232089560, 229035750, 210712890, 149636690, 146582880, 143529070, 140475260, 601600570,
		598546760, 595492950, 592439140, 589385330, 574116280, 571062470, 568008660, 564954850, 561901040, 531362940, 528309130,
		525255320, 522201510, 519147700, 513040080, 509986270, 506932460, 503878650, 500824840, 497771030, 494717220, 491663410,
		485555790, 482501980, 479448170, 476394360, 473340550, 470286740, 467232930, 464179120, 461125310, 458071500, 455017690,
		451963880, 442802450, 439748640, 436694830, 415318160, 412264350, 409210540, 390887680, 387833870, 384780060, 381726250,
		378672440, 375618630, 372564820, 369511010, 366457200, 348134340, 277896710, 250412420, 781775360, 778721550, 775667740,
		772613930, 769560120, 754291070, 751237260, 748183450, 745129640, 742075830, 711537730, 708483920, 705430110, 702376300,
		699322490, 693214870, 690161060, 687107250, 684053440, 680999630, 677945820, 674892010, 671838200, 665730580, 662676770,
		659622960, 656569150, 653515340, 650461530, 647407720, 644353910, 641300100, 638246290, 635192480, 632138670, 622977240,
		619923430, 616869620, 613815810, 610762000, 607708190, 604654380, 586331520, 583277710, 580223900, 577170090, 558847230,
		555793420, 552739610, 549685800, 546631990, 543578180, 540524370, 537470560, 534416750, 516093890, 488609600, 448910070,
		445856260, 418371970, 320650050, 280950520, 183228600, 992488250, 989434440, 986380630, 983326820, 980273010, 965003960,
		961950150, 958896340, 955842530
	};

	long missingKey = 439748640L;

	Assert.IsTrue(keysList.Contains(missingKey), "Sanity check failed");

	var keysSet = new DenseMapSIMD<long, long>(190);
	keysSet.Emplace(0L, 0L);

	foreach (var key in keysList)
	{
		if (!keysSet.Contains(key))
			keysSet.Emplace(key, key);

		if (key == missingKey)
			Assert.IsTrue(keysSet.Contains(missingKey), "Sanity check failed");
	}

	Assert.IsTrue(keysSet.Contains(missingKey), "Entry has been lost");
}

DenseMapSIMD Throws IndexOutOfRangeException inside Contains

DenseMapSIMD throws IndexOutOfRangeException inside Contains when a small length is passed into the constructor.

Minimal working example:

var fmap = new DenseMapSIMD<long, long>(1);
fmap.Emplace(0L, 0L);

var r = fmap.Contains(1);

The piece of code above throws IndexOutOfRangeException inside GetArrayEntryRef.
Most likely the reason is length instead of _length here:

if (BitOperations.IsPow2(length))
{
    _length = length;
}

Benchmark for Contains

Hi,
Do you have a rough idea how the Contains method of these collections compares to the one in Dictionary and DictionarySlim in terms of performance? Could you add benchmarks, when you have free time, for the cases where the key is present and where it is not?
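
For reference, a hedged sketch of what such a benchmark could look like with BenchmarkDotNet; the constructor argument and the Emplace/Contains signatures are assumed from the snippets elsewhere in this tracker:

using System.Collections.Generic;
using BenchmarkDotNet.Attributes;

public class ContainsBenchmark
{
    private DenseMapSIMD<long, long> _map;
    private Dictionary<long, long> _dictionary;

    [GlobalSetup]
    public void Setup()
    {
        _map = new DenseMapSIMD<long, long>(1_000_000);
        _dictionary = new Dictionary<long, long>(1_000_000);

        for (long i = 0; i < 1_000_000; i++)
        {
            _map.Emplace(i, i);
            _dictionary.Add(i, i);
        }
    }

    [Benchmark] public bool DenseMapSIMD_Hit() => _map.Contains(500_000L);
    [Benchmark] public bool DenseMapSIMD_Miss() => _map.Contains(-1L);
    [Benchmark(Baseline = true)] public bool Dictionary_Hit() => _dictionary.ContainsKey(500_000L);
    [Benchmark] public bool Dictionary_Miss() => _dictionary.ContainsKey(-1L);
}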

Thank you for the great work!

Potential optimization for big structs

Maybe we should consider changing this line inside Emplace()

var entry = GetArrayVal(_entries, indexAndOffset);

to

 ref readonly var entry = ref GetArrayValRef(_entries, indexAndOffset); 

and this line inside Get()

var entry = GetArrayVal(_entries, index + Unsafe.As<int, uint>(ref offset));

to

ref readonly var entry = ref GetArrayValRef(_entries, index + Unsafe.As<int, uint>(ref offset)); 

This way we can avoid copying large entries (when we have a large TKey and/or TValue) out of the _entries array; see the illustration below. What do you think?
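
A small self-contained illustration of the difference, using a hypothetical 64-byte struct:

struct BigEntry { public long A, B, C, D, E, F, G, H; } // 64 bytes

static long Demo(BigEntry[] entries)
{
    var copy = entries[0];                  // copies all 64 bytes into a local
    ref readonly var view = ref entries[0]; // takes a reference, copies nothing
    return view.A + view.H;                 // reads go through the reference
}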

Possible bug in DenseMapSIMD.Copy(DenseMapSIMD<TKey, TValue> denseMap)

In the Copy method below:

public void Copy(DenseMapSIMD<TKey, TValue> denseMap)
{
    for (var i = 0; i < denseMap._entries.Length; ++i)
    {
        if (_metadata[i] < 0)
        {
            continue;
        }

        var entry = denseMap._entries[i];
        Emplace(entry.Key, entry.Value);
    }
}

shouldn't this if statement check denseMap._metadata[i] < 0 rather than _metadata[i] < 0, as it does now?

if (_metadata[i] < 0)
{
    continue;
}
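
If that reading is right, the corrected check would be:

//look at the source map's metadata, not this instance's
if (denseMap._metadata[i] < 0)
{
    continue;
}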

Question: Is it safe to make jump_distances static?

Hi,
I noticed that jump_distances is read-only and never modified. Is it safe to make it static, so it is not allocated per instance?

private readonly uint[] jump_distances = new uint[num_jump_distances]
{
    // 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105, 120, 136, 153, 171, 190, 210, 231,
    // 253, 276, 300, 325, 351, 378, 406, 435, 465, 496, 528, 561, 595, 630,
    // results in
    // * 16 - 16 starting point
    32, 80, 144, 240, 320, 432, 560, 704, 864, 1040, 1232, 1440, 1664, 1905, 2160, 2432,
    2720, 3344, 3680, 4032, 4400, 4784, 5184, 5600, 6032, 6480, 6944, 7424, 7920, 8432, 8960
};

Question about Fmix pipelining friendliness

Are you sure this code is pipelining friendly? It looks like quite the opposite to me: every line relies on the result of the previous line.

// pipelining friendly algorithm
h = (h ^ (h >> 16)) * 0x85ebca6b;
h = (h ^ (h >> 13)) * 0xc2b2ae35;
return h ^ (h >> 16);

The generated asm code also doesn't suggest pipelining friendliness.

       8BC1                 mov      eax, ecx
       C1E810               shr      eax, 16
       33C1                 xor      eax, ecx
       69C86BCAEB85         imul     ecx, eax, 0xFFFFFFFF85EBCA6B
       8BC1                 mov      eax, ecx  ;in order to start the calculations on the second line we need the result from ecx (previous line)
       C1E80D               shr      eax, 13
       33C1                 xor      eax, ecx
       69C835AEB2C2         imul     ecx, eax, 0xFFFFFFFFC2B2AE35
       8BC1                 mov      eax, ecx  ;here again we need the result from the previous line in order to calculate the return value
       C1E810               shr      eax, 16
       33C1                 xor      eax, ecx

Am I missing something?
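
For contrast, a hedged sketch (an illustration only, not a suggested change) of a shape that genuinely exploits pipelining: two independent mixing chains whose instructions the CPU can overlap, because neither chain depends on the other's intermediate results:

static (uint, uint) FmixPair(uint h1, uint h2)
{
    // the h1 and h2 chains are independent, so the CPU can interleave their imuls
    h1 = (h1 ^ (h1 >> 16)) * 0x85ebca6b; h2 = (h2 ^ (h2 >> 16)) * 0x85ebca6b;
    h1 = (h1 ^ (h1 >> 13)) * 0xc2b2ae35; h2 = (h2 ^ (h2 >> 13)) * 0xc2b2ae35;
    return (h1 ^ (h1 >> 16), h2 ^ (h2 >> 16));
}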

Question about DenseMapSIMD.Emplace(TKey key, TValue value)

According to your documentation, the Emplace method should return false if the key already exists:

/// <summary>
/// Insert a key and value in the hashmap
/// </summary>
/// <param name="key">The key.</param>
/// <param name="value">The value.</param>
/// <returns>returns false if key already exists</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public bool Emplace(TKey key, TValue value)

Don't we check for key existence here, and return true instead?

if (_compare.Equals(_entries[index + jumpDistance + offset].Key, key))
{
    return true;
}

Suggestion for array bounds check

As you may know, unlike C++, C# by default performs a runtime check when you access an element at an index that may be outside the bounds of an array. In some cases the JIT compiler is "smart" enough to recognize patterns which guarantee that the index can never be out of bounds.
One such pattern is

for (var i = 0; i < array.Length; i++)
{
}

In that case the JIT does not emit native code for the checks.
In our code, though, there are dynamically calculated indexes that can't hint the JIT into removing the checks. If we are 100% sure that we won't try to get elements outside of the array bounds, we can remove the checks manually.
Here's what the generated native code looks like when you have checks

private int GetArrayValTestWithChecks(int[] array, int index)
{
    return array[index];
}
Test.GetArrayValTestWithChecks(Int32[], Int32)
    L0000: sub rsp, 0x28
    L0004: cmp r8d, [rdx+8]
    L0008: jae short L0016     ;when out of bounds we branch here to the throw line
    L000a: mov eax, r8d
    L000d: mov eax, [rdx+rax*4+0x10]
    L0011: add rsp, 0x28
    L0015: ret
    L0016: call 0x00007fff38258b30   ;throws the exception
    L001b: int3

Here's the case when the checks are eliminated

private int GetArrayValTestWithNoChecks(int[] array, int index)
{
    ref var arr0 = ref MemoryMarshal.GetArrayDataReference(array);
    return Unsafe.Add(ref arr0, index);
}
Test.GetArrayValTestWithNoChecks(Int32[], Int32)
    L0000: cmp [rdx], dl
    L0002: movsxd rax, r8d
    L0005: mov eax, [rdx+rax*4+0x10]
    L0009: ret

Fewer instructions to execute and no branches. The fewer branches in the code, the fewer branch mispredictions and the more consistent the execution time.

So I created these two helper methods

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static T GetArrayVal<T>(T[] array, uint index)
{
#if DEBUG
    return array[index];
#else
    ref var arr0 = ref MemoryMarshal.GetArrayDataReference(array);
    return Unsafe.Add(ref arr0, index);
#endif
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static ref T GetArrayValRef<T>(T[] array, uint index)
{
#if DEBUG
    return ref array[index];
#else
    ref var arr0 = ref MemoryMarshal.GetArrayDataReference(array);
    return ref Unsafe.Add(ref arr0, index);
#endif
}

and amended these Get() method lines
from this

var right = Vector128.LoadUnsafe(ref _metadata[index], jumpDistance);

to this

var right = Vector128.LoadUnsafe(ref GetArrayValRef(_metadata, index), jumpDistance);

from this

var entry = _entries[index + jumpDistance + offset];

to this

var entry = GetArrayVal(_entries, index + jumpDistance + Unsafe.As<int, uint>(ref offset));

and finally, from this

jumpDistance = jump_distances[jumpDistanceIndex];

to this

jumpDistance = GetArrayVal(jump_distances, jumpDistanceIndex);

On my machine the final result for the GetBenchmark is 5-7% better. It's not a huge gain, but it is a gain. Please note that if there's something wrong with the algorithm in the future (if we introduce a bug) and we try to access elements outside of the bounds of the array, it will throw an exception in DEBUG but will crash (or silently misbehave) in RELEASE.
The downside of this optimization is that you won't know what's wrong in RELEASE if there's a bug. It's up to you whether you want to use it or not.

P.S.
A kind reminder that the current implementation of IndexOf can give you a wrong result, because of the allocation of an uninitialized array with GC.AllocateUninitializedArray. We should either remove it or amend it to check the metadata before returning the index.

Bug in the last commit

I've tested your last commit and it returns wrong results for Get()

Please note that the original code was calling LoadUnsafe with two parameters -> source (ref _metadata[index]) and offset (jumpDistance)

var right = Vector128.LoadUnsafe(ref _metadata[index], jumpDistance);

In the new code we call it with one parameter -> only the source (ref GetArrayValRef(_metadata, index))

var right = Vector128.LoadUnsafe(ref GetArrayValRef(_metadata, index));

which is the same as

var right = Vector128.LoadUnsafe(ref _metadata[index]);

but without the bounds check.

This is because we add the offset to the index variable later on in the code

//Calculate jumpDistance
index += GetArrayVal(jump_distances, jumpDistanceIndex);

However, it is important to subtract the previous jumpDistance before adding the new one, as it was in my code.

  //subtract last jump distance from the index
  index -= jumpDistance;

  //set the new jump distance
  jumpDistance = GetArrayVal(jump_distances, distanceIndex);

  //recalculate index, taking into account the new jump distance
  index += jumpDistance;

otherwise on the next loop iteration the code would look something like this
(this is pseudo code)

var right = Vector128.LoadUnsafe(ref _metadata[originalIndex +  oldJumpDistance + newJumpDistance]);

And this is not what we want. We want

var right = Vector128.LoadUnsafe(ref _metadata[originalIndex + newJumpDistance]);

Also can you please explain this part

if (index > _length)
{
    index = 0;
    jumpDistanceIndex = 0;
    continue;
}

I thought it should be just

if (index > _length)
{
    return false;
}

I read it as "if index > _length, start all over again", which looks like an endless loop to me.

[Question] What about thread safety?

Generally, I am interested in your implementation, because I tried to implement a concurrent open-addressed map in the past (and failed).

As I understand it, these are not thread-safe variants?

Btw, how does it compare to DictionarySlim?

Missing Resize() in GetOrAdd

The method gets caught in an endless while (true) loop ... adding a Resize(), similar to Emplace etc., solves it ...

[TestMethod]
public void AssertGetOrAddMultipleValues()
{
    //assign
    var map = new DenseMap<int, int>();

    for (int i = 0; i < 20; i++)
    {
        map.GetOrAdd(i); // caught in endless while (true) loop at index 18
    }
}

Support also ARM SIMD

Aside from Sse2 for x86 platforms, please also put ARM SIMD on your roadmap.

For example AdvSimd.CompareEqual(), ...
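
A hedged sketch of what the probing match could look like with an ARM path next to the existing SSE2 one; left and right are assumed to be Vector128<sbyte>, the namespaces System.Runtime.Intrinsics, System.Runtime.Intrinsics.Arm, and System.Runtime.Intrinsics.X86 are needed, and ExtractMostSignificantBits requires .NET 7+:

int MatchMask(Vector128<sbyte> left, Vector128<sbyte> right)
{
    if (Sse2.IsSupported)
    {
        return Sse2.MoveMask(Sse2.CompareEqual(left, right));
    }

    if (AdvSimd.IsSupported)
    {
        // AdvSimd has CompareEqual but no direct MoveMask equivalent, so the
        // portable Vector128.ExtractMostSignificantBits extracts the sign bits
        return (int)AdvSimd.CompareEqual(left, right).ExtractMostSignificantBits();
    }

    throw new NotSupportedException("Simd is not supported");
}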

Question about EmplaceInternal

Hey,
I have a question about this if block in EmplaceInternal():

if (index + jumpDistance + 16 > _length)
{
    Resize();
    goto start;
}

Is it ever theoretically possible to get inside this if block? I don't pretend to understand the algorithm perfectly, that's why I'm asking.

Currently, the only place EmplaceInternal() is called from is Resize(), where we first resize (double) the arrays and then add the items from the old arrays (skipping the tombstones). So I wonder whether it is possible at all, with the doubling of the arrays and the skipping of the tombstones, that we can still reach a moment where we would need to resize. This Resize() <--> EmplaceInternal() "recursion" kind of bothers me.
I have tested the code with the AddAndResizeBenchmark and with the data we have, we never get into that if block.

Another optimization suggestion for Resize()

This is a bit bold, but it works nicely for your algorithm. It is up to you whether you want to amend it or not.
I have noticed that after allocating the new metadata array, you immediately initialize it with the emptyBucket value (-127). So the array is initialized twice: once with zeros (by new) and then with -127. Also, the entries array is never used in the code without first checking the metadata array (well, except for the IndexOf() method, which is for testing purposes, but it can also be amended to do metadata lookups first).

_metadata = new sbyte[_length + 16];
_entries = new Entry<TKey, TValue>[_length + 16];
new Span<sbyte>(_metadata).Fill(_emptyBucket);

So we can safely allocate both arrays without initializing them to zero (you don't rely on those zeros in the code anyway) using GC.AllocateUninitializedArray. I have tested it, and allocating an array of 1_000_000 ints without initialization is about 11 times faster.

_metadata = GC.AllocateUninitializedArray<sbyte>((int)_length + 16);
_entries = GC.AllocateUninitializedArray<Entry<TKey, TValue>>((int)_length + 16);

This brings also slight improvement to the final AddAndResize benchmark. Let me know what you think!

P.S.
If you don't like that the following line starts with new

new Span<sbyte>(_metadata).Fill(_emptyBucket);

the following is equivalent:

_metadata.AsSpan().Fill(_emptyBucket);

Possible simplification in Emplace and EmplaceInternal

As the algorithm is currently implemented, we have two special metadata states -> tombstone (-126) and emptyBucket (-127). Both are negative numbers. If we don't plan to add other special cases, we can safely remove some SIMD calls.

These lines in EmplaceInternal()

var emplaceVector = Sse2.CompareEqual(_emptyBucketVector, right);
//check for empty entries
int result = Sse2.MoveMask(emplaceVector);

can safely be reduced to

//check for empty entries 
int result = Sse2.MoveMask(right); 

and these in Emplace()

//use greaterThan so we can find all tombstones and empty entries (-126, -127)
var emplaceVector = Sse2.CompareGreaterThan(_emplaceBucketVector, right);
//check for tombstones - deleted and empty entries
result = Sse2.MoveMask(emplaceVector);

can be reduced to

//check for tombstones - deleted and empty entries 
result = Sse2.MoveMask(right); 

The reason for this is that all the sbyte hashes in the metadata are positive numbers (most significant bit clear), while all the tombstones and empty buckets are negative (most significant bit set). So MoveMask gives us the same result as the current Sse2.CompareGreaterThan(_emplaceBucketVector, right) and Sse2.CompareEqual(_emptyBucketVector, right) calls.
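
A small demonstration of the equivalence under that metadata invariant; the -125 comparand standing in for _emplaceBucketVector is an assumption for illustration:

using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

class MoveMaskDemo
{
    static void Main()
    {
        // 16 metadata bytes: positive 7-bit hashes plus the two negative sentinels
        var right = Vector128.Create((sbyte)42, -127, 17, -126, 0, 99, -127, 5,
                                     1, 2, 3, 4, 5, 6, 7, 8);
        var emplaceBucketVector = Vector128.Create((sbyte)-125); // assumed threshold

        int viaCompare = Sse2.MoveMask(Sse2.CompareGreaterThan(emplaceBucketVector, right));
        int direct = Sse2.MoveMask(right);

        Console.WriteLine(viaCompare == direct); // True: both flag exactly the negative bytes
    }
}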

Let me know if you disagree

Suggestion about Resize()

I have a suggestion for the code below in the Resize() method:

var oldEntries = new Entry<TKey, TValue>[_entries.Length];
Array.Copy(_entries, oldEntries, _entries.Length);
var oldMetadata = new sbyte[_metadata.Length];
Array.Copy(_metadata, oldMetadata, _metadata.Length);

It can be simplified to

var oldEntries = (Entry<TKey, TValue>[])_entries.Clone();
var oldMetadata = (sbyte[])_metadata.Clone();

It looks simpler and is slightly faster; note that Clone() returns object, hence the casts. I tried it with an array of 1_000_000 ints and it was about 30% faster.
Both Array.Copy and Array.Clone make a shallow copy.

Performance regression after introduction of _maxDistance

I noticed that the code started to run a bit slower after the addition of the _maxDistance optimization. I have run two tests, one with the old while (true) loop and one with the new while (jumpDistanceIndex <= _maxDistance) loop. The tests differ from each other only in the ratio of existent to non-existent keys.

900_000 : 100_000 (existent to non-existent keys) ratio

Method                                      Mean      Error     StdDev    Ratio  RatioSD
DenseMapSIMD2_Get_WithMaxDistanceCheck     10.65 ms  0.030 ms  0.060 ms   1.00     0.00
DenseMapSIMD2_Get_WithoutMaxDistanceCheck  10.04 ms  0.167 ms  0.336 ms   0.94     0.03

500_000 : 500_000 (existent to non-existent keys) ratio

Method                                      Mean      Error      StdDev     Ratio  RatioSD
DenseMapSIMD2_Get_WithMaxDistanceCheck     6.707 ms  0.0984 ms  0.1987 ms   1.00     0.00
DenseMapSIMD2_Get_WithoutMaxDistanceCheck  6.268 ms  0.0449 ms  0.0906 ms   0.94     0.02

As you can see, in both cases the old while (true) code performs better. I started to wonder why...
After some examination of the code, I realized that this check is not necessary at all, and I think that is partially my fault. I asked you in this issue whether the performance would be worse if the number of non-existent keys is larger, and kind of pushed you in the wrong direction.

It turns out that the code was already taking good care of the non-existent keys, and adding the additional code complexity was not necessary.
Here in this code snippet

result = Sse2.MoveMask(Sse2.CompareEqual(_emptyBucketVector, right));
//Contains empty buckets;
if (result != 0)
{
    break;
}

we check for an empty bucket and, if we find one, we break. So we don't really iterate over every entry (go through all the jump distances).

Am I correct or am I missing something obvious here?
