vorner / contrie
Concurrent hash trie
License: Apache License 2.0
The master branch already contains the ConSet type with a basic set of methods. However, two things are missing:

- construction (Extend and FromIterator)
- iteration (.iter, IntoIterator for references)

The ConSet type is very similar to the ConMap one, so the implementations of the above will end up being similar too. A few notable things: on ConMap, Extend and FromIterator work both on (K, V) tuples and on prepared Arc<Element>s. The set has no such distinction; its implementations would simply take T directly.

If you are considering working on this, please write a comment to claim the issue (and specify whether you want to tackle both subtasks or only one). It is fine to put it into review in parts, as smaller pull requests or as one bigger one, as you prefer.
Don't hesitate to ask if you have any questions or want some help.
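To illustrate the shape of the missing pieces, here is a minimal sketch of Extend and FromIterator for a set-like type. A Mutex<HashSet> stands in for the real lock-free ConSet so the example is self-contained; the trait plumbing is the point, not the stand-in.

```rust
use std::collections::HashSet;
use std::hash::Hash;
use std::iter::FromIterator;
use std::sync::Mutex;

// Stand-in for the real ConSet; a Mutex<HashSet> is used here only so the
// trait shapes can be demonstrated without the contrie crate.
struct ConSet<T: Hash + Eq>(Mutex<HashSet<T>>);

impl<T: Hash + Eq> ConSet<T> {
    fn new() -> Self {
        ConSet(Mutex::new(HashSet::new()))
    }
    fn insert(&self, value: T) {
        self.0.lock().unwrap().insert(value);
    }
    fn contains(&self, value: &T) -> bool {
        self.0.lock().unwrap().contains(value)
    }
}

// Unlike the ConMap versions, which accept both (K, V) tuples and
// Arc<Element>s, the set variants simply take T directly.
impl<T: Hash + Eq> Extend<T> for ConSet<T> {
    fn extend<I: IntoIterator<Item = T>>(&mut self, iter: I) {
        for value in iter {
            self.insert(value);
        }
    }
}

impl<T: Hash + Eq> FromIterator<T> for ConSet<T> {
    fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Self {
        let mut set = ConSet::new();
        set.extend(iter);
        set
    }
}
```

The real implementations would forward to the concurrent insert in the same way; only the storage underneath differs.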
This is a somewhat open-ended task. There was nearly no performance analysis done, and certainly no optimisation. We want to do some research in that area.
It would also be worth running the benchmarks or perf on some architecture other than just x86-64, such as ARM.
(Marked as easy ‒ while there are no specific instructions, I believe trying to stress-test the data structure in some way should not be hard to do; reading the outputs of perf might be a bit harder, though.)
Due to lifetime reasons (the crossbeam epoch is unpinned on method exit), we can't easily return references into the map. Currently, ConSet clones the entries when returning. This is mostly fine for small/cheap types (ints, IP addresses), and an Arc can be used for types that can't be cloned or are expensive to clone.
The ConMap, on the other hand, integrates the Arc into itself directly. That's because the node contains both key and value, and it is cheaper to clone (and store) only one Arc instead of two. But this means higher overhead for small/cheap types. So we probably also want a flavour of the data type (maybe CloneConMap) that clones K and V independently.
I expect the implementation to be highly inspired by both ConMap and ConSet. Some methods make no sense, though (like insert_entry); the methods should take either both key and value or just the key (depending on the method), and return tuples.
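A rough sketch of what the CloneConMap surface might look like. A Mutex<HashMap> stands in for the lock-free trie so the example is self-contained (the real implementation would sit on top of Raw, like ConMap does); the point is the signatures, which take K and V (or just the key) and return tuples of independent clones instead of ConMap's Arc<Element<K, V>>.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::Mutex;

// Hypothetical sketch of CloneConMap; a Mutex<HashMap> is a stand-in for
// the lock-free trie, used here only to show the intended signatures.
struct CloneConMap<K: Hash + Eq + Clone, V: Clone>(Mutex<HashMap<K, V>>);

impl<K: Hash + Eq + Clone, V: Clone> CloneConMap<K, V> {
    fn new() -> Self {
        CloneConMap(Mutex::new(HashMap::new()))
    }

    // Takes both key and value; returns the replaced pair, if any, as a tuple.
    fn insert(&self, key: K, value: V) -> Option<(K, V)> {
        let mut map = self.0.lock().unwrap();
        let old = map.remove_entry(&key);
        map.insert(key, value);
        old
    }

    // Takes just the key; K and V are cloned independently on the way out.
    fn get(&self, key: &K) -> Option<(K, V)> {
        self.0
            .lock()
            .unwrap()
            .get_key_value(key)
            .map(|(k, v)| (k.clone(), v.clone()))
    }
}
```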
If anyone wants to work on it, please claim it by commenting first, so the work doesn't get duplicated.
It's fine to do it in multiple merge requests, implementing only a subset first. Please include tests (they can also be inspired by the other implementations; we probably don't need the full set of what is on ConMap, because that one is also used to test the underlying Raw).
Don't hesitate to ask in case of need for help or questions.
The Debug implementation of ConSet is not yet written.
This should be a simple application of iteration plus https://doc.rust-lang.org/std/fmt/struct.Formatter.html#method.debug_set.
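A minimal sketch of how that could look, assuming ConSet's iterator yields the stored values. A Mutex<HashSet> stands in for the real type here so the example is self-contained.

```rust
use std::collections::HashSet;
use std::fmt;
use std::sync::Mutex;

// Stand-in ConSet, used only to demonstrate the Debug implementation shape.
struct ConSet<T>(Mutex<HashSet<T>>);

impl<T: fmt::Debug> fmt::Debug for ConSet<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // debug_set formats the entries as `{a, b, c}`; the real impl would
        // feed it ConSet's own iterator instead of the HashSet's.
        f.debug_set().entries(self.0.lock().unwrap().iter()).finish()
    }
}
```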
Currently, the structures don't allow snapshots or consistent iteration through them. This makes the code simpler and probably a bit faster (though that is hard to confirm without a way to compare).
On the other hand, there certainly are use cases that would benefit from such a feature. So it would be nice to allow it in the future.
My feeling is that it could make sense to actually have both implementations (either in the same crate, or in a separate). Maybe there's a way to share some code, though.
As I don't have this on top of the priority list, this is certainly free to be taken by anyone interested (claim the issue if you want to start working on it).
This is the more interesting part of rayon support (see #3), iterating through the maps and sets. It'll require some research around how exactly iterators for rayon work (src/iter/map.rs
and src/iter/plumbing/README.md
seem to be of interest in their repository).
On the conset side, we need to be able to „split“ (conceptually). The rough idea of how to do that is to extend raw::iterator::Iter with bounds for the accepted hash values, start and end. We can use these to stop iterating after going past end, and to initialize by finding the first value whose hash is >= start. The current new will then become a wrapper that calls this extended constructor with 0 and u64::max_value().
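The bounded-iteration idea can be sketched like this. A hash-sorted slice of (hash, value) pairs stands in for the hash-ordered traversal of the trie; the real change would go into raw::iterator::Iter, and the names here are illustrative.

```rust
// Hedged sketch of iteration bounded by accepted hash values.
struct BoundedIter<'a, T> {
    items: &'a [(u64, T)], // assumed sorted by hash
    end: u64,
    pos: usize,
}

impl<'a, T> BoundedIter<'a, T> {
    // Extended constructor: initialize by finding the first value whose
    // hash is >= start; iteration stops once a hash above `end` is seen.
    fn with_bounds(items: &'a [(u64, T)], start: u64, end: u64) -> Self {
        let pos = items
            .iter()
            .position(|&(h, _)| h >= start)
            .unwrap_or(items.len());
        BoundedIter { items, end, pos }
    }

    // The current `new` becomes a thin wrapper over the full hash range.
    fn new(items: &'a [(u64, T)]) -> Self {
        Self::with_bounds(items, 0, u64::max_value())
    }
}

impl<'a, T> Iterator for BoundedIter<'a, T> {
    type Item = &'a T;
    fn next(&mut self) -> Option<&'a T> {
        let (hash, value) = self.items.get(self.pos)?;
        // Stop after going past `end`; everything beyond belongs to the
        // iterator holding the other half of the range.
        if *hash > self.end {
            return None;
        }
        self.pos += 1;
        Some(value)
    }
}
```

Splitting for rayon would then mean handing a sub-range [start, mid] to one iterator and [mid + 1, end] to another; together they cover every element exactly once.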
It would be nice if a user could customize the branching factor.
Currently, the branching factor is configured by constants at the top of raw/mod.rs (LEVEL_BITS, LEVEL_MASK, LEVEL_CELLS, MAX_LEVELS).
The first step here would be to move the constants into the Config trait so they can be overridden. To protect against inconsistent values, add the checks from consts_consistent as asserts in the Raw::with_hasher constructor.
For now I believe this would be uncommon to customize, so it can probably stay on Raw only.
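A sketch of what moving the constants into a trait could look like. The default values and the consistency asserts below are assumptions modeled on the description above, not the crate's actual code.

```rust
// Hypothetical Config trait carrying the branching-factor constants as
// overridable associated consts with consistent defaults.
trait Config {
    const LEVEL_BITS: usize = 4;
    const LEVEL_MASK: u64 = (1 << Self::LEVEL_BITS) - 1;
    const LEVEL_CELLS: usize = 1 << Self::LEVEL_BITS;
    const MAX_LEVELS: usize = 64 / Self::LEVEL_BITS;
}

struct DefaultConfig;
impl Config for DefaultConfig {}

// A user overrides only LEVEL_BITS; the derived consts follow.
struct WideConfig;
impl Config for WideConfig {
    const LEVEL_BITS: usize = 6;
}

// The constructor would run checks like these (mirroring what a
// consts_consistent test would assert) to reject inconsistent overrides.
fn check_consistency<C: Config>() {
    assert_eq!(C::LEVEL_CELLS, 1 << C::LEVEL_BITS);
    assert_eq!(C::LEVEL_MASK, (C::LEVEL_CELLS - 1) as u64);
    assert!(C::LEVEL_BITS * C::MAX_LEVELS <= 64);
}
```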
If anyone wants to work on this, please claim it by commenting to avoid work duplication. In case of any questions or need for help, don't hesitate to ask.
The aim here is to be able to create the collections from parallel iterators in the „naturally“ parallel way. Please implement them for both ConMap and ConSet (not necessarily in the same pull request). I expect the traits to be quite similar to the basic ones (FromIterator and Extend, which are already implemented for ConMap), except that they would not use a single-threaded for loop, but probably the parallel iterator's for_each, to take advantage of multiple threads.
https://docs.rs/rayon/1.1.0/rayon/iter/trait.FromParallelIterator.html
https://docs.rs/rayon/1.1.0/rayon/iter/trait.ParallelExtend.html
Please also provide a few tests trying this out (some trivial ones, like collecting a range in parallel into the collection, are enough) and put the dependency on rayon behind a feature gate (so the crate depends on it only if the user wants it).
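As a sketch, the usual Cargo pattern for such a gate looks like this (an optional dependency automatically exposes a feature of the same name; the version number is illustrative):

```toml
[dependencies]
# Optional: pulled in only when the `rayon` feature is enabled.
rayon = { version = "1", optional = true }
```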
Note that the third related trait (IntoParallelIterator
) is not part of this particular ticket, since I expect it to be significantly harder (read: I haven't thought it through yet).
I'm happy to answer questions or provide help. If you want to work on the task, please claim it by writing a comment.
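The intended shape of such a parallel constructor can be sketched as below. Since this example avoids the rayon dependency, scoped threads and a shared work queue stand in for into_par_iter().for_each(...); with rayon, the body would collapse to a single for_each call. The set is again a Mutex-based stand-in.

```rust
use std::collections::HashSet;
use std::hash::Hash;
use std::sync::Mutex;
use std::thread;

// Stand-in ConSet; insertion through &self is what makes parallel
// extension natural — every thread can insert into the same set.
struct ConSet<T: Hash + Eq>(Mutex<HashSet<T>>);

impl<T: Hash + Eq> ConSet<T> {
    fn new() -> Self {
        ConSet(Mutex::new(HashSet::new()))
    }
    fn insert(&self, value: T) {
        self.0.lock().unwrap().insert(value);
    }
    fn len(&self) -> usize {
        self.0.lock().unwrap().len()
    }
}

// With rayon, this body would be roughly
// `items.into_par_iter().for_each(|v| set.insert(v));`.
fn from_parallel<T: Hash + Eq + Send>(items: Vec<T>) -> ConSet<T> {
    let set = ConSet::new();
    let queue = Mutex::new(items);
    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| loop {
                // Pop one item at a time; all threads insert concurrently.
                let item = queue.lock().unwrap().pop();
                match item {
                    Some(v) => set.insert(v),
                    None => break,
                }
            });
        }
    });
    set
}
```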
- Run the CI tests with the rayon feature enabled. It is probably enough to run with default features and with --all-features.
- Mention the rayon feature in the documentation, so users know they can turn it on.
- Have docs.rs build the documentation with the rayon flags turned on. It's described here how to do that: https://docs.rs/about.

Currently, the end data node is an array (well, a SmallVec). It usually contains only a single element. In case we have a collision on the whole length of the 64bit hash, we have more elements.
In case we add or remove items from the colliding data node, we make a brand new copy of the array and clone the relevant elements there. This is not elegant. Furthermore, it requires the Config::Payload: Clone bound. If we got rid of that bound, a user with very special needs could use the Raw directly with types that are not Clone but are not inside an Arc either, and get just references into the data structure (by providing their own crossbeam_epoch::Guard).
To accomplish that we can replace the data nodes with (singly) linked lock-free lists. They should be relatively easy to implement (note that we need to be able to delete from the middle as well, not only pop ‒ watch out for race conditions on deleting two consecutive elements).
❓ Is there a linked list we could just reuse lying around somewhere?
❓ If we decide to implement our own list, should it go into a separate crate or should it live in this one, in a separate module?
❓ Is the feature (of supporting non-Clone types as Raw's payload) worth the added work and complexity?
There is no reason for get_or_insert_with_element to take ownership of the key, since the whole key/value pair is inserted in one go as an Element<K, V>.
Changing the method to pub fn get_or_insert_with_element<F>(&self, key: &K, create: F) could enable some interesting new possibilities.
For example, if I create a ConMap<Vec<u8>, String>, I can normally pass &[u8] as the lookup key to get(). However, I currently cannot do this:
```rust
let key = b"hello";
let x = map.get_or_insert_with_element(key, || Arc::new(Element::new(key.to_vec(), "world".to_string())));
```
In other words, first look-up the element with a slice reference without allocating a key structure. If not found, then create the key structure for storage.
Currently, I must do:
```rust
let key = b"hello";
if map.get(key).is_none() {
    let x = map.get_or_insert(key.to_vec(), "world".to_string());
}
```
Of course, the programmer is then responsible for not doing stupid things like having the key argument and the key stored in the Element<K, V> be different...
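The desired pattern can be sketched with a stand-in map (a Mutex<HashMap> here, so the example does not depend on the crate internals). The Borrow bound is what lets a &[u8] look up a Vec<u8> key, and the closure builds the owned pair only on a miss; the name get_or_insert_with is illustrative, not the real API.

```rust
use std::borrow::Borrow;
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::Mutex;

// Stand-in map, used only to demonstrate the borrowed-key signature.
struct Map<K: Hash + Eq, V>(Mutex<HashMap<K, V>>);

impl<K: Hash + Eq, V: Clone> Map<K, V> {
    fn new() -> Self {
        Map(Mutex::new(HashMap::new()))
    }

    // Look up by a borrowed form of the key (e.g. &[u8] for Vec<u8>); only
    // on a miss does `create` build the owned (K, V) pair for storage.
    fn get_or_insert_with<Q, F>(&self, key: &Q, create: F) -> V
    where
        K: Borrow<Q>,
        Q: Hash + Eq + ?Sized,
        F: FnOnce() -> (K, V),
    {
        let mut map = self.0.lock().unwrap();
        if let Some(found) = map.get(key) {
            return found.clone();
        }
        let (owned_key, value) = create();
        map.insert(owned_key, value.clone());
        value
    }
}
```

Note this stand-in cannot check that the closure's key matches the lookup key, which is exactly the "stupid things" caveat above.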
From the paper:
Intuitively, a concurrent insertion operation could start by locating the internal node it needs to modify and then create a copy of that node with both the bitmap and the array updated with a reference to the key being inserted. A reference to the newly created node could then be written into the array of the parent node using the CAS instruction. Unfortunately, this approach does not work. The fundamental problem here is due to races between an insertion of a key into some node C1 and an insertion of another key into its parent node C2. One scenario where such a race happens is shown in Figure 1. Assume we have a hash trie from the Figure 1A and that a thread T1 decides to insert a key k5 into the node C2 and creates an updated version of C2 called C2′. It must then do a CAS on the first entry in the internal node C3 with the expected value C2 and the new value C2′. Assume that another thread T2 decides to insert a key k4 into the node C1 before this CAS. It will create an updated version of C1 called C1′ and then do a CAS on the first entry of the array of C2 – the updated node C1′ will not be reachable from the updated node C2′. After both threads complete their CAS operations, the trie will correspond to the one shown in Figure 1B, where the dashed arrows represent the state of the branches before the CASes. The key k4 inserted by the thread T2 is lost.
We solve this problem by introducing indirection nodes, or I-nodes, which remain present in the Ctrie even as nodes above and below change. The CAS instruction is performed on the I-node instead of on the internal node array. We show that this eliminates the race between insertions on different levels.
I haven't found anything in the implementation that prevents this interleaving; the paper avoids it by dint of the I-nodes forcing such insertions to conflict on their CAS of the I-node, causing one of them to have to retry.