usethesource / capsule Goto Github PK
View Code? Open in Web Editor NEWThe Capsule Hash Trie Collections Library
License: BSD 2-Clause "Simplified" License
The Capsule Hash Trie Collections Library
License: BSD 2-Clause "Simplified" License
PersistentTrieMap
API?Best practice?
How to use the __...
methods?
How to best initialize the structure?
Is there a data structure for persistent lists?
The Capsule project contains two build descriptors, one for Maven and one for Gradle. The former is out of sync and, e.g., does not cover recent features (cf. #35). In order to avoid feature disparity and the two descriptors getting out of sync, I suggest removing the Maven build descriptors in favor of Gradle.
Steps for sunsetting the Maven build descriptor:
com.mycila:license-maven-plugin
There are some other parts of the app where static analysis has flagged me, like lack of equals and hashCode on some inner classes inside AbstractSpecialisedImmutableMap. Plenty of redundant casts and extra typing that could be replaced by diamonds. Indentation is also all over the place, with a mix of tabs and two and four spaces.
I could open another PR and help, if contributions are accepted.
Do you intend to implement other types of collections for example: Lists, Linked Lists, Queues, etc ...?
The current API of capsule that contains "__" prefixed operations (to avoid name clashes with JDK classes) shall be deprecated and replaced with a more readable natural API.
http://nexus.rascal-mpl.org/repository/maven-public/
, as given in the README, doesn't work.
For Map tries...
I'm trying to wrap my head around the implementation (great work and kudos to all). No, I'm not a masochist!
Re: BitmapIndexedMapNode 'nodes' array
TIA
a) change LICENSE file and headers to BSD 2-Clause (see http://usethesource.io/about/licenses.html)
b) agree on simplified license header for individual source files that should read similar to the ones uses in dotnet/corefx:
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.
See issue #9 for a reported example that is not intuitive for the reader.
I previously already implemented an insertion-ordered immutable map for another open-source project. The goal is to backport this implementation to capsule and integrate them into capsule's API.
With a full grok of capsule attained, kudos again, I'm wondering why not Vectors (as in Clojure vectors)?
Does the sequential nature diminish the space/performance advantage of Trie?
Curious but not critical.
If this library is to be used as a reference implementation to CHAMP it should contain tests that will allow porters to verify the implementation.
The tests will also uncover bugs in this implementation.
Please cover this package with tests we can all use.
Is this logic for selecting the newDataMap
for a new singleton node correct?
Specifically, in the case where shift
is not 0
, newDataMap
is set to bitpos(mask(keyHash, 0))
. But keyHash
is the hash of the key that's being removed, right? (And the data map should reflect the key that remains.)
Or am I misunderstanding what's going on in that code?
Will capsule also provide sets/maps that maintain insertion order, or is this out of scope? And will it provide lists? Sorry for asking these questions here, didn't know where else to ask.
[ERROR] Failed to execute goal on project capsule-experimental: Could not resolve dependencies for project io.usethesource:capsule-experimental:jar:0.4.0-SNAPSHOT: Could not find artifact io.usethesource:capsule:jar:0.3.1-SNAPSHOT in usethesource (http://nexus.usethesource.io/content/repositories/public/) -> [Help 1]
See http://ci.usethesource.io/job/usethesource/job/capsule/job/master/82/
For small maps, there are some special instances which use fields instead of a Trie.
For a map of 2 entries, if you call .values()
on it, and the values are the same, you get an exception that no duplicate keys are allowed.
Looking at the code in io.usethesource.capsule.AbstractSpecialisedImmutableMap<K, V>
we see there is a TODO warning for this case:
@Override
public Collection<V> values() {
// TODO: will fail if two values are equals; return listOf(...)
return AbstractSpecialisedImmutableSet.setOf(val1, val2);
}
Instead of long-living experimental feature branches I'm considering to merge them into the master branch into special "experimental" packages. The API and implementations in those packages will be unstable until the code matures and moves to stable packages.
Merging the the branches should simplify code evolution of old and new features, and also show give users a sneak-peak of upcoming or experimental features.
I've had some big data structures that when printing looked the same, yet equals was saying they weren't.
After quite some stepping I found it out.
PersistentTrieSetMultimap$HashCollisionNode
is missing it's own equals
method that goes through the collisionContent
collection of itself and the other to see if they might be the same collision node. Instead it defaults to the Object.equals
which is reference equality.
We are (still?) on capsule 0.3.0, if possible I would appreciate a bug fix release :)
The seek method is described as follows, here https://arxiv.org/abs/1210.0481 :
seek(int seekKey): Position the iterator at a least
upper bound for seekKey,
i.e. the least key ≥ seekKey, or
move to end if no such key exists.
The sought key must be ≥ the
key at the current position.
Using this building block algorithms for trie compose ("join") and closure can make use of the orderedness of the hash keys:
This seek method can be used in many applications and it abstracts from the internal details of the data-structure. All it depends on is the hashCode/equals contract. Most implementation in capsule will use the structure of the trie to make sure seek
is done as quickly as possible. It would be best if we implement seek for all tries in capsule, IMHO.
Hi,
I see that CapsuleCollectors.toMap
is commented in the code. Is there any reason not to make it available?
That's how I reimplemented it:
public static <T, K, V> Collector<T, ?, io.usethesource.capsule.Map.Immutable<K, V>> toMap(
Function<? super T, ? extends K> keyMapper, Function<? super T, ? extends V> valueMapper
) {
/** extract key/value from type {@code T} and insert into map */
final BiConsumer<io.usethesource.capsule.Map.Transient<K, V>, T> accumulator =
(map, element) -> map.__put(keyMapper.apply(element), valueMapper.apply(element));
return new DefaultCollector<>(
io.usethesource.capsule.Map.Transient::of,
accumulator,
(left, right) -> {
left.__putAll(right);
return left;
},
io.usethesource.capsule.Map.Transient::freeze,
CapsuleCollectors.UNORDERED
);
}
Another one would be the same but using a function to merge values in case one already exists in the transient map.
Thanks!
I was curious about the statement in the README Capsule was recently extracted from the usethesource/vallang project
. Is Capsule an EPL derivative work? IANAL, so but I was reading about EPL earlier and it seems much less permissive than BSD and I'm wondering about how users should reason about this.
I think the second line here should say result = prime * result + (dataMap())
Cheers!
Highly recommend releasing to Maven Central if you think it's ready.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.