jchambers / jvptree Goto Github PK
View Code? Open in Web Editor NEWA generic vantage point tree (vp-tree) implementation in Java
License: MIT License
A generic vantage point tree (vp-tree) implementation in Java
License: MIT License
protected void anneal() {
if (this.points == null) {
final int closerSize = this.closer.size();
final int fartherSize = this.farther.size();
if (closerSize == 0 || fartherSize == 0) {
// One of the child nodes has become empty, and needs to be pruned.
this.points = new ArrayList<>(closerSize + fartherSize);
this.addAllPointsToCollection(this.points);
this.closer = null;
this.farther = null;
this.anneal();
} else {
this.closer.anneal();
this.farther.anneal();
}
I'm confused about the "this.addAllPointsToCollection(this.points);". Does this mean that you wanna add all non-zero-number children to the current node , then set the closer and farther points to the current node ? But actually I cannot figure out this works as expect. It will be helpful and great to get your explanation.
Hi Jon. I'd like to use the tree to find the elements of a collection that are spatially close to each other, rather than look for nearest neighbours of some other point. Can I use the iterator for that? For example, if I extract the contents of the tree into a list (as you do in the test code), is there any guarantee that points that are spatially close will be adjacent in the list?
I did some testing with the most recent version and found that sometimes not all the relevant points are returned when a filter is used, especially for highly selective filters. I think it's due to the value of distanceFromQueryPointToFarthestPoint
not having the same meaning when filters are applied since the collector may have seen a better point, but didn't want it due to the filter.
I found that removing the if cases in VPTreeNode
on lines 212 and 222 cause the function to always return the correct results, but then of course the queries are much slower.
Some contents of the tree are unexpectedly disappearing during repeated calls to "remove". I have a simple example:
private static class SimpleXY {
private final double x;
private final double y;
public SimpleXY(double x, double y) {
this.x = x;
this.y = y;
}
public double getX() {
return x;
}
public double getY() {
return y;
}
}
private static class SimpleDistance implements DistanceFunction<SimpleXY> {
@Override
public double getDistance(SimpleXY pos1, SimpleXY pos2) {
double xDiff = pos1.getX() - pos2.getX();
double yDiff = pos1.getY() - pos2.getY();
return Math.sqrt(xDiff * xDiff + yDiff * yDiff);
}
}
@Test
void testRemove() {
int nodeSize = 2;
int numEntries = 4;
List<SimpleXY> simpleXYList = new ArrayList<>(numEntries);
Random random = new Random(0);
for (int i = 0; i < numEntries; ++i) {
simpleXYList.add(new SimpleXY(random.nextDouble(), random.nextDouble()));
}
final VPTree<SimpleXY, SimpleXY> vpTree = new VPTree<>(new SimpleDistance(),
new SamplingMedianDistanceThresholdSelectionStrategy<>(
SamplingMedianDistanceThresholdSelectionStrategy.DEFAULT_NUMBER_OF_SAMPLES),
nodeSize, simpleXYList);
for (final SimpleXY s : simpleXYList) {
System.out.println(vpTree.size());
if (!vpTree.remove(s)) {
throw new RuntimeException("Entry not removed");
}
}
}
which has the following output:
4
0
java.lang.RuntimeException: Entry not removed
Hi jchambers. Thank you for the great library. I run into this issue of not being able to index more than 67 million points. I believe the error is being caused here:
You multiply i (index for number of samples, default is 32) by number of points. This multiplication crosses the integer limit (if the number of points are more than 67 million), wraps around and thus points.get() gets an index with a negative value causing an index out of bounds exception. Probably dividing number of points by number of samples first and then multiplying by i would do the trick.
Could you possibly look into this issue? I am attaching the stack trace here for reference.
java.lang.IndexOutOfBoundsException: Index -57824128 out of bounds for length 305574400
at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248)
at java.base/java.util.Objects.checkIndex(Objects.java:373)
at java.base/java.util.ArrayList.get(ArrayList.java:426)
at com.eatthepath.jvptree.util.SamplingMedianDistanceThresholdSelectionStrategy.getSampledPoints(SamplingMedianDistanceThresholdSelectionStrategy.java:61)
at com.eatthepath.jvptree.util.SamplingMedianDistanceThresholdSelectionStrategy.selectThreshold(SamplingMedianDistanceThresholdSelectionStrategy.java:43)
at com.eatthepath.jvptree.VPTreeNode.anneal(VPTreeNode.java:88)
at com.eatthepath.jvptree.VPTreeNode.<init>(VPTreeNode.java:62)
at com.eatthepath.jvptree.VPTree.addAll(VPTree.java:286)
First I would like to apologize for asking this here. And I would like to thank you for this great library! I have tried many libraries and algorithms to index millions of image feature vectors and this worked like a charm.
So, can I call vptree.getAllWithinDistance() from multiple threads?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.