Code Monkey home page Code Monkey logo

Comments (57)

yogurtearl avatar yogurtearl commented on May 16, 2024 12

@kluever how about in 2017? :)

from guava.

bobtiernay-okta avatar bobtiernay-okta commented on May 16, 2024 9

Maybe 2019 is the ticket? :P

from guava.

idolizesc avatar idolizesc commented on May 16, 2024 5

2020?

from guava.

quickiwiki avatar quickiwiki commented on May 16, 2024 5

2021? Still no movement?

from guava.

kevinb9n avatar kevinb9n commented on May 16, 2024 4

I'm emotionally attached to the idea that Guava could one day have this, but I guess it's time to face facts: we just aren't in an active feature mode anymore (even, for something of this scale, if it's externally-driven).

So, I'll probably close this issue in 2033. Just kidding.

from guava.

debashisdeb avatar debashisdeb commented on May 16, 2024 2

2022? #weNeedTrie 😄

from guava.

gissuebot avatar gissuebot commented on May 16, 2024 1

Original comment posted by sberlin on 2007-10-01 at 08:10 PM


We use it internally for a few cases.
 1) An IP list. We have a KeyAnalyzer that analyzes ip addresses to store them
(and/or ranges of them) as compactly as possible. See:
https://www.limewire.org/fisheye/browse/~raw,r=1.18/limecvs/core/com/limegroup/gnutel
la/filters/IPList.java .

 2) Storing a Kademlia DHT's internal structure. This is probably a very specific
use-case, although it works very well. :)

Other use cases:
 3) A very efficient (both in memory & CPU) dictionary. It could be used for
something like a cellphone's phonebook. You start typing and it immediately finds
all strings that began with your string. At the same time the structure acts like a
Map preventing you from adding the same String twice. It's kind of like a super-
SortedMap.

I'm sure there's a ton of other cases, but those are the ones we use (and have plans
to use). The really awesome things about Patricia is the way it does lookups.
Consider the IPList -- IPv4 addresses have a fixed size of 32 bits. So no matter
how many items you have stored in the Trie, it will always do at most 32 bit
comparisons. And it will often do less (because it intelligently traverses the
tree), and it will keep things sorted and tell you all addresses that begin with
X.Y, etc...

from guava.

gissuebot avatar gissuebot commented on May 16, 2024 1

Original comment posted by jim.andreou on 2011-02-25 at 02:35 AM


Yep, will certainly take this into consideration, thanks for the links. I find the last method (#prefixMap(prefix)) especially nifty, exposing the recursive structure of a trie (though I would expect to see the self type being returned, instead of SortedMap, that's lossy).

That said, I have to admit that personally I'm a bit predisposed against patricia. My reasoning is that for the same kind of performance, we could go with a ternary trie, which is more flexible (e.g. no key analyzer, not necessarily unbalanced trees). But I don't trust what "that kind of performance" means for big tries. I'm still going through the literature hunting for other options (e.g. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3499, http://www.naskitis.com/)

from guava.

udit-29 avatar udit-29 commented on May 16, 2024 1

2023?

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by kevinb9n on 2007-10-01 at 07:51 PM


thanks!

It would be really, really (really) helpful if we could collect, here, a good variety
of use cases for this trie.

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by kevinb9n on 2007-10-16 at 03:37 PM


Great stuff, folks. Thanks so much.

With the sheer mountain of work involved in getting our existing stuff polished, I'm
afraid it may be some time before I can look at breaking into new territory. But, I
do think that this trie and tries in general are a very promising addition, so don't
lose hope!


Status: Accepted
Owner: kevinb9n

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by kevinb9n on 2007-10-23 at 04:29 AM


Disclaimer: I don't know much about Tries.

I can imagine that many use cases for a Trie break into at least two categories

  1. The SortedMap/NavigableMap the right API abstraction that the caller wants, but
    due to the nature of the data, a Trie can be much faster than existing
    implementations of these interfaces.
  2. What the caller wants to see doesn't look like a Map at all, but something much
    simpler that deals only with prefix-matching like (rough guess):

  public interface Trie<N, V> {
    boolean containsMatch(List<? extends N> sequence);
    V get(List<? extends N> sequence); // doesn't require exact match
    void put(List<? extends N> sequencePrefix, V value);
  }

A separate issue: should we consider providing an alternate interface that is
tailored to character-based tries, so this alternate API could use CharSequence in
place of List<Character>? Then there would be two simple methods
Tries.asCharacterTrie() and Tries.forCharacterTrie() to go back and forth between the
two.

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by sberlin on 2007-10-23 at 05:49 AM


From my experience, #1 is exactly right. A Trie easily doubles as a super-efficient
SortedMap/NavigableMap.

You hit the nail on #2 with the CharSequence version. Trie's I've encountered
function with a similar interface, but tailored towards the combined sequence
instead of the pieces that build up the sequence. That is, the methods would look
like:

   public interface Trie<K, V> {
      boolean hasPrefix(K prefix);
      SortedMap<K, V> getPrefixedBy(K prefix); // returns a view over all entries
prefixed by K
      void put(K key, V value);
   }

The difference is the Trie is directly acting off a K value, which would be the
CharSequence to a Character, or a NumberSequence to a Number. This has the
advantage of being able to reuse the interface for arbitrary-length keys or fixed-
length keys whose contents don't divide easily into objects. (For example, IP
addresses being composed of bits, CharSequences, phone numbers, etc...)

The Trie interface in the submission includes a few additional convenience methods
and directly subclasses SortedMap (if it were targetted for 1.6, it would extend
NavigableMap too). The additions are for better use with fixed-size keys and being
able to visit the entries in the map while traversing (with a 'Cursor').

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by sberlin on 2007-10-23 at 05:50 AM


(Really, that getPrefixedBy could just return a List<V> - having it return a vw of
the map itself enables some really cool things.)

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by kevinb9n on 2007-11-03 at 05:32 PM


(No comment entered for this change.)


Labels: Post-1.0

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by kevinb9n on 2008-06-02 at 05:48 PM


(No comment entered for this change.)


Labels: -Post-1.0

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by tim.frey.online on 2009-03-22 at 10:31 AM


Hello,

currently I'm wirting something for my studies and did a bit of research about Tries.
I fall upon a Hash Table performance like Trie called HAT-Trie.
It's a proposal for a fast memory aware Trie that can deal with non equal memory costs.
http://crpit.com/confpapers/CRPITV62Askitis.pdf

In my opinion it maybe would be a great advantage to use this Trie.

Hope I could help,
Tim

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by ray.a.conner on 2009-08-03 at 05:53 PM


To completely generalize it, a Trie is kind of a particular way of representing a
  Set< List< E > >
or, easily enough, a
  Map< List< E >, V >
where E is typically a character (List<E> is a String). The representation allows
certain specific operations to be very efficient, and is efficient for storage when
there are many common prefixes among the List<E> (a spelling dictionary, for example).

There may be some benefit to completely abstracting it in this way, although
translating the common use case of a String key would be an excessive amount of overhead.

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by ray.a.conner on 2009-08-03 at 06:05 PM


As a use case, I've used it for finding phrases within a large text document. I have
some set of phrases that I'm looking for (not Strings, but sequences of words, hence

  • List<String>) stored as a Trie. To search, I walk the normalized text (split on
    whitespace, punctuation removed, handle case insensitivity, etc.) which is also a
    List<String>, keeping track of possible matches as I go.

Generalizing, this is finding all possible subsequences of List<E> (the text) that
are members of a given Set<List<E>> (the search phrases).

The big benefit is that the text, which can be very large, only needs to be traversed
once to find any one (or all) of thousands of possible phrases. It scales very well.

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by creswick on 2009-08-03 at 06:31 PM


I have the same use case as Ray.a.conner, and I'd also use it for scalable tab|auto-
completion in interactive consoles|text fields.

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by sberlin on 2009-08-03 at 06:34 PM


Are you folks using the attached PatriciaTrie implementation, or a separate one?

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by jared.l.levy on 2009-08-03 at 06:44 PM


The Google code base includes multiple Java trie implementations. If we chose to add
a trie to the library, we'd probably start with one of those.


Labels: -Type-Defect, Type-Enhancement

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by ray.a.conner on 2009-08-03 at 08:13 PM


I rolled my own implementation, some years ago. I wasn't storing simple strings, and
pretty much everything out there was character-centric. Once I made the abstraction
to sequences of strings, sequences of objects was a no-brainer. Although I never
needed to implement many of the Trie operations that others find useful; I only
needed sub-sequence matching.

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by kevinb9n on 2009-09-17 at 06:02 PM


(No comment entered for this change.)


Labels: Milestone-Post1.0

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by master.java.xiao on 2009-11-24 at 08:06 AM


(No comment entered for this change.)

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by [email protected] on 2010-07-30 at 03:54 AM


(No comment entered for this change.)


Owner: [email protected]
Labels: -Milestone-Post1.0

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by [email protected] on 2010-07-30 at 03:56 AM


(No comment entered for this change.)


Labels: -Priority-Medium

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by [email protected] on 2011-01-27 at 01:59 PM


(No comment entered for this change.)


Owner: ---

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by [email protected] on 2011-02-03 at 06:24 AM


Work is beginning... expect in maybe release 10 or 11.


Status: Started
Labels: Milestone-Release10

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by rkapsi on 2011-02-24 at 10:35 PM


Awesome! I don't know if it's of any use for you but our Trie has been under ASL 2.0 for a few years now and has undergone a few refacorings.

<http://code.google.com/p/patricia-trie>
<https://github.com/rkapsi/patricia-trie>
<https://github.com/rkapsi/simple-patricia-trie>

The Trie interfaces are maybe useful even if you're not going to use PATRICIA.

Cheers,
 Roger

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by sberlin on 2011-02-25 at 02:55 PM


It probably could return the Trie itself, but would require a bit more work to make sure the additional trie operations stayed with the sub-trie range. Roger & I ate, slept & breathed patricia for a couple weeks while putting this together years ago.. so don't hesitate to ask if you have any questions on it.

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by wasserman.louis on 2011-03-07 at 05:41 PM


I'd be interested in working on this -- heh, I've been living and breathing http://hackage.haskell.org/package/TrieMap for some time now.

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by [email protected] on 2011-03-23 at 01:48 AM


(No comment entered for this change.)


Labels: -Milestone-Release10

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by [email protected] on 2011-07-13 at 06:19 PM


(No comment entered for this change.)


Status: Accepted

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by [email protected] on 2011-12-10 at 03:11 PM


(No comment entered for this change.)


Labels: Package-Collect

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by ian.g.simpson on 2012-03-10 at 04:48 AM


Any update on this? I just checked out 11.0.2 and I still don't see a Trie implementation in it.

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by [email protected] on 2012-03-11 at 11:08 PM


There's been some progress, but unfortunately it's not high-priority. :-( Unlikely for 13.0, maybe 14.0.

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by ian.g.simpson on 2012-03-22 at 06:23 PM


So what would it take to make it a higher priority? Getting more people to star the issue and/or rant and rave?

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by wasserman.louis on 2012-03-22 at 06:31 PM


It's just not a data structure that has that many use cases or appears that often, I think?

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by wasserman.louis on 2012-03-22 at 06:35 PM


Or to put it another way, the hardness of designing a trie API to Guava's standards is very high -- it's such a general structure! -- and it isn't very widely applicable.

I think starring the issue can't hurt, and might help; ranting and raving can hurt and definitely won't help; talking about why you need this, discussing your priorities for the data structure and outlining some detailed use cases helps more than anything else, but won't necessarily guarantee that it'll be a top priority.

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by ian.g.simpson on 2012-03-22 at 08:24 PM


I was trying to be cynical in regard to ranting and raving and was implying people making a strong use case for it actually :)

I just wanted to get a better idea into what provoked priority in Guava.

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by [email protected] on 2012-03-27 at 04:10 PM


I feel that guava could benefit from Trie internal implementation. This can be very efficient for java devs looking to use Scala like data structures. http://www.meetup.com/saylambda/events/47056842/

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by Lichtenberger.Johannes on 2012-03-28 at 05:24 PM


I'm interested in a persistent approach based on the COW-principle, that is revisions of the tree are stored in a very compact form and updates affect a minimum of nodes (which I think should be the normal case for tries). But most likely I will have to modify an existing implementation or write it from scratch. That is if Lucene can't be used, as I want to provide a full-text index for a versioned tree-structure.

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by jon.h.clark on 2012-03-28 at 05:30 PM


I'd just like to the sentiment that tries are very useful and find uses all over the place. As a PhD student in CS, I work on natural language processing and machine translation. There, we use tries for efficiently storing language models (a bunch of probabilities associated with n-grams) and even translation models (a list of target phrases associated with some source language phrase/n-gram and, of course, more probabilities).

NLP people would love to see a trie in Guava. :)

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by [email protected] on 2012-05-30 at 07:43 PM


(No comment entered for this change.)


Labels: -Type-Enhancement, Type-Addition

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by Sam.Halliday on 2012-08-01 at 07:30 PM


+1 for Trie, I find myself wanting a solid Java implementation on occasion, and usually fall back to a TreeMap<String> hack which gets the job done (albeit with a much larger memory footprint).

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by onlynone on 2012-08-22 at 02:28 PM


I'd be very interested in a Trie implementation in guava. I use hadoop and need to filter the incoming data by a large set of strings, on the order of 2 million strings. Right now I have a file containing these strings, one per line, that each mapper reads into a hashset to do the filtering. The speed performance is fine, it just takes a ton of memory. I've used the exact same approach with the exact same set of strings in c++ and it never took close to the amount of memory that java is using. The strings I'm filtering by have a good deal of common leading characters, I think our memory savings would be significant with a trie. I'd like to see a TrieMap and a TrieSet.

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by phraktle on 2012-11-15 at 11:53 PM


An interesting project for concurrent tries: http://code.google.com/p/concurrent-trees/

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by [email protected] on 2013-01-24 at 01:32 PM


I am presently finding Trie structures very useful in my code generation / dependency injection library. Scanning classpaths is expensive, and java packages often share long, repetitive package names; couple that with the fact my codegen library descends not just through the package structure, but the class, method, field and annotation structure as well (into a single uber-trie of your module), and a fast, concurrent trie is absolutely vital.

Compared to the org.reflections classpath scanner (which uses MultiMap and single threaded scanning), my Trie-backed implementation can, if scanning multiple jars with multiple threads, run over seven times faster, and deliver deep iterators, instead of multiple MultiMaps delivering set views.

+1 For Trie

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by [email protected] on 2013-01-24 at 02:22 PM


Sorry for the double post, but one more place I use Trie's for great effect: My StringTrie which uses char[] or CharSequence as keys, internally uses a CharPool that extends the base Trie structure, and translates a char[], start index and end index into singleton char[] instances (which pass == tests). This changed my memory footprint from double that of TreeMap to less than half (on larger datasets).

My implementation does not store a node per character; rather, it will store a char[] matching a unique sequence of chars, and using a trie-backed CharPool, it will only create as many arrays as there are unique sequences in your data (and using a shared CharPool shared across instances ensures I never allocate a byte more memory than I need).

In my benchmarks, I found that memory usage on highly-variant, dense data structures (like long sequences of toStringd() random numbers) was quite high (a new char[] per node), but with a CharPool to the rescue, memory usage was based on number of unique sequences + number of unique sequence combinations, instead of unique sequence combinations * number of sequences per combination.

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by knut.wannheden on 2013-01-24 at 02:40 PM


Hi James,

Any chance that you'll share your implementation? I also have a use case where I'm currently using a TreeMap (which allows for a poor man's prefix matching) and I would like to experiment with a Trie to save some memory.

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by [email protected] on 2013-01-24 at 04:59 PM


I'll package it up, put it on github in the near future, and reply back here with a link. Star the issue, and you'll get emailed.

The concurrency support is tied directly into my own internal library (which does let you inject your own implementations for handling threads or ignoring multithreading), but the whole library itself is Not Safe For Work (tm).

If I can't deploy a new version soon, I can put a cut out standalone of the trie and the charpool in a gist instead; just be warned that for small maps, you won't be saving space. Large maps, especially ones with common prefixes (or subsequences, using a shared CharPool) WILL save you on memory.

For my purposes, I save lots, because I am working with long, repetitive strings (especially when descending into inner classes, methods, fields, vars, etc). If you just maintain a big map of mostly-unique strings, trie is probably not for you.

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by oliver.schrenk on 2013-03-15 at 12:04 PM


Hi James,

Did you have time to put your trie implementation up on github? I'm interested because I have a large pool of keys sharing very long prefixes and I'm interested in your approach.

from guava.

gissuebot avatar gissuebot commented on May 16, 2024

Original comment posted by ashwin.jayaprakash on 2013-03-15 at 05:01 PM


Jetty util pkg has some Trie structures if it helps - http://git.eclipse.org/c/jetty/org.eclipse.jetty.project.git/tree/jetty-util/src/main/java/org/eclipse/jetty/util/ArrayTernaryTrie.java

from guava.

noss avatar noss commented on May 16, 2024

There was a request for use cases.

I started to use the Trie in apache commons4 collections to use in lookups of longest matching prefix. Its not uncommon with rules based on phone number prefixes.

Next is to do the same but for IP networks, almost like a routing table. Apache commons4 4.0 doesnt allow me to create a patricia tree on other types though. But I guess limewire is almost doing this, but refactoring has made the codebases diverge.

The lookup tables will have around ~100k entries, and there about 100k lookups per second.

from guava.

kluever avatar kluever commented on May 16, 2024

We're not working on this anytime soon...and it will require a lot more investigation and studying.

from guava.

evanyang1 avatar evanyang1 commented on May 16, 2024

Given that this issue was opened in 2014, what would it take for this issue to be resolved + closed?

from guava.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.