Code Monkey home page Code Monkey logo

intervaltree's People

Contributors

jblachly avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

intervaltree's Issues

API harmonization: insert

Currently, AVL tree and Splay tree insert function signatures differ:

AVL:
Node *insert(Node *x, out uint cnt)

Splay:
Node * insert(IntervalType i)

In practice then calling code looks like:

                    foreach(link; c.links)
                    {
                        version(avl)    { uint cnt; (*tree).insert( new IntervalTreeNode!ChainLink(*link), cnt ); }
                        version(splay)  (*tree).insert(*link);

Proposal:
Both should take an IntervalType (current behavior of splay tree)

As a bonus, could also include overload that takes pointer

cgranges is not @safe

With the proposed transition to @safe by default, we should correctly annotate functions. The cgranges interface directly manipulates pointers, and the functions should be annotated @trusted

Probabalistic splaying

This strategy is a huge win for certain serial workloads and was introduced in commit a112efb

However, it currently uses a hardcoded probability threshold (ubyte). This should be templatized so a probability, p, can be passed at compile time (debtably, parameterize the function to allow runtime selection as well, but I don't like this as much)

splay tree optimizations

  1. reduce use of updateMax (may not need to bubble up since we update on the way down on insert)

  2. templatize zig, ziz-zig, and zig-zag by direction to factor out if/else (although these if/else may be branch predicted well already)

  3. findOverlapsWith: static array stack. Will need to instrument this to see how deep an unbalanced tree we have after loading chain file.

  4. findOverlapsWith: static array ret. (would also want to change IntervalAVLTree to match)

  5. (Benchmark) When # overlapping intervals > 1, splay once or more

Harmonize GC handling

Source of really hard to track down bugs.

Currently, all three tree implementations manage their own memory. Being containers, they copy in whatever object is passed to them. If this has, say, a string, with a pointer to the GC heap it may be reaped when its original reference goes out of scope, even though the tree holds on to it. Potential disaster. Of course this can be avoided by passing only value types.

Because I first discovered this when dealing with IITree (indeed at that time I think the splaytree and avltree were using the GC to alloc anyway so it wasn't manifest there) I did implement in the IITree insert options
(a) specify whether the passed in pointer (IITree takes a pointer instead of an object) was a GC managed pointer (GCptr=true), and
(b) indicate whether it may contain other pointers to the GC heap (trackGC=true)

See discussion here:

/// Insert interval for contig
///
/// Note this differs from the other trees in the package in inclusion of "contig" parameter,
/// because underlying cgranges has built-in hashmap and essentially stores multiple trees.
/// Passing a \0-terminated C string contig will be faster than passing a D string or char[],
/// due to the need to call toStringz before calling the C API.
///
/// last param "label" of cr_add not used by cgranges as of 2019-05-04
///
/// Params:
/// S = string-like contig name (if absent will pass "default" to cr_add)
/// i = IntervalType or IntervalType*
/// trackGC = (Default true) add i to the GC scanned ranges. see discussion
/// GCptr = (Default true) when i is IntervalType*, is the pointer to GC-mgd memory?
///
/// Non-payload variant: params contig, start, end
///
/// Discussion:
/// This container structure may store an IntervalType in one of two ways.
/// First, you can pass a pointer which will be stored directly without additional
/// memory allocations. Note however that if this pointer is to garbage collected
/// memory and no other reference exists in the D code, it could be swept away.
/// Even if it is not to GC memory, if it encapsulates a pointer (e.g., a string )
/// you can still be bitten. Thus trackGC defaults to true. If unneeded, you might
/// obtain a marginal speedup by setting trackGC = false.
///
/// The interval to be contained may also be passed by value. In this case,
/// memory will be allocated with malloc and the pointer will be stored. In this case
/// the same caveats regarding contained GC-pointing objects like D-arrays and strings
/// still apply. For example, passing a struct by value may not necessitate tracking
/// the memory in the GC if it contains only value types, but if the struct contains
/// a D-style array or string, it must be tracked
///
/// Finally, there is a variant missing the first contig parameter. Unlike other trees,
/// cgranges requires a string contig parameter, so a default value of "default"
/// will be supplied.
cr_intv_t* insert(S)(S contig, IntervalType* i, bool trackGC = true, bool GCptr = true)

Need to add trackGC option to the other tree implementations. Since they don't take pointer, probably skip adding GCptr.

Also TODO: document the difference in the APIs, and document that non-value-types shouldn't be added to splaytree and avltree for now.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.