mango's issues from bigdatagenomics

Use working set for reference

Currently, a new RDD is created every time the reference is requested.

To solve this, we should support reference files in LazyMaterialization or keep a global RDD[NucleotideContigFragment] in VizReads
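A minimal sketch of the working-set idea: load each contig once and answer later requests from memory. Plain Scala collections stand in for Spark so the example runs on its own, and `loadContig` is a hypothetical placeholder for whatever actually reads the reference. With Spark, the analogous step would be keeping a single `RDD[NucleotideContigFragment]` and calling `cache()` on it.

```scala
import scala.collection.mutable

// Hypothetical working set: loads each contig's sequence at most once and
// serves later requests from memory. `loadContig` is an assumed stand-in for
// the real (expensive) reference loader, not a mango API.
final class ReferenceWorkingSet(loadContig: String => String) {
  private val cache = mutable.HashMap.empty[String, String]
  private var loads = 0

  // Number of times the underlying loader has actually run.
  def loadCount: Int = synchronized(loads)

  // Returns bases [start, end) of `contig`, loading the contig on first use only.
  def getBases(contig: String, start: Int, end: Int): String = synchronized {
    val seq = cache.getOrElseUpdate(contig, { loads += 1; loadContig(contig) })
    seq.slice(start, end)
  }
}
```

Both the reads request and the reference request could then share one `ReferenceWorkingSet` instead of each creating their own RDD.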

Lay out tracks more efficiently

Currently, laying out large numbers of data elements takes a very long time. The layout logic is in TrackedLayout.scala.
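One standard way to lay out tracks in O(n log n) is greedy interval partitioning: sort records by start and put each one on the track that became free earliest, opening a new track only when none is free. The sketch below assumes a hypothetical `Rec` type carrying just an id and half-open start/end; it illustrates the technique and does not describe what TrackedLayout.scala currently does.

```scala
import scala.collection.mutable

// Hypothetical record shape: only an id and half-open [start, end) are needed.
final case class Rec(id: String, start: Long, end: Long)

object GreedyLayout {
  // Assigns each record to a track so records on the same track never overlap.
  // Reusing the track that frees up earliest yields the minimum number of tracks.
  // Record ids are assumed to be unique.
  def layout(records: Seq[Rec]): Map[String, Int] = {
    // Min-heap keyed on the end position of the last record placed on each track.
    val freeAt = mutable.PriorityQueue.empty[(Long, Int)](
      Ordering.by[(Long, Int), Long](_._1).reverse)
    val assignment = mutable.Map.empty[String, Int]
    var tracks = 0
    for (r <- records.sortBy(_.start)) {
      val track =
        if (freeAt.nonEmpty && freeAt.head._1 <= r.start) freeAt.dequeue()._2
        else { tracks += 1; tracks - 1 }
      freeAt.enqueue((r.end, track))
      assignment(r.id) = track
    }
    assignment.toMap
  }
}
```

For records a [0, 10), b [5, 15), c [10, 20) and d [16, 30), this places a and c on track 0 and b and d on track 1.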

Use Standardized DNA Color Scheme

http://www.umass.edu/molvis/tutorials/dna/atgc.htm

Click Reads/Variants/Features for More Information

To show more information than hovering does. Note that the information included is selected by the projection in VizReads.scala. This issue may involve modifying the "print methods", such as printVariationJson, and the case classes for the corresponding JSON objects sent to the frontend.

Support VCF Indexing

Using Tabix indexing and htsjdk.tribble.index.tabix

Check reference query against SequenceDictionary

Validate user input and give feedback when it is invalid.
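A minimal sketch of the check, assuming the sequence dictionary is reduced to a plain `Map` from contig name to length (a simplification; ADAM's real SequenceDictionary is not used here). It returns either a message the UI could show or the validated region.

```scala
// Hypothetical validator: `dict` stands in for a sequence dictionary as
// contig name -> contig length. Left carries a user-facing error message.
object RegionValidator {
  def validate(dict: Map[String, Long], contig: String, start: Long, end: Long)
      : Either[String, (String, Long, Long)] =
    dict.get(contig) match {
      case None =>
        Left(s"Unknown contig '$contig'. Available: ${dict.keys.toSeq.sorted.mkString(", ")}")
      case Some(_) if start < 0 || end <= start =>
        Left(s"Invalid range $start-$end: start must be >= 0 and less than end")
      case Some(len) if end > len =>
        Left(s"Range $start-$end exceeds the length of $contig ($len)")
      case Some(_) =>
        Right((contig, start, end))
    }
}
```

The contig length in the usage below is made up for illustration.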

Mapping quality filter on overall page not working

Draw mate pairs in a single line

Display sample name for each track of variants

Select Chromosome and Region In Single Request Box

E.g. chr20: 90000-95000

Currently, the chromosome is specified on the command line, and the region needs to be entered in the start and end boxes.
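A minimal sketch of parsing the single request box, assuming the input form `contig: start-end` with optional whitespace and optional thousands separators. The object name is hypothetical, and contig names that themselves contain a colon would not parse with this pattern.

```scala
import scala.util.Try

// Hypothetical parser for a "contig: start-end" search box entry.
// Whitespace around ':' and '-' is tolerated, and commas in numbers are stripped.
// Returns None when the text doesn't match or start >= end.
object RegionQuery {
  private val Pattern = """^\s*([^\s:]+)\s*:\s*(\d[\d,]*)\s*-\s*(\d[\d,]*)\s*$""".r

  def parse(text: String): Option[(String, Long, Long)] = text match {
    case Pattern(contig, start, end) =>
      for {
        s <- Try(start.replace(",", "").toLong).toOption // guards against overflow
        e <- Try(end.replace(",", "").toLong).toOption
        if s < e
      } yield (contig, s, e)
    case _ => None
  }
}
```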

Metrics Timers Break When HTTP Requests Break

For example, in the overall view, if the reference request errors out, subsequent requests are not carried out because of the following error:

java.lang.AssertionError: assertion failed: Timer name from on top of stack [/GET reference(55,false)/GET features(0,false)/GET reads(0,false)/collect at VizReads.scala:424(58,true)] did not match passed-in timer name [GET features]
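The nested name in the assertion message suggests that a failed request leaves its timer name on the stack, so the next request's timer no longer matches. A minimal model of that, assuming a simple stack of timer names (this is not the metrics library's actual implementation), shows why popping in `finally` keeps the stack balanced.

```scala
// Hypothetical stack-based timer: each timed block pushes its name and must pop
// the same name when it finishes.
final class TimerStack {
  private var stack: List[String] = Nil

  def depth: Int = stack.size

  def time[T](name: String)(body: => T): T = {
    stack = name :: stack
    try body
    finally {
      // Popping in `finally` runs even when `body` throws, so a failed request
      // can't leave a stale name behind for the next request.
      val top = stack.head
      stack = stack.tail
      assert(top == name, s"Timer name from top of stack [$top] did not match [$name]")
    }
  }
}
```

Without the `finally`, an exception in `body` would skip the pop, which matches the symptom above.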

Certain Reference Intervals Don't Load With Parquet Predicates

Cleanup viewRegion/Region variables in VizReads

The two variables are used interchangeably in the server's response to get("/reads/:ref").

Fisheye for Variants

To make variants easier to view. Could also be applied to other features.
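One option is the graphical fisheye distortion of Sarkar and Brown, G(t) = (d + 1)t / (dt + 1), which magnifies positions near a focus point and compresses those near the edges of the view. The one-dimensional sketch below applies it to genomic x coordinates; the object and parameter names are illustrative, not part of mango.

```scala
// One-dimensional graphical fisheye (Sarkar and Brown): G(t) = (d + 1) * t / (d * t + 1),
// where t is the normalized distance from the focus to the nearer view boundary.
object Fisheye {
  // Maps x in [lo, hi] so the area around `focus` is magnified and the edges are
  // compressed. Endpoints and the focus stay fixed; distortion d = 0 is the identity.
  def apply(x: Double, focus: Double, lo: Double, hi: Double, distortion: Double): Double = {
    require(lo <= focus && focus <= hi, "focus must lie within [lo, hi]")
    require(distortion >= 0, "distortion must be non-negative")
    val bound = if (x >= focus) hi else lo
    val span = bound - focus
    if (span == 0.0) x
    else {
      val t = (x - focus) / span // 0 at the focus, 1 at the boundary
      val g = (distortion + 1) * t / (distortion * t + 1)
      focus + g * span
    }
  }
}
```

With the focus at 50 on [0, 100] and d = 3, position 75 is drawn at 90 and position 25 at 10, so the region around the focus takes up more of the screen.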

Track Current Position Across Pages

#8 was fixed by #9, but a fix for a bug in displaying the reference has reintroduced it.

Specifically, the ReferenceRegion tracking the current position was removed during a quick fix for displaying the bases in reference files in #39.

get("/reference/:ref") {
  VizTimers.RefRequest.time {
    val viewRegion = ReferenceRegion(params("ref"), params("start").toLong, params("end").toLong)
    // ... (remainder of the handler not shown)
  }
}

Draw mismatches/INDELs in reads

No searchbar and multiple tracks in variant page

Make file input optional

Users should be able to choose not to provide reads/variants/etc. The reference should be required IMO, but otherwise the user shouldn't be forced to provide one of every input.
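A minimal sketch of modelling the inputs so that only the reference is mandatory. The case class and its field names are hypothetical and do not correspond to mango's actual command-line arguments.

```scala
// Hypothetical launch configuration: the reference is required, every other
// input is optional. Field names are illustrative only.
final case class VizInputs(
    referencePath: String,
    readsPaths: Seq[String] = Nil,
    variantsPath: Option[String] = None,
    featuresPath: Option[String] = None) {
  require(referencePath.nonEmpty, "a reference file is required")

  // Track types the UI should offer, based on which inputs were supplied.
  def enabledTracks: Seq[String] =
    Seq("reference") ++
      (if (readsPaths.nonEmpty) Seq("reads") else Nil) ++
      variantsPath.map(_ => "variants").toList ++
      featuresPath.map(_ => "features").toList
}
```

Using `Option` and empty defaults means each page can check what was loaded instead of assuming every input exists.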

Can't load the UI

Hi @fnothaft and @erictu,

I understand that Mango is still in very early stages.
I was curious about it and wanted to see how it works.

I tried building it on a Linux machine (Ubuntu).
I was able to start the server, but when I go to http://localhost:8080, I see an error:

Any idea what I am doing wrong?

java.lang.NoSuchMethodError: javax.servlet.http.HttpServletResponse.getStatus()I
    at org.scalatra.servlet.RichResponse.status(RichResponse.scala:16)
    at org.scalatra.ScalatraContext$class.status(ScalatraContext.scala:29)
    at org.scalatra.ScalatraServlet.status(ScalatraServlet.scala:49)
    at org.scalatra.ScalatraBase$class.runActions$1(ScalatraBase.scala:165)
    at org.scalatra.ScalatraBase$$anonfun$executeRoutes$1.apply$mcV$sp(ScalatraBase.scala:175)
    at org.scalatra.ScalatraBase$$anonfun$executeRoutes$1.apply(ScalatraBase.scala:175)
    at org.scalatra.ScalatraBase$$anonfun$executeRoutes$1.apply(ScalatraBase.scala:175)
    at org.scalatra.ScalatraBase$class.org$scalatra$ScalatraBase$$cradleHalt(ScalatraBase.scala:193)
    at org.scalatra.ScalatraBase$class.executeRoutes(ScalatraBase.scala:175)
    at org.scalatra.ScalatraServlet.executeRoutes(ScalatraServlet.scala:49)
    at org.scalatra.ScalatraBase$$anonfun$handle$1.apply$mcV$sp(ScalatraBase.scala:113)
    at org.scalatra.ScalatraBase$$anonfun$handle$1.apply(ScalatraBase.scala:113)
    at org.scalatra.ScalatraBase$$anonfun$handle$1.apply(ScalatraBase.scala:113)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
    at org.scalatra.DynamicScope$class.withResponse(DynamicScope.scala:80)
    at org.scalatra.ScalatraServlet.withResponse(ScalatraServlet.scala:49)
    at org.scalatra.DynamicScope$$anonfun$withRequestResponse$1.apply(DynamicScope.scala:60)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
    at org.scalatra.DynamicScope$class.withRequest(DynamicScope.scala:71)
    at org.scalatra.ScalatraServlet.withRequest(ScalatraServlet.scala:49)
    at org.scalatra.DynamicScope$class.withRequestResponse(DynamicScope.scala:59)
    at org.scalatra.ScalatraServlet.withRequestResponse(ScalatraServlet.scala:49)
    at org.scalatra.ScalatraBase$class.handle(ScalatraBase.scala:111)
    at org.scalatra.ScalatraServlet.org$scalatra$servlet$ServletBase$$super$handle(ScalatraServlet.scala:49)
    at org.scalatra.servlet.ServletBase$class.handle(ServletBase.scala:43)
    at org.scalatra.ScalatraServlet.handle(ScalatraServlet.scala:49)
    at org.scalatra.ScalatraServlet.service(ScalatraServlet.scala:54)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:370)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:745)

Thanks,
Nikhil

Load Files in From Webapp

Ideally, the command line just boots up the webapp, and all further activity is performed through the webapp. This allows loading different files (reads, etc.) without quitting and relaunching the application.

Serve Multiple HTTP Requests in Parallel

So that Spark jobs run concurrently rather than sequentially.
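Spark can run multiple jobs at the same time within one application when they are submitted from separate threads, so one approach is to run each request's work on its own thread. The sketch below uses Scala `Future`s on a fixed thread pool; `fetchReads` and `fetchVariants` are hypothetical stand-ins (simulated with `Thread.sleep`) for handlers that would each trigger a Spark job.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Sketch: independent pieces of work start at once instead of queueing.
object ParallelRequests {
  private val pool = Executors.newFixedThreadPool(4)
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)

  // Hypothetical stand-ins for request handlers that would each launch a Spark job.
  def fetchReads(region: String): String = { Thread.sleep(500); s"reads($region)" }
  def fetchVariants(region: String): String = { Thread.sleep(500); s"variants($region)" }

  def serve(region: String): (String, String) = {
    // Both futures start immediately, so latency is roughly the slower job, not the sum.
    val reads = Future(fetchReads(region))
    val variants = Future(fetchVariants(region))
    Await.result(reads.zip(variants), 10.seconds)
  }

  def shutdown(): Unit = pool.shutdown()
}
```

Whether concurrent jobs share the cluster fairly also depends on Spark's scheduler configuration, which this sketch does not touch.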

FilterPredicate overlapping logic doesn't seem right

https://github.com/bigdatagenomics/mango/blob/master/mango-cli/src/main/scala/org/bdgenomics/mango/cli/VizReads.scala#L251
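For reference when reviewing the linked predicate, the sketch below states the standard overlap condition for half-open intervals and contrasts it with an endpoint-only check, which is one common way such logic goes wrong (it misses records that span the whole query window). In a Parquet filter the correct condition would be an `and` of `lt` on the start column and `gt` on the end column via `FilterApi`; the column names depend on the schema and are not assumed here.

```scala
// Half-open genomic intervals [start, end).
final case class Span(start: Long, end: Long)

object Overlap {
  // Correct: two half-open intervals overlap iff each starts before the other ends.
  def overlaps(a: Span, b: Span): Boolean = a.start < b.end && b.start < a.end

  // Incomplete: checks only whether the record's endpoints fall inside the query,
  // so a record covering the entire query window is wrongly excluded.
  def endpointsInside(record: Span, query: Span): Boolean =
    (record.start >= query.start && record.start < query.end) ||
      (record.end > query.start && record.end <= query.end)
}
```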

Support Newer Spark Versions

Currently only works with Spark 1.2.1.

Lazy Materialization of Data

3-Part Hierarchy (Top Down)

1. LazyMaterialization[V]
   Where V is the datatype, e.g. AlignmentRecord.
   This layer manages loading data from disk in different ways (if it doesn't already exist in the RDD).
2. RDD[IntervalTreePartition[K, S, V]]
   Where K is the key (an interval with a start and end) and S is the entity identifier.
   This layer manages which data is stored in which partition.
3. IntervalTreePartition[K, S, V]
   Where K is the key (an interval with a start and end) and S is the entity identifier.
   This layer manages putting and getting data using a two-dimensional range index (interval).
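A minimal sketch of the put/get contract of the bottom layer, assuming K is a half-open interval and S is a `String` entity id such as a sample name. It is deliberately simplified: lookups are a linear scan rather than an interval tree, and the class name is hypothetical, not mango's IntervalTreePartition.

```scala
import scala.collection.mutable

// Half-open interval [start, end), standing in for the key type K.
final case class Interval(start: Long, end: Long) {
  require(start <= end, s"invalid interval [$start, $end)")
  def overlaps(other: Interval): Boolean = start < other.end && other.start < end
}

// Hypothetical, simplified stand-in for IntervalTreePartition[K, S, V].
final class SimpleIntervalPartition[V] {
  // One list of (interval, value) pairs per entity.
  private val byEntity = mutable.Map.empty[String, mutable.ArrayBuffer[(Interval, V)]]

  def put(entity: String, key: Interval, value: V): Unit =
    byEntity.getOrElseUpdate(entity, mutable.ArrayBuffer.empty[(Interval, V)]) += ((key, value))

  // Linear scan; a real interval tree would answer this in O(log n + m).
  def get(entity: String, query: Interval): Seq[V] =
    byEntity.getOrElse(entity, mutable.ArrayBuffer.empty[(Interval, V)])
      .collect { case (k, v) if k.overlaps(query) => v }
      .toSeq
}
```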

Draw read orientation

Reads aligned to the forward strand should have an arrow pointing to the left from the end of the read, and reads aligned to the reverse strand should have an arrow pointing to the right from the start of the read.

Eliminate Local Network Latency

When running mango on localhost, a significant part of the request time for large files is spent downloading the JSON created by the Scalatra servlet. The JSON can get quite large (100 MB+).

Find some way to eliminate this latency, at least on localhost, perhaps by writing the JSON to a working directory on disk and reading that file from the frontend.
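A minimal sketch of the server side of that idea, assuming the servlet writes each payload to a working directory and returns only the file path. The object name and file-naming scheme are hypothetical, and how the frontend would then read the file is left open, as in the issue.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical helper: writes a JSON payload to `workDir/<name>.json` and
// returns the path, so the HTTP response can carry the path instead of the data.
object JsonSpill {
  def write(workDir: Path, name: String, json: String): Path = {
    Files.createDirectories(workDir)
    val target = workDir.resolve(s"$name.json")
    Files.write(target, json.getBytes(StandardCharsets.UTF_8))
  }
}
```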

Support Fasta Indexing

Using faidx and htsjdk.samtools.reference.FastaSequenceIndex
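A `.fai` index line has five tab-separated fields: NAME, LENGTH, OFFSET (byte offset of the contig's first base), LINEBASES (bases per line) and LINEWIDTH (bytes per line, including the newline). These are enough to seek straight to any base, which is what makes indexed reference access cheap. The sketch below shows that arithmetic in plain Scala; in practice htsjdk's `IndexedFastaSequenceFile` reads the index and does this for you, and the object names here are hypothetical.

```scala
// One entry of a samtools faidx (.fai) index.
final case class FaiEntry(name: String, length: Long, offset: Long, lineBases: Int, lineWidth: Int)

object Faidx {
  // Parses one tab-separated .fai line (extra trailing fields are ignored).
  def parseLine(line: String): FaiEntry = line.split('\t') match {
    case Array(name, len, off, bases, width, _*) =>
      FaiEntry(name, len.toLong, off.toLong, bases.toInt, width.toInt)
    case _ => throw new IllegalArgumentException(s"malformed .fai line: $line")
  }

  // Byte offset in the FASTA file of 0-based position `pos` within the contig:
  // skip whole lines (including their newlines), then move within the line.
  def byteOffset(e: FaiEntry, pos: Long): Long = {
    require(pos >= 0 && pos < e.length, s"position $pos outside ${e.name}")
    e.offset + (pos / e.lineBases) * e.lineWidth + (pos % e.lineBases)
  }
}
```

For the file `>chr1\nACGTACGTAC\nGGGG\n` the index line is `chr1 14 6 10 11` (tab-separated), and base 10 (the first `G`) sits at byte 17.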

Create Toggles For Display Options

E.g. show mismatched bases

Variant Frequency Page Doesn't Track ViewRegion

Currently defaults to 0-100 even when the user has navigated to a different region. Perhaps add an input box?

Double Call to Features

Avoid reading files in twice upon initial render

/reads and /overall load an RDD to calculate the number of tracks. The same loading is done again when the GET request to /reads/:ref, which actually fetches the JSON to render, is issued. Find some way to eliminate the initial loading while still calculating the tracks, as it is redundant.

Handle D3 Update Correctly

Currently, it very naively removes all elements and re-renders all SVG groupings.
Use enter(), update, and exit() correctly to re-render only the elements that need it, while cleanly shifting existing elements to their new positions on the page.

Use Parquet predicates to load files more quickly

Currently, loading speed is the same whether or not Parquet predicates are used (though Parquet files load much more quickly than non-Parquet files).

Tooltip/Vertical Guide Bar for Frequency

Dynamically Load Multiple Samples

Currently only two samples are allowed.

Fetch callset from variant files

Hardcoded for now in VizReads.scala

Indels and mismatches disappear after reload

If you resize the reads page while mismatches are visible, the mismatches disappear even though they are still selected in the view menu.

Double Call for Reference Data

Both the reads and the reference HTTP requests fetch data from the reference. We should implement a working set.

Track current position across pages

E.g., if I am at chr20:29828000-29830000 on the /overall page, I should see chr20:29828000-29830000 when I switch to the /freq page. Currently, we "reset" to the start of the chromosome.
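A minimal sketch of keeping the position in one place that every page reads from, assuming a hypothetical server-side holder rather than mango's actual code. A single shared holder is only correct for one user at a time; supporting several users would need per-session state instead.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical view region shared by all pages (/overall, /freq, ...).
final case class ViewRegion(contig: String, start: Long, end: Long)

final class CurrentPosition(initial: ViewRegion) {
  private val current = new AtomicReference[ViewRegion](initial)

  // Called whenever any page requests a new region.
  def update(region: ViewRegion): Unit = current.set(region)

  // Called when a page loads without an explicit region, instead of resetting.
  def get: ViewRegion = current.get()
}
```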

No hover over information for reads over multiple samples

When loading in two reads files with the same sampleName, the visualization for the second sample won't display.

Add Projection for RDD[NucleotideContigFragment]

After #47 is resolved.

Specify region to view after bootup

Currently defaults to a region between 0 and 100

Cannot load multiple feature files

Eliminate/Reduce Jetty Server Delay

HTTP requests take additional time to complete. There is a delay between when the server has the processed JSON and when the web browser receives the HTTP response containing it.

Feature Menu/Index Lookup on Features

Perhaps using a key-value store.
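A minimal sketch of the lookup side, with an in-memory `Map` standing in for a key-value store (an assumption; no particular store is implied). Feature names are normalized to upper case so lookups are case-insensitive; the names and coordinates in the usage are made up.

```scala
// Hypothetical location of a named feature.
final case class FeatureLoc(contig: String, start: Long, end: Long)

object FeatureIndex {
  // Builds name -> locations; a name may map to several locations.
  def build(features: Seq[(String, FeatureLoc)]): Map[String, Seq[FeatureLoc]] =
    features
      .groupBy { case (name, _) => name.toUpperCase }
      .map { case (key, hits) => key -> hits.map(_._2) }

  // Case-insensitive lookup; empty when the name is unknown.
  def lookup(index: Map[String, Seq[FeatureLoc]], name: String): Seq[FeatureLoc] =
    index.getOrElse(name.toUpperCase, Seq.empty)
}
```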

Draw cursor across all tracks

Support SAM/BAM Indexing

Update .gitignore for data and .idea files

High coverage sections cause display to mess up

If you try to visualize a very high coverage region on the overall page, you can run into funny issues where the non-read data isn't displayed. @erictu I have a set of files that reproduces this bug; they're pretty small so I'll go ahead and tar them up and send them to you.

Eliminate Use of ReferenceRegion When Creating TrackedLayout

ReferenceRegion adds overhead when creating TrackedLayout: a ReferenceRegion is created for each record, and the contig field of each record must be projected. Accessing a record's start and end fields directly could reduce this overhead.
