
chunkflow.jl's People

Contributors

femtocleaner[bot], jingpengw, nicholasturner1, shangmu, tartavull, torms3, wongwill86


chunkflow.jl's Issues

feat: Bounding Box in provenance Files

I was downsampling an inference run, but I suspect that the inference run covered only a small region of the dataset. If I knew the bounding box, I could restrict the downsampling accordingly. In Igneous, I found this to be a good format:

      'bounds': [
        bounds.minpt.tolist(),
        bounds.maxpt.tolist()
      ],
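A minimal sketch of how the same field could be recorded in a ChunkFlow provenance Dict before serializing it to JSON; the voxel coordinates here are hypothetical:

```julia
# hypothetical bounding box of the finished inference run (voxel coordinates)
bounds_min = [10240, 10240, 0]
bounds_max = [14336, 14336, 1024]

provenance = Dict{String,Any}(
    # same [minpt, maxpt] layout as the Igneous 'bounds' field above
    "bounds" => [bounds_min, bounds_max],
)
```

A downsampling task could then read `provenance["bounds"]` and skip chunks outside that region.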

rescale affinitymap to 8 bit?

The affinity map and semantic map are close to binary, but are stored as Float32. In Phase II, this will cost about 2 × 4 × (3+4) = 56 petabytes without compression! We get about a 50% compression ratio using blosclz, so it will still cost about 28 petabytes of storage!

At least for real-time visualization, we can downsample and push 8-bit chunks to neuroglancer directly.

They could potentially be remapped to 8 bit to reduce the storage to 2 × (3+4) = 14 petabytes without compression.

remapping

  • histogram equalization to get the remapping function
  • test the effect on segmentation
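A sketch of the histogram-equalization remap, assuming Float32 affinities in [0, 1]; the bin count and the function name are my own choices, not part of the pipeline:

```julia
# sketch: remap Float32 affinities in [0, 1] to UInt8 via histogram
# equalization, so the 8-bit codes are spread over the value distribution
function equalize_to_uint8(aff::AbstractArray{Float32})
    nbins = 256
    # histogram over [0, 1]
    counts = zeros(Int, nbins)
    for v in aff
        b = clamp(floor(Int, v * nbins) + 1, 1, nbins)
        counts[b] += 1
    end
    # cumulative distribution function per bin
    cdf = cumsum(counts) ./ length(aff)
    # map each voxel through the CDF onto 0..255
    map(aff) do v
        b = clamp(floor(Int, v * nbins) + 1, 1, nbins)
        UInt8(round(255 * cdf[b]))
    end
end
```

The effect on downstream segmentation would still need to be measured, as the second bullet says.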

occasional failure of internet connection

the task producer is not stable and fails occasionally. It then needs to be restarted manually, which is really annoying.

ERROR: LoadError: readcb: connection reset by peer (ECONNRESET)
 in yieldto(::Task, ::ANY) at ./event.jl:136
 in wait() at ./event.jl:169
 in wait(::Condition) at ./event.jl:27
 in wait_readnb(::TCPSocket, ::Int64) at ./stream.jl:303
 in readbytes!(::TCPSocket, ::Array{UInt8,1}, ::Int64) at ./stream.jl:725
 in readbytes!(::TCPSocket, ::Array{UInt8,1}, ::UInt64) at ./stream.jl:714
 in f_recv(::Ptr{Void}, ::Ptr{UInt8}, ::UInt64) at /usr/people/jingpeng/.julia/v0.5/MbedTLS/src/ssl.jl:103
 in macro expansion at /usr/people/jingpeng/.julia/v0.5/MbedTLS/src/error.jl:3 [inlined]
 in handshake(::MbedTLS.SSLContext) at /usr/people/jingpeng/.julia/v0.5/MbedTLS/src/ssl.jl:145
 in open_stream(::HttpCommon.Request, ::MbedTLS.SSLConfig, ::Float64, ::Nullable{URIParser.URI}, ::Nullable{URIParser.URI}) at /usr/people/jingpeng/.julia/v0.5/Requests/src/streaming.jl:209
 in #do_stream_request#23(::Dict{String,String}, ::Void, ::Void, ::Void, ::Array{Requests.FileParam,1}, ::Void, ::Dict{Any,Any}, ::Bool, ::Int64, ::Array{HttpCommon.Response,1}, ::MbedTLS.SSLConfig, ::Bool, ::Bool, ::Bool, ::Nullable{URIParser.URI}, ::Nullable{URIParser.URI}, ::Requests.#do_stream_request, ::URIParser.URI, ::String) at /usr/people/jingpeng/.julia/v0.5/Requests/src/Requests.jl:381
 in (::Requests.#kw##do_stream_request)(::Array{Any,1}, ::Requests.#do_stream_request, ::URIParser.URI, ::String) at ./<missing>:0
 in #do_request#22(::Array{Any,1}, ::Function, ::URIParser.URI, ::String) at /usr/people/jingpeng/.julia/v0.5/Requests/src/Requests.jl:311
 in (::Requests.#kw##do_request)(::Array{Any,1}, ::Requests.#do_request, ::URIParser.URI, ::String) at ./<missing>:0

potential solutions:

  • add error catching and restart automatically
  • first find out why this happens; the error is hard to reproduce...
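A sketch of the catch-and-restart idea, with a hypothetical `produce_tasks` function and exponential backoff between restarts (the function and parameter names are mine, not ChunkFlow's):

```julia
# sketch: wrap task production in a catch-and-restart loop so an
# occasional ECONNRESET does not require a manual restart
function produce_with_restart(produce_tasks; max_restarts = 10, backoff = 1.0)
    for attempt in 1:max_restarts
        try
            return produce_tasks()
        catch e
            @warn "task producer failed; restarting" attempt exception = e
            sleep(backoff * 2.0^attempt)  # back off before restarting
        end
    end
    error("task producer failed $max_restarts times")
end
```

Finding the root cause is still worthwhile; this only papers over the instability.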

SQSChannel

SQS could serve as a Julia Channel; we should be able to create a package that handles the messaging. That would also make it easy to switch to other messaging software later.
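A sketch of what the abstraction could look like: a minimal interface with `put!`/`take!`, shown here with an in-memory stand-in. The type names are hypothetical; an SQS-backed type would implement the same two methods:

```julia
# sketch: an abstract message-channel interface so the backend
# (SQS, in-memory, something else) is swappable
abstract type AbstractTaskChannel end

struct LocalTaskChannel <: AbstractTaskChannel
    queue::Channel{String}
end
LocalTaskChannel() = LocalTaskChannel(Channel{String}(Inf))

Base.put!(c::LocalTaskChannel, msg::String) = put!(c.queue, msg)
Base.take!(c::LocalTaskChannel) = take!(c.queue)

# a hypothetical SQSChannel <: AbstractTaskChannel would implement
# the same put!/take! methods on top of the SQS API
```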

data flow model

abstract each step as a computational node and use a data-flow model.
This can automatically exploit parallelism among the algorithms.

this is not urgent.
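A toy sketch of the data-flow idea, with hypothetical step names: each node declares its inputs and runs once they are ready, so independent nodes could be spawned as parallel tasks. This is only an illustration, not ChunkFlow's design:

```julia
# sketch: each pipeline step is a node with explicit input dependencies
struct Node
    name::Symbol
    inputs::Vector{Symbol}
    f::Function
end

# run nodes as their inputs become available (assumes an acyclic graph);
# independent ready nodes are where parallelism could be exploited
function run_dataflow(nodes::Vector{Node})
    results = Dict{Symbol,Any}()
    done = Dict{Symbol,Bool}(n.name => false for n in nodes)
    while !all(values(done))
        for n in nodes
            if !done[n.name] && all(get(done, i, false) for i in n.inputs)
                results[n.name] = n.f([results[i] for i in n.inputs]...)
                done[n.name] = true
            end
        end
    end
    results
end
```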

crop will overwrite the original image file

The current pipeline works without boundary mirroring. It crops the original image according to the field of view, which modifies the original image. This is not a problem on AWS, where we store the original image stacks in S3 and only work on a copy, but it corrupts the original images locally.

The solution is to exchange data in memory rather than writing back to files.
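A small illustration of the in-memory approach: in Julia, range indexing already allocates a copy, so cropping into a new array leaves the original stack untouched:

```julia
# sketch: crop into a copy instead of mutating the original stack
img = reshape(collect(1:27), 3, 3, 3)  # stand-in for an image stack

cropped = img[2:3, 2:3, 2:3]  # range indexing allocates a copy
cropped .= 0                  # safe: only the copy is modified
# img is unchanged; a view (@view img[...]) would NOT be safe to mutate
```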

shuffle the task list for more random IO

Currently, consecutive tasks cover neighboring chunks, which can concentrate concurrent IO on a few cloud-storage servers. Shuffling the task list will distribute the IO requests more evenly across the cloud-storage servers.
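A sketch of the shuffle, seeded so that a restarted producer reproduces the same order; the task names are hypothetical:

```julia
using Random

# sketch: shuffle the task list so IO is spread across storage servers;
# a fixed seed keeps the order reproducible across restarts
tasks = ["chunk_$(x)_$(y)" for x in 0:3 for y in 0:3]
shuffled = shuffle(MersenneTwister(42), tasks)
```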

continuous computation across stages

problem

the pipeline is split into two stages: convnet inference and segmentation. We run the pipeline stage by stage, which is time consuming and inefficient when the dataset is large.

existing solutions

Theoretically, we could manually control the stages and overlap them, but that could introduce human error in the process.

better solution

To do this fully automatically, we need task dependencies based on the offsets of the finished chunks from the previous stage. Google Pub/Sub or AWS SNS might be a potential solution.

scheduling based on spatial dependency

may need a scheduling server

  • luigi is a good dependency engine. Consider using Python for all the distribution work and Julia as the backend for fast computation and slicing. luigi is also supported by AWS Batch, which might be a good choice for Phase II.
  • airflow is newer; not sure whether it is a good fit for this job.

run schedule engine in a cluster

  • kubernetes: luigi has support for kubernetes clusters
  • docker swarm: new and easy to use, but I did not see documentation on working with luigi or other scheduling packages.

parallel task producing

current task producing is sensitive to the internet connection and can be interrupted when it is unstable. We use a single process to make it easy to resubmit tasks right after the interruption point.

Retry.jl aims to fix this issue. Once that is in place, we should be able to produce tasks in parallel.

evaluation of affinity map

There are a few bad affinity chunks in zebrafish. We should have an evaluation process to double-check the quality of the affinity map.

There are a few approaches.

comparison with image

  • if the image is not black, the affinity map should not be zero and should have reasonable variation. This might not apply to cell bodies.
  • We need to read both the affinity map and the image chunks anyway, so this can be done without much overhead.
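A minimal sketch of the image-comparison check; the function name and thresholds are hypothetical and would need tuning:

```julia
using Statistics

# sketch: flag an affinity chunk as suspicious when the image has content
# but the affinity map is all zero or nearly constant
function suspicious_chunk(img::AbstractArray, aff::AbstractArray;
                          img_thresh = 1.0, aff_std_thresh = 1e-3)
    has_image = mean(img) > img_thresh     # not a black chunk
    flat_aff  = std(aff) < aff_std_thresh  # no reasonable variation
    has_image && flat_aff
end
```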

statistics of the affinity map alone

  • this approach does not need the image chunk.
  • since chunks covering the dataset boundary might have statistics similar to bad affinity chunks, this might not be robust enough.

compare with neighboring chunks

  • has a lot of overhead

prefetch input chunks

data IO takes a lot of time; we can use prefetching to hide the IO cost.

One worker keeps doing cutouts and putting data into a RemoteChannel, while another worker fetches and processes the data.

The data should be organized as a Dict whose keys correspond to placeholders.
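A single-process sketch of the prefetch pattern using a bounded Channel; with Distributed, a RemoteChannel plays the same role across workers. The `cutout` function and the `:img` placeholder key are hypothetical:

```julia
# sketch: producer task fills a bounded channel with cutout Dicts
# while the consumer processes them, hiding IO latency
function prefetched_chunks(cutout, chunk_ids; depth = 2)
    ch = Channel{Dict{Symbol,Any}}(depth)
    @async begin
        for id in chunk_ids
            put!(ch, Dict(:img => cutout(id)))  # keys are placeholders
        end
        close(ch)
    end
    ch
end
```

The consumer simply iterates: `for chunk in prefetched_chunks(cutout, ids) ... end`, and the next cutout is fetched while the current chunk is being processed.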

split out the scheduling part?

The scheduling part could be independent of the execution part. After abstracting out the scheduling, we may be able to use several different scheduling tools, such as SimpleTasks.jl and Google Pub/Sub.

"synchronization" of processes

although the processes are not explicitly synchronized, they were launched at the same time and so compete for resources at the same time.

There are three processes running in one instance, but there is a gap in CPU usage while the GPU is idling. This is a waste of resources.

(screenshot: CPU/GPU utilization gap)

The three processes finished almost at the same time and then started downloading data at the same time. To avoid this, simply stagger the process launches; the delay should be longer than the data-download time.

create instance gradually using auto-scaling

the auto-scaling policy has an option to set the interval between instance creations.

DNS caching

massive processing generates a lot of DNS requests, which affects the university network and might also slow down the processing a little. Setting up DNS caching in the docker image can solve this problem.

An optional tool: Dnsmasq
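A minimal dnsmasq.conf sketch for a local caching resolver inside the image; the upstream server and cache size are assumptions to tune:

```
# sketch: local caching resolver so repeated lookups stay on the host
listen-address=127.0.0.1   # only serve the container itself
cache-size=10000           # assumed value; size for the workload
no-resolv                  # ignore /etc/resolv.conf
server=8.8.8.8             # assumed upstream resolver
```

The container would then point its own resolver at 127.0.0.1.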

docker container hangs

observed a few times that the container just hangs and does nothing. After over 10 minutes, the program continues working.


Retry's exponential waiting time

another possibility is that the retry waiting time grows exponentially! Using 5 retries waits much longer than 4 retries, which can make the docker container appear to hang.
https://github.com/samoconnor/Retry.jl
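A quick back-of-the-envelope check: if the delay doubles each attempt, one extra retry roughly doubles the total worst-case wait. The base delay here is hypothetical:

```julia
# sketch: total worst-case wait for n exponential-backoff retries,
# assuming the delay doubles each attempt (hypothetical 10 s base delay)
total_wait(n; base = 10.0) = sum(base * 2.0^k for k in 0:n-1)
```

So going from 4 to 5 retries takes the worst-case wait from 150 s to 310 s, which would easily look like a 10-minute hang.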

s3 chunk meta-data

the Content-Encoding of chunks in S3 should be gzip; otherwise neuroglancer will not decode them with gzip and they cannot be visualized.
