seung-lab / chunkflow.jl

Distributed pipeline inside and beyond the cloud
License: Apache License 2.0
Compiled Julia is more efficient, at least for watershed, but the compiled Julia in the current Docker image is not portable: it only works on my workstation!
https://github.com/JuliaLang/julia/blob/master/DISTRIBUTING.md
A workaround is to compile Julia in the user_data script when launching an instance.
I was downsampling an inference run, but I suspect that the inference run covered only a small region of the dataset. If I knew the bounding box, I could restrict the downsampling accordingly. In Igneous, I found this to be a good format:

```
'bounds': [
    bounds.minpt.tolist(),
    bounds.maxpt.tolist()
],
```
The affinity map and semantic map are close to binary but are stored as Float32. In Phase II, this will cost about 2 * 4 * (3+4) = 56 petabytes without compression! We got about a 50% compression ratio using blosclz, so it would still take about 28 petabytes of storage!
At least for real-time visualization, we can downsample and push 8-bit chunks to neuroglancer directly.
The maps could potentially be remapped to 8 bit, which reduces the storage to 2 * 1 * (3+4) = 14 petabytes without compression.
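A back-of-envelope check of the storage numbers above, assuming (this is an assumption) that the factor of 2 denotes the dataset size in petavoxels:

```julia
# Storage estimate: dataset size 2 (assumed petavoxels),
# 3 affinity channels + 4 semantic channels.
pb_f32 = 2 * 4 * (3 + 4)   # Float32 (4 bytes/voxel/channel): 56 PB uncompressed
pb_u8  = 2 * 1 * (3 + 4)   # UInt8 remap (1 byte/voxel/channel): 14 PB uncompressed
pb_f32 * 0.5               # ~28 PB after ~50% blosclz compression
```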
The task producer is not stable and can fail occasionally, requiring a manual restart. It is really annoying.

```
ERROR: LoadError: readcb: connection reset by peer (ECONNRESET)
 in yieldto(::Task, ::ANY) at ./event.jl:136
 in wait() at ./event.jl:169
 in wait(::Condition) at ./event.jl:27
 in wait_readnb(::TCPSocket, ::Int64) at ./stream.jl:303
 in readbytes!(::TCPSocket, ::Array{UInt8,1}, ::Int64) at ./stream.jl:725
 in readbytes!(::TCPSocket, ::Array{UInt8,1}, ::UInt64) at ./stream.jl:714
 in f_recv(::Ptr{Void}, ::Ptr{UInt8}, ::UInt64) at /usr/people/jingpeng/.julia/v0.5/MbedTLS/src/ssl.jl:103
 in macro expansion at /usr/people/jingpeng/.julia/v0.5/MbedTLS/src/error.jl:3 [inlined]
 in handshake(::MbedTLS.SSLContext) at /usr/people/jingpeng/.julia/v0.5/MbedTLS/src/ssl.jl:145
 in open_stream(::HttpCommon.Request, ::MbedTLS.SSLConfig, ::Float64, ::Nullable{URIParser.URI}, ::Nullable{URIParser.URI}) at /usr/people/jingpeng/.julia/v0.5/Requests/src/streaming.jl:209
 in #do_stream_request#23(::Dict{String,String}, ::Void, ::Void, ::Void, ::Array{Requests.FileParam,1}, ::Void, ::Dict{Any,Any}, ::Bool, ::Int64, ::Array{HttpCommon.Response,1}, ::MbedTLS.SSLConfig, ::Bool, ::Bool, ::Bool, ::Nullable{URIParser.URI}, ::Nullable{URIParser.URI}, ::Requests.#do_stream_request, ::URIParser.URI, ::String) at /usr/people/jingpeng/.julia/v0.5/Requests/src/Requests.jl:381
 in (::Requests.#kw##do_stream_request)(::Array{Any,1}, ::Requests.#do_stream_request, ::URIParser.URI, ::String) at ./<missing>:0
 in #do_request#22(::Array{Any,1}, ::Function, ::URIParser.URI, ::String) at /usr/people/jingpeng/.julia/v0.5/Requests/src/Requests.jl:311
 in (::Requests.#kw##do_request)(::Array{Any,1}, ::Requests.#do_request, ::URIParser.URI, ::String) at ./<missing>:0
```
Potential solutions:
- SQS can serve as a Julia Channel; we should be able to create a package to handle the messages, which would also make it easy to switch to other messaging software.
- Abstract each step as a computational node and use a dataflow model, which can automatically explore parallelism among algorithms. This is not urgent.
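The SQS-as-Channel idea could be sketched as follows; an in-memory `Channel` stands in for SQS here, and the `MemoryQueue` type and its methods are hypothetical names, not an existing package API:

```julia
# Sketch of a queue abstraction mimicking Julia's Channel interface.
# An in-memory Channel stands in for SQS; a real SQS-backed type would
# implement the same put!/take! methods over the SQS API, so consumers
# could switch messaging backends without code changes.
abstract type AbstractTaskQueue end

struct MemoryQueue <: AbstractTaskQueue
    channel::Channel{String}
end
MemoryQueue(sz::Int=128) = MemoryQueue(Channel{String}(sz))

Base.put!(q::MemoryQueue, task::String) = put!(q.channel, task)
Base.take!(q::MemoryQueue) = take!(q.channel)

# producers and consumers only touch the abstract interface
q = MemoryQueue()
put!(q, "chunk_0_0_0")
put!(q, "chunk_0_0_1")
take!(q)
```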
The current pipeline works without boundary mirroring: it crops the original image according to the field of view, which modifies the original image. This is not a problem on AWS, since we store the original image stacks in S3 and keep a copy locally, but it will corrupt the local copy of the original image. The solution is to exchange data in memory rather than writing to files.
Currently, task chunks neighbor each other, which can create concurrent IO against the same cloud storage server. Shuffling the tasks will distribute the IO requests more evenly across the cloud storage servers.
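A minimal sketch of the shuffling, with a hypothetical task list of chunk origins on a regular grid:

```julia
using Random  # stdlib; provides shuffle

# Hypothetical task list: chunk origins on a 3x3x3 grid. Adjacent chunks
# tend to hit the same storage shard, so we shuffle before submitting to
# spread the IO requests across servers.
tasks = vec([(x, y, z) for x in 0:2, y in 0:2, z in 0:2])
shuffled = shuffle(tasks)   # random order, same set of tasks
```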
hypersquare takes about the same time as watershed and agglomeration. hypersquare writes different data representations to Google Cloud and should be parallelizable: use @spawn to execute the different writing functions, and @parallel for to parallelize the writing of images. Reference for this discussion:
https://discourse.julialang.org/t/using-threads-with-i-o-to-processing-many-files-in-parallel/1112/4
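Since the writes are IO-bound, they can also be overlapped within one process using async tasks; a sketch, where the `write_*` functions are hypothetical placeholders for the real cloud-writing routines:

```julia
# Write the different hypersquare representations concurrently.
# @async tasks in one process (or @spawn across workers) overlap the
# IO-bound uploads; the write_* functions are illustrative stand-ins.
results = String[]
write_segmentation(out) = push!(out, "segmentation")
write_mesh(out)         = push!(out, "mesh")
write_images(out)       = push!(out, "images")

@sync begin
    @async write_segmentation(results)
    @async write_mesh(results)
    @async write_images(results)
end
```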
Google Cloud Stackdriver API:
https://github.com/joshbode/GoogleCloud.jl/blob/master/src/api/logging.jl
The pipeline was split into two stages: convnet inference and segmentation. We run the pipeline stage by stage, which is time consuming and inefficient when the dataset is large.
Theoretically, we can control the stages manually and overlap them, but that could introduce human errors.
To do it fully automatically, we need task dependencies based on the offsets of finished chunks from the previous stage. Google Pub/Sub or AWS SNS might be a potential solution.
We may also need a scheduling server.
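The chunk-level dependency tracking could look like the sketch below; the task names and granularity are illustrative, not the pipeline's actual identifiers:

```julia
# A segmentation chunk becomes ready once all inference chunks it
# depends on have finished; a pub/sub message marking a chunk done
# would drive mark_done! in a real scheduler.
deps = Dict(:seg_A => Set([:inf_1, :inf_2]),
            :seg_B => Set([:inf_2, :inf_3]))
ready = Symbol[]

function mark_done!(chunk::Symbol)
    for (task, pending) in deps
        delete!(pending, chunk)
        if isempty(pending) && !(task in ready)
            push!(ready, task)   # all prerequisites finished: schedule it
        end
    end
end

mark_done!(:inf_1)
mark_done!(:inf_2)
# seg_A is now ready; seg_B still waits for inf_3
```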
The current task producer is sensitive to the internet connection and can be interrupted when the connection is unstable. We use a single process to make it easy to resubmit tasks right after the interruption point.
Retry.jl aims to fix this issue; after that, we should be able to produce tasks in parallel.
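The behavior Retry.jl provides can be sketched as a generic retry loop with exponential backoff; `with_retry` and the flaky producer below are illustrative, not Retry.jl's actual API:

```julia
# Generic retry with exponential backoff: retry transient network
# failures instead of letting the producer die.
function with_retry(f; attempts=4, base_delay=0.01)
    for i in 1:attempts
        try
            return f()
        catch
            i == attempts && rethrow()
            sleep(base_delay * 2^(i - 1))  # back off before retrying
        end
    end
end

calls = Ref(0)
# stand-in for task submission that fails twice, then succeeds
flaky() = (calls[] += 1; calls[] < 3 ? error("ECONNRESET") : "submitted")
result = with_retry(flaky)
```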
Integrate the pipeline with MXNet, making every function an MXNet operator. This should be doable after the custom operator implementation in MXNet:
dmlc/MXNet.jl#166
There are a few bad affinity chunks in the zebrafish dataset. We should have an evaluation process to double-check the quality of the affinity map. There are a few possible approaches.
Data IO takes a lot of time; we can use prefetching to hide the IO cost. One worker keeps doing cutouts and puts the data into a RemoteChannel, while another worker fetches and processes the data. The data should be organized as a Dict whose keys correspond to placeholders.
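A single-process sketch of the producer/consumer prefetch, using a plain `Channel` in place of a `RemoteChannel` and a fake cutout in place of a cloud read:

```julia
# Prefetching: the producer keeps cutting out chunks and puts them in a
# buffered Channel, so the consumer's processing overlaps the next
# "download". Each item is a Dict whose keys act as the placeholders
# downstream operators expect.
buffer = Channel{Dict{Symbol,Any}}(2)   # capacity 2: prefetch ahead

@async begin
    for i in 1:3
        # stand-in for a cloud cutout; real code would read from storage
        put!(buffer, Dict{Symbol,Any}(:image => fill(i, 2, 2),
                                      :origin => (i, 0, 0)))
    end
    close(buffer)   # signal the consumer there is no more data
end

processed = Int[]
for chunk in buffer
    push!(processed, chunk[:origin][1])  # stand-in for real processing
end
```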
The scheduling part could be independent of the execution part. After abstracting the scheduling part, we may be able to use multiple different scheduling tools, such as SimpleTasks.jl and Google Pub/Sub.
Although the processes were not synchronized, they were launched at the same time and compete for resources at the same time. There are three processes running in one instance, but there is a gap in CPU usage while the GPU is idle; this is a waste of resources. The three processes finish almost at the same time and all start downloading data at the same time. To avoid this, simply stagger the process launches; the stagger should be longer than the data download time.
The auto-scaling policy has an option to set the interval of instance creation.
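The staggered launch can be sketched as below; the delays are scaled down to milliseconds here, whereas in production the stagger would be longer than one chunk download:

```julia
# Stagger process launches so the instances do not all hit cloud
# storage at once. Each async task stands in for one worker process.
download_log = Float64[]
stagger = 0.05   # seconds between launches (illustrative; minutes in production)

t0 = time()
@sync for i in 0:2
    @async begin
        sleep(i * stagger)                 # staggered start
        push!(download_log, time() - t0)   # moment this worker starts downloading
    end
end
```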
Massive processing requires a lot of DNS requests, which affects the university network and might also slow down the processing a little. Setting up DNS caching in the Docker image can solve this problem.
An optional tool: Dnsmasq
Observed a few times that the container just hangs and does nothing; after over 10 minutes, the program continues working.
One possibility is that the retry waiting time grows exponentially: using 5 retries waits much longer than 4, making the Docker container appear to hang.
https://github.com/samoconnor/Retry.jl
https://github.com/seung-lab/spipe/blob/master/segm2omprj.jl#L48
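The exponential growth explains why one extra retry can look like a hang; assuming a (hypothetical) base delay of 30 seconds that doubles each attempt:

```julia
# Worst-case total wait across n retries with exponential backoff.
# With doubling, each added attempt roughly doubles the total wait.
total_wait(n; base=30) = sum(base * 2^(k - 1) for k in 1:n)

total_wait(4)   # 30 + 60 + 120 + 240 = 450 s (~7.5 min)
total_wait(5)   # 450 + 480 = 930 s (~15.5 min, consistent with the >10 min hangs)
```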
```
sudo apt-get install libtcmalloc-minimal4
LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 omni --headless --cmdfile
```
(Note: preloading a specific .so requires LD_PRELOAD; LD_LIBRARY_PATH only takes directories.)
Currently, I need to manually launch processes for each GPU.
The content-encoding of chunks in S3 should be gzip; otherwise neuroglancer will not decode them with gzip and they cannot be visualized.