seung-lab / chunkflow.jl

Distributed pipeline inside and beyond the cloud
License: Apache License 2.0
Compiled Julia is more efficient, at least for watershed, but the compiled Julia in the current Docker image is not portable: it only works on my workstation!
https://github.com/JuliaLang/julia/blob/master/DISTRIBUTING.md
A workaround is to compile Julia in the user_data script when launching an instance.
I was downsampling an inference run, but I suspect that the inference run covered only a small region of the dataset. If I knew the bounding box, I could restrict the downsampling accordingly. In Igneous, I found this to be a good format:

```
'bounds': [
    bounds.minpt.tolist(),
    bounds.maxpt.tolist()
],
```
The affinity map and semantic map are close to binary but are stored as Float32. In Phase II, this will cost about 2 * 4 * (3+4) = 56 petabytes without compression! We got about a 50% compression ratio using blosclz, so it would still take about 28 petabytes of storage!
At least for real-time visualization, we can downsample and push 8-bit chunks to neuroglancer directly.
The maps could potentially be remapped to 8 bit, which reduces the storage to 2 * 1 * (3+4) = 14 petabytes without compression.
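A back-of-envelope check of the storage numbers above, assuming (this is an assumption) that the factor of 2 denotes the dataset size in petavoxels:

```julia
# Storage estimate: dataset size 2 (assumed petavoxels),
# 3 affinity channels + 4 semantic channels.
pb_f32 = 2 * 4 * (3 + 4)   # Float32 (4 bytes/voxel/channel): 56 PB uncompressed
pb_u8  = 2 * 1 * (3 + 4)   # UInt8 remap (1 byte/voxel/channel): 14 PB uncompressed
pb_f32 * 0.5               # ~28 PB after ~50% blosclz compression
```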
The task producer is not stable and can fail occasionally, requiring a manual restart. It is really annoying.

```
ERROR: LoadError: readcb: connection reset by peer (ECONNRESET)
 in yieldto(::Task, ::ANY) at ./event.jl:136
 in wait() at ./event.jl:169
 in wait(::Condition) at ./event.jl:27
 in wait_readnb(::TCPSocket, ::Int64) at ./stream.jl:303
 in readbytes!(::TCPSocket, ::Array{UInt8,1}, ::Int64) at ./stream.jl:725
 in readbytes!(::TCPSocket, ::Array{UInt8,1}, ::UInt64) at ./stream.jl:714
 in f_recv(::Ptr{Void}, ::Ptr{UInt8}, ::UInt64) at /usr/people/jingpeng/.julia/v0.5/MbedTLS/src/ssl.jl:103
 in macro expansion at /usr/people/jingpeng/.julia/v0.5/MbedTLS/src/error.jl:3 [inlined]
 in handshake(::MbedTLS.SSLContext) at /usr/people/jingpeng/.julia/v0.5/MbedTLS/src/ssl.jl:145
 in open_stream(::HttpCommon.Request, ::MbedTLS.SSLConfig, ::Float64, ::Nullable{URIParser.URI}, ::Nullable{URIParser.URI}) at /usr/people/jingpeng/.julia/v0.5/Requests/src/streaming.jl:209
 in #do_stream_request#23(::Dict{String,String}, ::Void, ::Void, ::Void, ::Array{Requests.FileParam,1}, ::Void, ::Dict{Any,Any}, ::Bool, ::Int64, ::Array{HttpCommon.Response,1}, ::MbedTLS.SSLConfig, ::Bool, ::Bool, ::Bool, ::Nullable{URIParser.URI}, ::Nullable{URIParser.URI}, ::Requests.#do_stream_request, ::URIParser.URI, ::String) at /usr/people/jingpeng/.julia/v0.5/Requests/src/Requests.jl:381
 in (::Requests.#kw##do_stream_request)(::Array{Any,1}, ::Requests.#do_stream_request, ::URIParser.URI, ::String) at ./<missing>:0
 in #do_request#22(::Array{Any,1}, ::Function, ::URIParser.URI, ::String) at /usr/people/jingpeng/.julia/v0.5/Requests/src/Requests.jl:311
 in (::Requests.#kw##do_request)(::Array{Any,1}, ::Requests.#do_request, ::URIParser.URI, ::String) at ./<missing>:0
```
Potential solutions:
- SQS can serve as a Julia Channel; we should be able to create a package to handle the messages, which would also make it easy to switch to other messaging software.
- Abstract each step as a computational node and use a dataflow model, which can automatically explore parallelism among algorithms. This is not urgent.
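The SQS-as-Channel idea could be sketched as follows; an in-memory `Channel` stands in for SQS here, and the `MemoryQueue` type and its methods are hypothetical names, not an existing package API:

```julia
# Sketch of a queue abstraction mimicking Julia's Channel interface.
# An in-memory Channel stands in for SQS; a real SQS-backed type would
# implement the same put!/take! methods over the SQS API, so consumers
# could switch messaging backends without code changes.
abstract type AbstractTaskQueue end

struct MemoryQueue <: AbstractTaskQueue
    channel::Channel{String}
end
MemoryQueue(sz::Int=128) = MemoryQueue(Channel{String}(sz))

Base.put!(q::MemoryQueue, task::String) = put!(q.channel, task)
Base.take!(q::MemoryQueue) = take!(q.channel)

# producers and consumers only touch the abstract interface
q = MemoryQueue()
put!(q, "chunk_0_0_0")
put!(q, "chunk_0_0_1")
take!(q)
```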
The current pipeline works without boundary mirroring: it crops the original image according to the field of view, which modifies the original image. This is not a problem on AWS, since we store the original image stacks in S3 and keep a copy locally, but it will corrupt the local copy of the original image. The solution is to exchange data in memory rather than writing to files.
Currently, task chunks neighbor each other, which can create concurrent IO against the same cloud storage server. Shuffling the tasks will distribute the IO requests more evenly across the cloud storage servers.
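A minimal sketch of the shuffling, with a hypothetical task list of chunk origins on a regular grid:

```julia
using Random  # stdlib; provides shuffle

# Hypothetical task list: chunk origins on a 3x3x3 grid. Adjacent chunks
# tend to hit the same storage shard, so we shuffle before submitting to
# spread the IO requests across servers.
tasks = vec([(x, y, z) for x in 0:2, y in 0:2, z in 0:2])
shuffled = shuffle(tasks)   # random order, same set of tasks
```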
hypersquare takes about the same time as watershed and agglomeration. hypersquare writes different data representations to Google Cloud and should be parallelizable: use @spawn to execute the different writing functions, and @parallel for to parallelize the writing of images. Reference for this discussion:
https://discourse.julialang.org/t/using-threads-with-i-o-to-processing-many-files-in-parallel/1112/4
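Since the writes are IO-bound, they can also be overlapped within one process using async tasks; a sketch, where the `write_*` functions are hypothetical placeholders for the real cloud-writing routines:

```julia
# Write the different hypersquare representations concurrently.
# @async tasks in one process (or @spawn across workers) overlap the
# IO-bound uploads; the write_* functions are illustrative stand-ins.
results = String[]
write_segmentation(out) = push!(out, "segmentation")
write_mesh(out)         = push!(out, "mesh")
write_images(out)       = push!(out, "images")

@sync begin
    @async write_segmentation(results)
    @async write_mesh(results)
    @async write_images(results)
end
```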
Google Cloud Stackdriver API:
https://github.com/joshbode/GoogleCloud.jl/blob/master/src/api/logging.jl
The pipeline was split into two stages: convnet inference and segmentation. We run the pipeline stage by stage, which is time consuming and inefficient when the dataset is large.
Theoretically, we can control the stages manually and overlap them, but that could introduce human errors.
To do it fully automatically, we need task dependencies based on the offsets of finished chunks from the previous stage. Google Pub/Sub or AWS SNS might be a potential solution.
We may also need a scheduling server.
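The chunk-level dependency tracking could look like the sketch below; the task names and granularity are illustrative, not the pipeline's actual identifiers:

```julia
# A segmentation chunk becomes ready once all inference chunks it
# depends on have finished; a pub/sub message marking a chunk done
# would drive mark_done! in a real scheduler.
deps = Dict(:seg_A => Set([:inf_1, :inf_2]),
            :seg_B => Set([:inf_2, :inf_3]))
ready = Symbol[]

function mark_done!(chunk::Symbol)
    for (task, pending) in deps
        delete!(pending, chunk)
        if isempty(pending) && !(task in ready)
            push!(ready, task)   # all prerequisites finished: schedule it
        end
    end
end

mark_done!(:inf_1)
mark_done!(:inf_2)
# seg_A is now ready; seg_B still waits for inf_3
```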
The current task producer is sensitive to the internet connection and can be interrupted when the connection is unstable. We use a single process to make it easy to resubmit tasks right after the interruption point.
Retry.jl aims to fix this issue; after that, we should be able to produce tasks in parallel.
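The behavior Retry.jl provides can be sketched as a generic retry loop with exponential backoff; `with_retry` and the flaky producer below are illustrative, not Retry.jl's actual API:

```julia
# Generic retry with exponential backoff: retry transient network
# failures instead of letting the producer die.
function with_retry(f; attempts=4, base_delay=0.01)
    for i in 1:attempts
        try
            return f()
        catch
            i == attempts && rethrow()
            sleep(base_delay * 2^(i - 1))  # back off before retrying
        end
    end
end

calls = Ref(0)
# stand-in for task submission that fails twice, then succeeds
flaky() = (calls[] += 1; calls[] < 3 ? error("ECONNRESET") : "submitted")
result = with_retry(flaky)
```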
Integrate the pipeline with MXNet, making every function an MXNet operator. This should be doable after the custom operator implementation in MXNet:
dmlc/MXNet.jl#166
There are a few bad affinity chunks in the zebrafish dataset. We should have an evaluation process to double-check the quality of the affinity map. There are a few possible approaches.
Data IO takes a lot of time; we can use prefetching to hide the IO cost. One worker keeps doing cutouts and puts the data into a RemoteChannel, while another worker fetches and processes the data. The data should be organized as a Dict whose keys correspond to placeholders.
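A single-process sketch of the producer/consumer prefetch, using a plain `Channel` in place of a `RemoteChannel` and a fake cutout in place of a cloud read:

```julia
# Prefetching: the producer keeps cutting out chunks and puts them in a
# buffered Channel, so the consumer's processing overlaps the next
# "download". Each item is a Dict whose keys act as the placeholders
# downstream operators expect.
buffer = Channel{Dict{Symbol,Any}}(2)   # capacity 2: prefetch ahead

@async begin
    for i in 1:3
        # stand-in for a cloud cutout; real code would read from storage
        put!(buffer, Dict{Symbol,Any}(:image => fill(i, 2, 2),
                                      :origin => (i, 0, 0)))
    end
    close(buffer)   # signal the consumer there is no more data
end

processed = Int[]
for chunk in buffer
    push!(processed, chunk[:origin][1])  # stand-in for real processing
end
```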
The scheduling part could be independent of the execution part. After abstracting the scheduling part, we may be able to use multiple different scheduling tools, such as SimpleTasks.jl and Google Pub/Sub.
Although the processes were not synchronized, they were launched at the same time and compete for resources at the same time. There are three processes running in one instance, but there is a gap in CPU usage while the GPU is idle; this is a waste of resources. The three processes finish almost at the same time and all start downloading data at the same time. To avoid this, simply stagger the process launches; the stagger should be longer than the data download time.
The auto-scaling policy has an option to set the interval of instance creation.
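The staggered launch can be sketched as below; the delays are scaled down to milliseconds here, whereas in production the stagger would be longer than one chunk download:

```julia
# Stagger process launches so the instances do not all hit cloud
# storage at once. Each async task stands in for one worker process.
download_log = Float64[]
stagger = 0.05   # seconds between launches (illustrative; minutes in production)

t0 = time()
@sync for i in 0:2
    @async begin
        sleep(i * stagger)                 # staggered start
        push!(download_log, time() - t0)   # moment this worker starts downloading
    end
end
```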
Massive processing requires a lot of DNS requests, which affects the university network and might also slow down the processing a little. Setting up DNS caching in the Docker image can solve this problem.
An optional tool: Dnsmasq
Observed a few times that the container just hangs and does nothing; after over 10 minutes, the program continues working.
One possibility is that the retry waiting time grows exponentially: using 5 retries waits much longer than 4, making the Docker container appear to hang.
https://github.com/samoconnor/Retry.jl
https://github.com/seung-lab/spipe/blob/master/segm2omprj.jl#L48
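The exponential growth explains why one extra retry can look like a hang; assuming a (hypothetical) base delay of 30 seconds that doubles each attempt:

```julia
# Worst-case total wait across n retries with exponential backoff.
# With doubling, each added attempt roughly doubles the total wait.
total_wait(n; base=30) = sum(base * 2^(k - 1) for k in 1:n)

total_wait(4)   # 30 + 60 + 120 + 240 = 450 s (~7.5 min)
total_wait(5)   # 450 + 480 = 930 s (~15.5 min, consistent with the >10 min hangs)
```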
```
sudo apt-get install libtcmalloc-minimal4
LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 omni --headless --cmdfile
```
(Note: preloading a specific .so requires LD_PRELOAD; LD_LIBRARY_PATH only takes directories.)
Currently, I need to manually launch processes for each GPU.
The content-encoding of chunks in S3 should be gzip; otherwise neuroglancer will not decode them with gzip and they cannot be visualized.