Code Monkey home page Code Monkey logo

Comments (11)

jefferis avatar jefferis commented on May 26, 2024

Dear Albert, thanks for your interest. Please use read.neurons to read all of your neurons in one go into a neuronlist object. Then use dotprops to convert into the preferred format for NBLAST. See e.g. https://groups.google.com/forum/#!topic/nat-user/_4d1XCtbM90 for an example. Best, Greg.

from nat.nblast.

albertplus007 avatar albertplus007 commented on May 26, 2024

thanks, i have an another question, the function we use is dotprops(skel, resample = 1, k = 5) the resample = 1, but the resolution of my data is 4x4x40nm, should i also use the resample = 1?

then i use resample = 100 to create the preferred format, cause the resample = 1 will use lots of memory.

Question 2:
i random select a neuron for nblast in my own data, then scores = nblast(skel_dps, skel_all_dps)

but the scores are very weird.
image
except the query neurons (the first line) the others scores are totally negative, what happened to this?
then i change resample = 10, the same thing happened again, except the query neuron, the others are all negative.

from nat.nblast.

jefferis avatar jefferis commented on May 26, 2024

The solution to both of these issues is that you need to concert your neurons from units of nl to microns before calculating the “dotprops” objects. You can just divide The neuronlist generated by read.neurons by 1000.

from nat.nblast.

albertplus007 avatar albertplus007 commented on May 26, 2024

thanks a lot, i have another question, all the flycircuits neuron data were in dps, which is a large neuronlistfh, the i change the neuronlistfh to neuronlist named dps_neuronlist, but when i run the following code to cluster all neurons, it costs a lots of time, and it does not finished yet
scores = nblast_allbyall(dps_neuronlist)
the neuronlistfh was download buy the paper NBLAST's code

i don't know why it costs such a long time, does it not suitable for large data (there are 16129 neurons in the neuronlist)

from nat.nblast.

jefferis avatar jefferis commented on May 26, 2024

See bottom of this page (linked from the nblast website)

https://gist.github.com/jefferis/bbaf5d53353b3944c090

It is recommended to use remotesync to download all the flycircuit neurons.

from nat.nblast.

albertplus007 avatar albertplus007 commented on May 26, 2024

Thanks a lot.
I used it, and all flycircuit neurons are in my local memory, and i chage the neuronlistfh to neuronlist
but a run scores = nblast_allbyall(dps_neuronlist), it still costs a lots of time, and it does not finished yet, i don't know what caused it?

I just want to cluster all flycircuit neurons, it seems to have a big impact on big data

from nat.nblast.

jefferis avatar jefferis commented on May 26, 2024

If you want to do this:

scores = nblast_allbyall(dps_neuronlist)

Although NBLAST is fast by the standards of neuron comparison tools, typically ~2ms per comparison for the flycircuit dataset, this will indeed take a lot of time. There are 16129 neurons in dps_neuronlist. That means that there are 16129^2=260,144,641 comparisons to make. At 2 ms per comparison that would be 16129^2*2/1000/3600/24 = 6 days.

In fact you can make an estimate on your own machine like this by choosing a sample of 1% of the neurons:

set.seed(42)
dpsample=sample(dps, round(length(dps)/100))
system.time(nblast_allbyall(dpsample, .progress='text'))

For me on my laptop this took 70s. The whole dataset will take 100^2 times longer i.e. 69.952*100^2/3600/24=8.09 days.

It is possible to speed this up by running comparisons in parallel. You can do this as follows:

# 4 parallel jobs
doMC::registerDoMC(cores = 4)
system.time(nblast_allbyall(dpsample, .parallel=T))

Note that it is worth testing different numbers of cores as the speedup will not be linear across the full range of cores. It will also depend on how long the whole jobs take as there is a fixed cost associated with starting each parallel job. Be careful also as using more cores will use more memory and eventually may exhaust the memory on your machine and make everything much slower. For me, 4 cores did give a 4x speedup so therefore 2d for the whole dataset. Note that it was a bit quicker in the command line version of R than inside Rstudio.

from nat.nblast.

albertplus007 avatar albertplus007 commented on May 26, 2024

Thanks a lot, i got it.

from nat.nblast.

albertplus007 avatar albertplus007 commented on May 26, 2024

I still have a question, when we got the scores of the flycircuit dataset, then we want to cluster them, but the function nhclust can only process 4000 neurons, so, should i just seletct 4000 neurons, for each cluster? such as hcc = nhclust(files[1:4000], scoremat = scores). does it the same as cluster all them? sucha as hcc = nhclust(scoremat = scores)(if there is no limit for maxneurons)?

from nat.nblast.

jefferis avatar jefferis commented on May 26, 2024

You can increase the maximum number (it was left as a sort of safety threshold). See the maxneurons argument of http://natverse.org/nat.nblast/reference/nhclust.html.

from nat.nblast.

albertplus007 avatar albertplus007 commented on May 26, 2024

Thanks a lot.
Sorry to disturb you, I asked a stupid question and solved it after I submitted the issue

from nat.nblast.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.