nat.nblast issues from natverse

sub_score_mat complains if it receives an ff matrix

is.matrix() returns FALSE for an ff matrix.
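Below is a minimal sketch of a more permissive check. It assumes the ff object reports its dimensions through dim(), and tests for exactly two dimensions instead of calling is.matrix(). The helper name is_matrix_like is hypothetical and not part of nat.nblast. A data.frame stands in for the ff matrix here because it also fails is.matrix().

```r
# Hypothetical helper: accept anything that reports exactly two dimensions
# (base matrices, data frames, and matrix-like classes that define dim())
is_matrix_like <- function(x) {
  length(dim(x)) == 2L
}

df <- data.frame(a = 1:2, b = 3:4)
is.matrix(df)       # FALSE, like the ff case
is_matrix_like(df)  # TRUE
```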

clarify relationship of nblast with version=1 to Kohl 2013 code

Ideally, also give an example replicating a relevant figure from the paper. Note that Kohl 2013 immediately mentions a normalised score.

Zenodo DOI

Hi @jefferis et al. Could you please generate a Zenodo DOI for nat.nblast?

Make documentation for nblast comprehensive

nblast() should be kept as simple as possible for novice users, with the more advanced arguments being dealt with in NeuriteBlast() and WeightedNNBasedLinesetMatching(). Links should be present to these more advanced methods in nblast()'s documentation.

plot3d.hclust should enable recolouring of selected groups

For example, if just one group is selected, it should be possible to rainbow-colour its members.

Should nblast version parameter be character or integer?

accept scorematrix as first arg of nhclust

A lot of people expect to be able to do this, and the current error message is not very clear in this situation.
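The sketch below shows one possible way for nhclust to accept a score matrix directly, using base R only. It symmetrises the scores, turns them into distances, and then clusters. It assumes normalised scores, so a self-match is 1. hclust_from_scores is a hypothetical name. The 1 - mean score distance is an illustrative choice, not a statement of what nhclust does internally.

```r
# Hypothetical helper: cluster directly from a square, normalised score matrix
hclust_from_scores <- function(scoremat, method = "average") {
  stopifnot(is.matrix(scoremat), nrow(scoremat) == ncol(scoremat))
  # average forward and reverse scores so the result is symmetric
  sym <- (scoremat + t(scoremat)) / 2
  # similarity (self-match = 1) -> distance (self-distance = 0)
  d <- stats::as.dist(1 - sym)
  stats::hclust(d, method = method)
}

s <- matrix(c(1,  .9, .1, .2,
              .8, 1,  .2, .1,
              .1, .2, 1,  .85,
              .2, .1, .9, 1), nrow = 4, byrow = TRUE,
            dimnames = list(letters[1:4], letters[1:4]))
hc <- hclust_from_scores(s)
groups <- stats::cutree(hc, k = 2)  # a, b in one group; c, d in the other
```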

Teach NeuriteBlast to cope with neuronlist query arguments

To simplify all by all blast

nblast (or nblast_allbyall) fails with error "Cloud has no points"

Example here:
https://groups.google.com/forum/#!topic/nat-user/Oe40-OjRDd8
Paavo also had the same error recently. It happens when nabor::knn is called with a matrix containing 0 rows. Maybe nat.nblast:::WeightedNNBasedLinesetMatching.default should return NA if either of the inputs has 0 rows.
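Below is a minimal base R sketch of the proposed guard. It returns NA before any nearest-neighbour search when either point matrix has zero rows. A brute-force search stands in for nabor::knn. Both function names are hypothetical, and so is the placeholder score (a sum of exp(-d^2)), which is not the NBLAST scoring function.

```r
# Brute-force nearest-neighbour distances (a stand-in for nabor::knn with k = 1)
nn_distances <- function(query, target) {
  tt <- t(target)  # 3 x m, one column per target point
  apply(query, 1, function(p) sqrt(min(colSums((tt - p)^2))))
}

# Hypothetical guard: return NA instead of searching an empty point cloud
safe_nn_match <- function(query, target,
                          score_fun = function(d) sum(exp(-d^2))) {
  if (nrow(query) == 0L || nrow(target) == 0L) return(NA_real_)
  score_fun(nn_distances(query, target))
}

empty <- matrix(numeric(0), ncol = 3)
pts <- matrix(c(0, 0, 0, 1, 0, 0), ncol = 3, byrow = TRUE)
safe_nn_match(empty, pts)  # NA
safe_nn_match(pts, pts)    # 2: each point matches itself at distance 0
```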

Check out RcppEigen sparse matrices

spam seems a bit ...

but RcppEigen would require R class defs etc

Have nblast work with Labels in given neuronlist/dotprops objects

Have a UseLabels option for nblast, which would work similarly to UseAlpha. Labels could be numeric (2) or character ("axon").
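Here is a rough sketch of what UseLabels could mean. It assumes the option restricts nearest-neighbour matching to points that share a label. Each query point is matched only against target points with the same label, and points whose label is absent from the target get NA. label_nn_distances is a hypothetical brute-force helper, not a nat.nblast function. Labels may be numeric or character.

```r
# Hypothetical helper: nearest-neighbour distance restricted to same-label points
label_nn_distances <- function(query, target, qlabels, tlabels) {
  d <- rep(NA_real_, nrow(query))
  for (lab in intersect(qlabels, tlabels)) {
    qi <- which(qlabels == lab)
    # 3 x k matrix holding only the target points with this label
    tt <- t(target[tlabels == lab, , drop = FALSE])
    d[qi] <- vapply(qi, function(i) sqrt(min(colSums((tt - query[i, ])^2))),
                    numeric(1))
  }
  d  # NA where a query label has no counterpart in the target
}

q  <- matrix(c(0, 0, 0, 10, 0, 0), ncol = 3, byrow = TRUE)
tg <- matrix(c(10, 0, 0, 0, 0, 0), ncol = 3, byrow = TRUE)
label_nn_distances(q, tg, c(2, 3), c(2, 3))  # 10 10 (0 0 if labels were ignored)
```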

Some sample dotprops data from the hemibrain: two sets of cell types that should match each other within a set but not between sets, and that also have differing axon/dendrite locations: PD2a, PD2b1, AV1a1, LHCENT1, LHCENT2, LHCENT3.

Code used to fetch:

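library(hemibrainr)
# hemibrain neurons, then keep only the cell types of interest
db = hemibrain_neurons()
chosen = subset(db, type%in%c("LHPD2a1","LHPD2b1_a","LHPD2b1_b","LHAV1a1","LHCENT1","LHCENT2","LHCENT3"))
# convert to dotprops and compare per-point label counts before and after conversion
dps = dotprops(chosen)
table(chosen[[1]]$d$Label)
table(dps[[1]]$labels)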

Output:

> table(chosen[[1]]$d$Label)                                                           

    0     2     3     4     7 
10809  3186 41315   213   161 
> table(dps[[1]]$labels)

    0     2     3     4     7 
10809  3186 41315   213   161

Where:
0 - non-computing (no 'flow'), possibly erroneous
1 - soma
2 - axon
3 - dendrite
4 - linker
7 - cell body fibre

dps.rda.zip

Teach nblast to accept all combinations of dotprops and neuron objects

Include neuron list of superclusters and clusters from nblast online?

I was wondering if you had a dataset listing all the neurons belonging to each cluster/supercluster from online in your nblast R package, or whether this information is only accessible on the website. Thanks very much!

edit: Just found https://github.com/jefferislab/NBLAST_clusters_online --thanks!

Release nat.nblast 1.6.5

Prepare for release:

Submit to CRAN:

usethis::use_version('patch')
Update cran-comments.md
devtools::submit_cran()
Approve email

Wait for CRAN...

nblast for swc files

Hi, nice library!

Recently, I have been using nblast to search for neurons similar to ones I reconstructed myself. The neurons were saved as .swc files, and I read them with the following code:

newskel1 = read.neuron.swc("/usr/skel_1.swc")
newskel2 = read.neuron.swc("/usr/skel_2.swc")

and I converted the neurons to a neuronlist

newskel = neuronlist(skel1, skel2)

and then nblast

scores = nblast(skel1, newskel)

but it gives me the following:

more than 1 point in .CleanupParentArray, choosing first from: 2 11
more than 1 point in .CleanupParentArray, choosing first from: 13 15
more than 1 point in .CleanupParentArray, choosing first from: 191 196
more than 1 point in .CleanupParentArray, choosing first from: 202 208
Warning messages:
1: In .CleanupParentArray(d1[, "Parent"]) : 
2: In .CleanupParentArray(d1[, "Parent"]) : 
3: In .CleanupParentArray(d1[, "Parent"]) : 
4: In .CleanupParentArray(d1[, "Parent"]) :

So I am wondering: is the neuronlist I created incorrect? What should I do?

Then I converted the neurons to neuronlists

newskel1 = as.neuronlist(newskel1)
newskel2 = as.neuronlist(newskel2)
newskel = neuronlist(skel1, skel2)
scores = nblast(skel1, newskel)

but it gives me the following:

Error in `[.data.frame`(df, i, j) : undefined columns selected

Conversion to dotprops for nblast(neuron, dotprops) [or vice versa] is rather inefficient

As the conversion happens in WeightedNNBasedLinesetMatching(), this means that the objects are converted for every comparison, not just once at the start, making everything much slower than it needs to be.
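Below is a minimal plain R sketch of converting once up front. Every object is converted a single time before the comparison loop, so n comparisons cost n + 1 conversions rather than repeating the conversion inside every comparison. to_points is a hypothetical stand-in for the neuron-to-dotprops conversion: it just calls as.matrix and counts its own calls. compare_all is likewise illustrative.

```r
# Counter to show how many (hypothetical) conversions happen
conversion_count <- 0L

# Hypothetical stand-in for an expensive neuron -> dotprops conversion
to_points <- function(x) {
  conversion_count <<- conversion_count + 1L
  as.matrix(x)
}

# Convert the query and every target once, then compare the cached results
compare_all <- function(query, targets, compare) {
  q <- to_points(query)
  converted <- lapply(targets, to_points)
  vapply(converted, function(tp) compare(q, tp), numeric(1))
}

targets <- list(a = matrix(1, 2, 3), b = matrix(1, 4, 3), c = matrix(1, 1, 3))
scores <- compare_all(matrix(1, 3, 3), targets, function(q, tp) as.numeric(nrow(tp)))
conversion_count  # 4: one query + three targets
```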

CRAN release?

Is there anything blocking a CRAN release? I notice that there was recently a spam update – not sure if it fixed anything.

plot3d.hclust fails to evaluate colour

This presently only works for me when nat.as is loaded and plot3d.character is aliased to plot3dfc

Add auto progress bars for nblast

See natverse/nat#275

create_scoringmatrix should have a UseAlpha option

Since you need a different scoring matrix when using the alpha factor to scale the calculated dot products.

add plot3d.nblastres function and give nblast option to return per segment scores

To address one of the reviewer comments, we should add a new function / example that shows which points are being matched for a pair of neurons and colours one of the neurons by the quality of the match. One way to do this would be to allow the nblast function to return per-segment scores (perhaps wrapping them in an object with a class like nblastres). A corresponding plot3d method could then be used to make a plot with sensible defaults.

Alternatively, a lower tech version would be to include an example in the nblast docs.

Collecting per-segment results could be done by playing with the NNDistFun argument (which gets passed down to WeightedNNBasedLinesetMatching.default).
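Here is a lower-tech sketch of the colouring step. It assumes per-point nearest-neighbour distances and dot products have already been collected, for example via NNDistFun. It keeps one value per point instead of summing, then maps those values onto a colour ramp. The resulting colours could be passed to a plotting call such as rgl's points3d. Both function names are hypothetical. The Gaussian-times-dot-product score is a placeholder, not an NBLAST scoring matrix.

```r
# Placeholder per-point score: Gaussian distance weight times |dot product|
per_point_scores <- function(nndists, dotprods, sigma = 3) {
  exp(-nndists^2 / (2 * sigma^2)) * abs(dotprods)
}

# Map scores linearly onto n colours from blue (poor match) to red (good match)
scores_to_colours <- function(scores, n = 100, palette = c("blue", "red")) {
  ramp <- grDevices::colorRampPalette(palette)(n)
  rng <- range(scores)
  if (diff(rng) == 0) return(rep(ramp[n], length(scores)))
  idx <- 1 + floor((scores - rng[1]) / diff(rng) * (n - 1))
  ramp[idx]
}

s <- per_point_scores(nndists = c(0, 2, 10), dotprods = c(1, 0.8, 0.5))
cols <- scores_to_colours(s)  # best-matched point red, worst blue
```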

correct docs for UseAlpha option of nblast

add fitting functions for nat.nblast

makeprobmat, scorematrix. Distances like this:

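# collect raw nearest-neighbour data for every pair of DL2 neurons
DL2nnlist=list()
for(n1 in DL2names){
    # compare n1 with every other DL2 neuron; NNDistFun=list returns the
    # values passed to NNDistFun instead of a summed score
    DL2nnlist[[n1]]=lapply(fcupndps[DL2names[DL2names!=n1]],
        WeightedNNBasedLinesetMatching.dotprops,fcupndps[[n1]],NNDistFun=list)
}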

Add nblast2

Currently, nblast() is equivalent to nblast1, which used the negative exponential weighting with sigma based on registration error.

refactor hclustfc and related functions out of flycircuit

We're going to want to make these accessible in other contexts

Creating sparse matrices is rather slow

Creating a sparse matrix for 1,000 neurons from the 16,000-neuron full score matrix has been running for more than 90 minutes and still hasn't finished. This is with the full score matrix loaded into memory, so the slowness is not caused by disk access issues.
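The following sketch builds the sparse matrix in a single call. It assumes the slow path fills entries one at a time. The approach thresholds the dense (sub)matrix, collects the surviving row/column indices with which(arr.ind = TRUE), and passes them all to Matrix::sparseMatrix() together. dense_to_sparse is a hypothetical helper, and the threshold is an illustrative choice.

```r
library(Matrix)

# Hypothetical helper: keep scores above a threshold as a sparse matrix,
# built from (i, j, x) triplets in one call rather than element by element
dense_to_sparse <- function(scores, threshold) {
  keep <- which(scores > threshold, arr.ind = TRUE)
  sp <- sparseMatrix(i = keep[, 1], j = keep[, 2], x = scores[keep],
                     dims = dim(scores))
  if (!is.null(dimnames(scores))) dimnames(sp) <- dimnames(scores)
  sp
}

# e.g. for a 1,000-neuron subset: dense_to_sparse(full_scores[ids, ids], 0)
m <- matrix(c(1, 0.1, 0.2, 1), nrow = 2)
s <- dense_to_sparse(m, threshold = 0.5)  # only the diagonal survives
```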

[BUG] in scaling when neurons don't have names

Example:

> testneurons <- readRDS('testdata/testneurons.rds')
> names(testneurons) <- NULL
> scoresaba <- nblast_allbyall(testneurons,
+                              version=2,
+                              normalisation = 'normalised')
> scoresaba
<0 x 0 matrix>
attr(,"scaled:scale")
numeric(0)
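Below is a sketch of normalising without relying on names. It assumes normalisation divides each column by its self-match (diagonal) score, as the scaled:scale attribute in the output above suggests. Base R's scale() can do this purely by position, so unnamed neurons would work. normalise_by_self is a hypothetical name, and the column orientation is an assumption.

```r
# Hypothetical helper: divide each column by its diagonal (self-match) score,
# using positions only, so it works whether or not the matrix has dimnames
normalise_by_self <- function(scores) {
  stopifnot(nrow(scores) == ncol(scores))
  scale(scores, center = FALSE, scale = diag(scores))
}

raw <- matrix(c(4, 2, 1, 5), nrow = 2)  # unnamed 2 x 2 all-by-all scores
res <- normalise_by_self(raw)           # still 2 x 2, diagonal now 1
```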

scoringmatrix() generic with methods for neuronlist and matrices

... and list of dot prods/distances

or should we have a db argument and just name the neurons?

In general I think this approach would at least simplify naming while still providing a convenient entry point.

Multi-core performance and memory consumption

Dear all,

We have found nblast really helpful to our current project, especially when doing nblast against the FlyEM database.

On my laptop (6 cores, 12 threads), a one-against-all NBLAST takes about 4 min when running on a single core.

To reduce the time, I used doParallel to register a multi-core backend and ran NBLAST with .parallel = TRUE. Interestingly, although I could confirm that all 12 threads were running, RAM consumption reached 100%, and the same task ended up taking more than 10 min.

Then I tried running NBLAST on only two cores to avoid the high memory consumption, and the task took 5 min.

Taking the longer run time and high memory consumption into consideration, I am a little confused about how exactly nblast uses .parallel. I have a machine with 4 processors (40 cores, 80 threads) and 48 GB RAM, and my dps_flyEM object is 2.32 GB. Would it be best to run NBLAST on only 16 cores rather than 80?

Best wishes,
Jiajun Zhang
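Here is a back-of-the-envelope sketch for the question above. It assumes that the main R session and each worker hold their own full copy of the dotprops object. This can happen with socket (PSOCK) clusters; forked workers on Linux/macOS may share memory instead. The calculation also ignores all other memory use. max_workers is a hypothetical helper.

```r
# Rough upper bound on workers if the main session and every worker
# each hold one copy of the object being searched
max_workers <- function(ram_gb, object_gb) {
  floor((ram_gb - object_gb) / object_gb)
}

max_workers(ram_gb = 48, object_gb = 2.32)  # 19
```

Under these assumptions, 80 workers could not fit in 48 GB. The real per-worker footprint will also be larger than the object alone.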

fix finding diagonals of bigmemory objects

If indices = NULL, we get an error.
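The sketch below shows a diagonal extractor that treats indices = NULL as "all diagonal elements". It uses only x[i, i] element access rather than base diag(), so it should also work for big.matrix objects, which support that form of indexing. diagonal_scores is a hypothetical name, and it is demonstrated on an ordinary matrix.

```r
# Hypothetical helper: diagonal of a (possibly file-backed) square score matrix.
# indices = NULL means "all diagonal elements"; only x[i, i] access is used.
diagonal_scores <- function(x, indices = NULL) {
  if (is.null(indices)) indices <- seq_len(min(nrow(x), ncol(x)))
  vapply(indices, function(i) as.numeric(x[i, i]), numeric(1))
}

m <- matrix(1:9, nrow = 3)
diagonal_scores(m)           # 1 5 9
diagonal_scores(m, c(3, 1))  # 9 1
```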

nhclust and friends should use distfun

Currently ignored

Add report generation

It would be useful to have functions that generate reports on NBLAST results, perhaps as a knitr document. For example, a histogram of scores and a 3D plot of top hits could be produced automagically, along with some clustering of those top hits.

Allow nblast to use basic topological information about neurons

@dokato a placeholder

plot3d.hclust fails when db is not set

NeuriteBlast fails for FAFB neurons

Use appropriate credentials for FAFB login

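library(catmaid)
# login script (not included) that creates conn, a CATMAID connection to FAFB
source("../catmaid_fafb_login.R")
test_skids = c(21999,22132)
# fetch two FAFB skeletons and compare them
test_n=read.neurons.catmaid(test_skids, conn=conn)
r = NeuriteBlast(test_n[[1]], test_n[[2]])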

The above script returns the following error:

Error in findDirectionVectorsFromParents(target, query, idxArray, ReturnAllIndices = TRUE, :
Some points do not have a parent: therefore impossible to calculate direction vector
In addition: Warning messages:
1: In .CleanupParentArray(d1[, "Parent"]) :
no points to choose in .CleanupParentArray using original value
2: In .CleanupParentArray(d2[, "Parent"]) :
no points to choose in .CleanupParentArray using original value

The error is from this function.

It seems that wp=which(pa==p) will only work if the IDs in d$Parent are row indices, which is not true for FAFB.
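Below is a minimal sketch of the fix. It assumes the neuron's d data frame has SWC-style PointNo and Parent columns, with -1 marking the root. The idea is to translate Parent IDs into row indices with match() before comparing them with row numbers. parent_rows is a hypothetical helper.

```r
# Hypothetical helper: convert Parent IDs to row indices of d
# (the root, whose Parent is -1, has no match and becomes NA)
parent_rows <- function(d) {
  match(d$Parent, d$PointNo)
}

# FAFB-style point IDs that are not row numbers
d <- data.frame(PointNo = c(101, 205, 307), Parent = c(-1, 101, 205))
parent_rows(d)  # NA 1 2
```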

enable plyr

for parallelisation and progress

nblast / NeuriteBlast should not be error tolerant by default

These days it is much more likely that an error while searching is caused by the wrong invocation, resulting in thousands of error messages being sent to the console.

Make it optionally fault tolerant using e.g. nlapply
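Here is a base R sketch of an opt-in fault-tolerant mode. By default, errors propagate immediately, so a wrong invocation fails once instead of printing thousands of messages. Only when fault_tolerant = TRUE is each element wrapped in tryCatch() and replaced by NA on failure. apply_scores is a hypothetical name. nat's nlapply has its own handling of failures, which this does not reproduce.

```r
# Hypothetical helper: strict by default, fault tolerant only on request
apply_scores <- function(X, FUN, ..., fault_tolerant = FALSE) {
  if (!fault_tolerant) return(lapply(X, FUN, ...))
  lapply(X, function(x, ...) tryCatch(FUN(x, ...), error = function(e) NA), ...)
}

# strict: the first error stops everything
strict <- tryCatch(apply_scores(list(1, "a"), log), error = function(e) "failed")
# tolerant: failures become NA, successes are kept
tolerant <- apply_scores(list(1, "a"), log, fault_tolerant = TRUE)
```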

nblast fails to find smat.fcwb if nblast package is not attached.

This happens when nblast is called inside another function (here from the elmr package) that imports, but does not attach, nat.nblast. Traceback:

3 get(smat) 
2 nat.nblast::nblast(xdp, db, normalised = normalised, .parallel = .parallel, 
    ...) at nblast_fafb.R#64
1 nblast_fafb(27884, mirror = FALSE)

The workaround is to attach nat.nblast. The fix will involve the scope of the get() call, which needs to find objects such as smat.fcwb in the package namespace.
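The following sketch shows a scope-safe lookup, assuming smat is passed as a character name. It looks in the caller's environment (and search path) first, then falls back to the package namespace via asNamespace(). That fallback works whether or not the package is attached. resolve_object is a hypothetical helper. The demo uses tools::file_ext because the tools package is not attached by default in a standard R session.

```r
# Hypothetical helper: resolve an object given by name, falling back to a
# package namespace so the lookup works when the package is imported but not attached
resolve_object <- function(x, pkg, envir = parent.frame()) {
  if (!is.character(x)) return(x)  # already an object, use as-is
  if (exists(x, envir = envir)) return(get(x, envir = envir))
  get(x, envir = asNamespace(pkg))  # e.g. a default scoring matrix
}

ext <- resolve_object("file_ext", "tools")
ext("skel_1.swc")  # "swc"
```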

Should nblast2 be renamed nblast?

fix imports for CRAN

We should submit a version to CRAN to accompany the paper. v1.5 is showing problems on r-devel due to stricter namespace checking. These buglets still exist.

https://www.r-project.org/nosvn/R.check/r-devel-osx-x86_64-clang/nat.nblast-00check.html

checking R code for possible problems ... NOTE
WeightedNNBasedLinesetDistFun : <anonymous>: no visible global function
  definition for ‘dnorm’
nhclust: no visible binding for global variable ‘as.dist’
nhclust: no visible global function definition for ‘hclust’
plot3d.hclust: no visible binding for global variable ‘rainbow’
show_similarity: no visible global function definition for
  ‘colorRampPalette’
sub_dist_mat: no visible global function definition for ‘as.dist’
Undefined global functions or variables:
  as.dist colorRampPalette dnorm hclust rainbow
Consider adding
  importFrom("grDevices", "colorRampPalette", "rainbow")
  importFrom("stats", "as.dist", "dnorm", "hclust")
to your NAMESPACE file.

Release v1.6.2 to CRAN?

optimise memory use for very large all-by-all NBLAST

Use a pattern of small (e.g. 100 x 100) blocks that might take tens of seconds to a few minutes to compute
this should work better than doing a whole row or column that might have 20-50k neurons.
need to implement an x by y nblast function instead of all by all NBLAST for each block (would current NBLAST be ok?)
inputs could be neuronlistfh and read in for each process. I suspect that read time will be trivial compared with search time so long as blocks take 10s of seconds to compute. This might work well for memory.
ideally we would parallelise across those blocks with progress
if doing mean scores, we might want to do forward and reverse scores at the same time since they use the same sets of neurons
we might wish to fill a sparse matrix with the results with a threshold
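Below is a base R sketch of the block scheme. It assumes a hypothetical score_block(query_ids, target_ids) function that plays the role of the x-by-y NBLAST and returns a length(target_ids) x length(query_ids) matrix. For clarity, the results go into an ordinary dense matrix. A memory-conscious version would instead write each thresholded block to a sparse or file-backed matrix. The loop over blocks is where parallelisation (e.g. parallel::mclapply) and progress reporting would go.

```r
# Split 1..n into chunks and pair every query chunk with every target chunk
make_blocks <- function(n, block_size = 100) {
  chunks <- split(seq_len(n), ceiling(seq_len(n) / block_size))
  grid <- expand.grid(q = seq_along(chunks), tgt = seq_along(chunks))
  lapply(seq_len(nrow(grid)), function(k)
    list(query = chunks[[grid$q[k]]], target = chunks[[grid$tgt[k]]]))
}

# Compute block by block; scores below the threshold are dropped (set to NA)
blockwise_scores <- function(n, score_block, block_size = 100, threshold = -Inf) {
  res <- matrix(NA_real_, n, n)  # rows = targets, columns = queries
  for (b in make_blocks(n, block_size)) {
    s <- score_block(b$query, b$target)
    s[s < threshold] <- NA
    res[b$target, b$query] <- s
  }
  res
}

# Toy score function standing in for an x-by-y NBLAST
toy_scores <- function(q, tgt) outer(tgt, q, function(a, b) 10 * a + b)
scores <- blockwise_scores(5, toy_scores, block_size = 2)
```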
