
nat.nblast's Introduction

natverse


The natverse package is a wrapper for all of the commonly used NeuroAnatomy Toolbox packages. This is convenient both for package installation and for loading/attaching these packages without many calls to library().

See http://natverse.org for more details.

Installation

The recommended approach to install the full natverse is to use the helper package natmanager, which is available on CRAN. You can therefore do:

install.packages("natmanager")
natmanager::install("natverse")

See http://natverse.org/install for more details.

Use

Once installed, you can load the natverse package:

library(natverse)
#> Loading required package: elmr
#> Loading required package: catmaid
#> Loading required package: httr
#> Warning: package 'httr' was built under R version 3.6.2
#> Loading required package: nat
#> Loading required package: rgl
#> Warning: package 'rgl' was built under R version 3.6.2
#> Registered S3 method overwritten by 'nat':
#>   method             from
#>   as.mesh3d.ashape3d rgl
#> 
#> Attaching package: 'nat'
#> The following object is masked from 'package:rgl':
#> 
#>     wire3d
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, union
#> Loading required package: nat.flybrains
#> Loading required package: nat.templatebrains
#> Loading required package: nat.nblast
# example 3D plot of some neurons
plot3d(kcs20, col=type)

Installation Details

Conflicts and Dependencies

The natverse package attaches many packages, so it is possible that there could be conflicts where functions in the natverse have the same name as functions in another package.

natverse_conflicts()
#> ── Conflicts ─────────────────────────────────────────────────────────────── natverse_conflicts() ──
#> x nat::intersect() masks base::intersect()
#> x nat::setdiff()   masks base::setdiff()
#> x nat::union()     masks base::union()
#> x nat::wire3d()    masks rgl::wire3d()

You can always choose the correct version by prefixing the function with the specific package name, e.g. nat::flip() will select the nat version rather than any other.
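For example, a small illustration of calling a masked base function explicitly (the inputs here are arbitrary):

# base::intersect() is masked by nat::intersect(), but can still be called directly
base::intersect(1:5, 3:7)
#> [1] 3 4 5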

Updates

Once installed, you can check the status of all natverse packages and their dependencies like so:

natverse_update()
#> 
#> The following packages are either locally installed or information about them is missing!
#> 
#>   blob, formattable, import, mockr, nycflights13, pingr, fafbsegdata, reticulate, nat
#> 
#> Please install them manually from their appropriate source locations
#> 
#> The following natverse dependencies are out-of-date, see details below:
#> 
#> We recommend updating them by running:
#> natverse_update(update=TRUE)
#> 
#> package         remote         local          source   repo                        status 
#> --------------  -------------  -------------  -------  --------------------------  -------
#> bit64           0.9-7.1        0.9-7          CRAN     https://cran.rstudio.com/   x      
#> data.table      1.13.0         1.12.8         CRAN     https://cran.rstudio.com/   x      
#> elmr            deb0e27df...   7a2be4537...   GitHub   natverse                    x      
#> ff              2.2-14.2       2.2-14         CRAN     https://cran.rstudio.com/   x      
#> flycircuit      1b7b48e29...   cc4594f47...   GitHub   natverse                    x      
#> git2r           0.27.1         0.26.1         CRAN     https://cran.rstudio.com/   x      
#> insectbrainr    6331b4df6...   8fef94a05...   GitHub   natverse                    x      
#> mouselightr     9c2ce1c31...   8e26b7702...   GitHub   natverse                    x      
#> nat.flybrains   28ff33213...   36c622a15...   GitHub   natverse                    x      
#> nat.jrcbrains   85ed4a791...   44c95667e...   GitHub   natverse                    x      
#> neuprintr       7403d3ce2...   8ab03b744...   GitHub   natverse                    x      
#> RCurl           1.98-1.2       1.98-1.1       CRAN     https://cran.rstudio.com/   x      
#> tibble          b4eec19dd...   3f4e5dfae...   GitHub   tidyverse                   x      
#> tidyr           1.1.0          1.0.3          CRAN     https://cran.rstudio.com/   x      
#> xfun            0.16           0.15           CRAN     https://cran.rstudio.com/   x      
#> XML             NA             3.99-0.3       CRAN     https://cran.rstudio.com/   x

You can then update like so:

natverse_update(update = TRUE)

However, if you are in a hurry and want to skip the interactive questions, use:

natverse_update(update=TRUE, upgrade = 'always')

If you want to upgrade the natverse package itself:

remotes::update_packages('natverse')

nat.nblast's People

Contributors

ajdm, alexanderbates, jdmanton, jefferis


nat.nblast's Issues

Add report generation

It would be useful to have functions that generate reports on NBLAST results, perhaps as a knitr document. For example, a histogram of scores and a 3D plot of top hits could be produced automagically, along with some clustering of those top hits.
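A rough sketch of the kind of content such a report could include, assuming scores is a named vector of NBLAST scores for one query against a target neuronlist db (both object names are hypothetical):

# histogram of scores and a 3D plot of the top 10 hits
hist(scores, breaks = 50, main = "NBLAST scores")
top <- names(sort(scores, decreasing = TRUE))[1:10]
plot3d(db[top])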

CRAN release?

Is there anything blocking a CRAN release? I notice that the spam package was recently updated – not sure if it fixed anything.

NeuriteBlast fails for FAFB neurons

Use appropriate credentials for FAFB login

library(catmaid)
source("../catmaid_fafb_login.R")
test_skids = c(21999,22132)
test_n=read.neurons.catmaid(test_skids, conn=conn)
r = NeuriteBlast(test_n[[1]], test_n[[2]])

The above script returns the following error:

Error in findDirectionVectorsFromParents(target, query, idxArray, ReturnAllIndices = TRUE, :
Some points do not have a parent: therefore impossible to calculate direction vector
In addition: Warning messages:
1: In .CleanupParentArray(d1[, "Parent"]) :
no points to choose in .CleanupParentArray using original value
2: In .CleanupParentArray(d2[, "Parent"]) :
no points to choose in .CleanupParentArray using original value

The error is from this function.

It seems wp=which(pa==p) will only work if the ids in d$Parent are row indices, which is not true for FAFB.

[BUG] in scaling when neurons don't have names

Example:

> testneurons <- readRDS('testdata/testneurons.rds')
> names(testneurons) <- NULL
> scoresaba <- nblast_allbyall(testneurons,
+                              version=2,
+                              normalisation = 'normalised')
> scoresaba
<0 x 0 matrix>
attr(,"scaled:scale")
numeric(0)
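A possible workaround until this is fixed, assuming that assigning any unique names restores the expected scoring:

# give the neurons placeholder names so the score matrix has dimnames
names(testneurons) <- paste0("neuron", seq_along(testneurons))
scoresaba <- nblast_allbyall(testneurons, version = 2, normalisation = 'normalised')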

Creating sparse matrices is rather slow

Creating a sparse matrix for 1,000 neurons from the 16,000-neuron full score matrix has been running for more than 90 minutes and still hasn't finished. This is with the full score matrix loaded into memory, so the slowness is not caused by disk access issues.
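For comparison, a minimal sketch of building a thresholded sparse matrix directly with the Matrix package (this is not the package's own routine; the subset, threshold and object names are illustrative):

library(Matrix)
dense <- full_scores[sel_1000, sel_1000]   # hypothetical 1,000-neuron subset of the full score matrix
dense[dense < 0] <- 0                      # drop scores below an arbitrary threshold
sparse_scores <- Matrix(dense, sparse = TRUE)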

nblast fails to find smat.fcwb if nblast package is not attached.

If nblast is used inside another function in a package (for example elmr) that imports but does not attach nat.nblast, it fails with this traceback:

 get(smat) 
2 nat.nblast::nblast(xdp, db, normalised = normalised, .parallel = .parallel, 
    ...) at nblast_fafb.R#64
1 nblast_fafb(27884, mirror = FALSE) 

The workaround is to attach the package. The fix will involve the scope of the get() call (it needs to be pointed at objects in the package namespace).
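Two possible workarounds sketched from the above (the second assumes the smat argument accepts a scoring matrix object rather than a name, and that the object names query_dps and target_dps stand in for real data):

library(nat.nblast)                              # workaround: attach the package
scores <- nblast(query_dps, target_dps,
                 smat = nat.nblast::smat.fcwb)   # or pass the scoring matrix explicitly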

optimise for memory for very large all by all NBLAST

  • Use a pattern of small (e.g. 100 x 100) blocks that might take tens of seconds to a few minutes to compute
  • this should work better than doing a whole row or column, which might have 20-50k neurons.
  • need to implement an x by y nblast function instead of an all-by-all NBLAST for each block (would the current NBLAST be OK? see the sketch after this list)
  • inputs could be neuronlistfh and read in for each process. I suspect that read time will be trivial compared with search time so long as blocks take tens of seconds to compute. This might work well for memory.
  • ideally we would parallelise across those blocks with progress
  • if doing mean scores, we might want to do forward and reverse scores at the same time since they use the same sets of neurons
  • we might wish to fill a sparse matrix with the results, applying a threshold
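A rough sketch of the blockwise idea, assuming dps is a dotprops neuronlist (or neuronlistfh); the block size is illustrative and the inner loop would be the unit to parallelise:

block <- 100
idx <- split(names(dps), ceiling(seq_along(dps) / block))
scores <- list()
for (i in seq_along(idx)) {
  for (j in seq_along(idx)) {
    # nblast() already supports an x by y comparison between two neuronlists
    scores[[paste(i, j)]] <- nblast(dps[idx[[i]]], dps[idx[[j]]], normalised = TRUE)
  }
}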

Multi-core performance and memory consumption

Dear all,

We have found nblast really helpful for our current project, especially when doing nblast against the FlyEM database.

On my laptop (6 cores, 12 threads), it takes about 4 minutes for a one-against-all NBLAST when running on a single core.

As I wanted to reduce this time, I used doParallel to define a multi-core backend and ran NBLAST with .parallel = TRUE. Interestingly, I could confirm that all 12 of my cores were running, with 100% RAM consumption, but the same task ended up taking more than 10 minutes.

Then I tried running NBLAST on only two cores to avoid the high memory consumption, and it took 5 minutes for the task.

Taking the longer time and high memory consumption into consideration, I am a little confused about how exactly nblast uses .parallel. Since I have a machine with 4 processors (40 cores, 80 threads) and 48 GB RAM, and my dps_flyEM object is 2.32 GB, would it be best to run NBLAST on only 16 cores rather than 80?

Best wishes,
Jiajun Zhang
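For reference, a sketch of the kind of parallel setup described above, assuming doParallel as the backend; the core count and the query object name are illustrative:

library(doParallel)
registerDoParallel(cores = 16)              # choose a core count that fits in RAM
scores <- nblast(query_dps, dps_flyEM, .parallel = TRUE, normalised = TRUE)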

Make documentation for nblast comprehensive

nblast() should be kept as simple as possible for novice users, with the more advanced arguments being dealt with in NeuriteBlast() and WeightedNNBasedLinesetMatching(). Links should be present to these more advanced methods in nblast()'s documentation.

add plot3d.nblastres function and give nblast option to return per segment scores

In order to satisfy one of the reviewer comments we should add a new function / example that shows which points are being matched for a pair of neurons and colours one of the neurons by the quality of the match. One way to do this would be to allow the nblast function to return per-segment scores (perhaps wrapping them in an object with a class like nblastres). A corresponding plot3d method could then be used to make a plot with sensible defaults.

Alternatively, a lower tech version would be to include an example in the nblast docs.

Collecting per-segment results could be done by playing with the NNDistFun argument (which gets passed down to WeightedNNBasedLinesetMatching.default).
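A hedged sketch of collecting per-segment information via that route (the object names are hypothetical; the NNDistFun = list usage follows the example in the next issue):

# with NNDistFun = list, per-segment distances/dot products are returned
# rather than a single summed score (assumption based on the issue above)
segdists <- WeightedNNBasedLinesetMatching.dotprops(target_dps, query_dps,
                                                    NNDistFun = list)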

add fitting functions for nat.nblast

We need fitting functions such as makeprobmat and scorematrix. Distances can be collected like this:

# collect nearest-neighbour distance information for every pair of DL2 neurons
DL2nnlist = list()
for (n1 in DL2names) {
  DL2nnlist[[n1]] = lapply(fcupndps[DL2names[DL2names != n1]],
                           WeightedNNBasedLinesetMatching.dotprops,
                           fcupndps[[n1]], NNDistFun = list)
}

fix imports for CRAN

We should submit a version to CRAN to accompany the paper. v1.5 is coming up with errors on r-devel due to stricter namespace checking. These buglets still exist.

https://www.r-project.org/nosvn/R.check/r-devel-osx-x86_64-clang/nat.nblast-00check.html

checking R code for possible problems ... NOTE
WeightedNNBasedLinesetDistFun : <anonymous>: no visible global function
  definition for ‘dnorm’
nhclust: no visible binding for global variable ‘as.dist’
nhclust: no visible global function definition for ‘hclust’
plot3d.hclust: no visible binding for global variable ‘rainbow’
show_similarity: no visible global function definition for
  ‘colorRampPalette’
sub_dist_mat: no visible global function definition for ‘as.dist’
Undefined global functions or variables:
  as.dist colorRampPalette dnorm hclust rainbow
Consider adding
  importFrom("grDevices", "colorRampPalette", "rainbow")
  importFrom("stats", "as.dist", "dnorm", "hclust")
to your NAMESPACE file.
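A minimal sketch of the fix, following the check NOTE's suggestion (shown as roxygen tags that generate the corresponding NAMESPACE directives):

#' @importFrom grDevices colorRampPalette rainbow
#' @importFrom stats as.dist dnorm hclust
NULL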

Have nblast work with Labels in given neuronlist/dotprops objects

Have a UseLabels option for nblast, which would work similarly to UseAlpha. Labels could be numeric (2) or character ("axon").
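A sketch of what the proposed interface might look like (purely hypothetical; the option does not exist yet, and the object names are illustrative):

# restrict the comparison to points labelled as axon, or to numeric label 2
scores_axon <- nblast(query_dps, target_dps, UseLabels = "axon")
scores_lab2 <- nblast(query_dps, target_dps, UseLabels = 2)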

Some sample dotprops data from the hemibrain: two sets of cell types that should match within each set but not between sets, also with differing axon/dendrite locations: PD2a, PD2b1, AV1a1, LHCENT1, LHCENT2, LHCENT3.

Code used to fetch:

library(hemibrainr)
db = hemibrain_neurons()
chosen = subset(db, type%in%c("LHPD2a1","LHPD2b1_a","LHPD2b1_b","LHAV1a1","LHCENT1","LHCENT2","LHCENT3"))
dps = dotprops(chosen)
table(chosen[[1]]$d$Label)
table(dps[[1]]$labels)

Output:

> table(chosen[[1]]$d$Label)                                                           

    0     2     3     4     7 
10809  3186 41315   213   161 
> table(dps[[1]]$labels)

    0     2     3     4     7 
10809  3186 41315   213   161 

Where:
0 - non-computing (no 'flow'), possibly erroneous
1 - soma
2 - axon
3 - dendrite
4 - linker
7 - cell body fibre

dps.rda.zip

Add nblast2

Currently, nblast() is equivalent to nblast1, which used the negative exponential weighting with sigma based on registration error.
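For reference, the algorithm version can now be selected explicitly via the version argument (as also used in the nblast_allbyall() example earlier on this page); the object names below are illustrative:

scores_v2 <- nblast(query_dps, target_dps, version = 2)   # scoring-matrix based algorithm
scores_v1 <- nblast(query_dps, target_dps, version = 1)   # older negative exponential weighting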

nblast for swc files

Hi, nice library

Recently, I used nblast to search for neurons similar to ones I produced myself. The neurons were saved as .swc files, and I read them in with the following code:

newskel1 = read.neuron.swc("/usr/skel_1.swc")
newskel2 = read.neuron.swc("/usr/skel_2.swc")

Then I combined the neurons into a neuronlist:

newskel = neuronlist(newskel1, newskel2)

and then ran nblast:

scores = nblast(newskel1, newskel)

but it gives me the following:

more than 1 point in .CleanupParentArray, choosing first from: 2 11
more than 1 point in .CleanupParentArray, choosing first from: 13 15
more than 1 point in .CleanupParentArray, choosing first from: 191 196
more than 1 point in .CleanupParentArray, choosing first from: 202 208
Warning messages:
1: In .CleanupParentArray(d1[, "Parent"]) :
2: In .CleanupParentArray(d1[, "Parent"]) :
3: In .CleanupParentArray(d1[, "Parent"]) :
4: In .CleanupParentArray(d1[, "Parent"]) :

So I am just wondering: is the neuronlist I created incorrect? What should I do?

Then I tried converting each neuron to a neuronlist first:

newskel1 = as.neuronlist(newskel1)
newskel2 = as.neuronlist(newskel2)
newskel = neuronlist(newskel1, newskel2)
scores = nblast(newskel1, newskel)

but it gives me the following:

Error in `[.data.frame`(df, i, j) : undefined columns selected
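A possible fix sketch, assuming the intended NBLAST workflow compares dotprops objects rather than raw skeletons (the resample and k values are illustrative and assume the coordinates are in microns):

newskel1 = read.neuron.swc("/usr/skel_1.swc")
newskel2 = read.neuron.swc("/usr/skel_2.swc")
# convert the skeletons to dotprops before running nblast
dps = dotprops(neuronlist(skel_1 = newskel1, skel_2 = newskel2), resample = 1, k = 5)
scores = nblast(dps[[1]], dps)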

Release nat.nblast 1.6.5

Prepare for release:

  • Check that description is informative
  • Check licensing of included files
  • usethis::use_cran_comments()
  • devtools::check()
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • Polish pkgdown reference index

Submit to CRAN:

  • usethis::use_version('patch')
  • Update cran-comments.md
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • Update install instructions in README
  • Tweet
