Comments (9)
Thanks for another interesting data experiment. I'm not able to reproduce this. Could you please provide the random seed. It may be that it was just an unlucky initialization. I acknowledge the current random initialization is not ideal, but I haven't been able to get any other initialization tried so far to produce better average case performance in terms of convergence time, numerical stability etc. It may also be helpful to look at the deviance trace plot as a diagnostic for failed convergence (plot(fit$dev,type="l")
) where fit is the glmpca object in the metadata of the SingleCellExperiment. By the way, when I ran this with L=2 it took 22 sec on my laptop (it took 24 sec with L=20) and this is what the factors look like:
from scry.
Interesting. I am consistently getting that error above. If it helps, I'll set the seed, let's say to 50:
library(BiocFileCache)
bfc <- BiocFileCache(ask=FALSE)
qcdata <- bfcrpath(bfc, "https://github.com/LuyiTian/CellBench_data/blob/master/data/mRNAmix_qc.RData?raw=true")
env <- new.env()
load(qcdata, envir=env)
sce.8qc <- env$sce8_qc
sce.8qc$mix <- factor(sce.8qc$mix)
# Library size normalization and log-transformation.
library(scuttle)
library(scran)
set.seed(50)
sce.8qc <- logNormCounts(sce.8qc)
dec.8qc <- modelGeneVar(sce.8qc)
hvgs.8qc <- getTopHVGs(dec.8qc, n=1000)
library(scry)
sce.8qc2 <- GLMPCA(sce.8qc[hvgs.8qc,], fam="nb",
L=20, sz=sizeFactors(sce.8qc), verbose=TRUE)
## Control parameter 'penalty' should be provided as an element of 'ctl' rather than a separate argument.
## Control parameter 'verbose' should be provided as an element of 'ctl' rather than a separate argument.
## Control parameter 'eps' is deprecated. Coercing to equivalent 'tol'. Please use 'tol' in the future as 'eps' will eventually be removed
## Error: Poor model fit (final deviance higher than initial deviance). Try modifying control parameters to improve optimization.
With verbose=TRUE
, it rambles on for a while:
Trying AvaGrad with learning rate: 5.12e-08
Iteration: 1 | deviance=2.01e+05 | nb_theta: 1
Iteration: 2 | deviance=2.486e+05 | nb_theta: 1.54
Iteration: 3 | deviance=2.805e+05 | nb_theta: 2
Iteration: 4 | deviance=3.018e+05 | nb_theta: 2.37
Iteration: 5 | deviance=3.163e+05 | nb_theta: 2.65
Iteration: 6 | deviance=3.262e+05 | nb_theta: 2.85
Iteration: 7 | deviance=3.33e+05 | nb_theta: 3.01
Iteration: 8 | deviance=3.378e+05 | nb_theta: 3.12
Iteration: 9 | deviance=3.411e+05 | nb_theta: 3.2
Iteration: 10 | deviance=3.435e+05 | nb_theta: 3.25
Iteration: 11 | deviance=3.451e+05 | nb_theta: 3.29
Iteration: 12 | deviance=3.463e+05 | nb_theta: 3.32
Iteration: 13 | deviance=3.471e+05 | nb_theta: 3.34
Iteration: 14 | deviance=3.477e+05 | nb_theta: 3.36
Iteration: 15 | deviance=3.481e+05 | nb_theta: 3.37
Iteration: 16 | deviance=3.484e+05 | nb_theta: 3.38
Iteration: 17 | deviance=3.486e+05 | nb_theta: 3.38
Iteration: 18 | deviance=3.487e+05 | nb_theta: 3.38
Iteration: 19 | deviance=3.488e+05 | nb_theta: 3.39
Iteration: 20 | deviance=3.489e+05 | nb_theta: 3.39
Iteration: 21 | deviance=3.49e+05 | nb_theta: 3.39
Iteration: 22 | deviance=3.49e+05 | nb_theta: 3.39
Iteration: 23 | deviance=3.49e+05 | nb_theta: 3.39
Iteration: 24 | deviance=3.49e+05 | nb_theta: 3.39
Iteration: 25 | deviance=3.491e+05 | nb_theta: 3.39
Iteration: 26 | deviance=3.491e+05 | nb_theta: 3.39
Iteration: 27 | deviance=3.491e+05 | nb_theta: 3.39
Iteration: 28 | deviance=3.491e+05 | nb_theta: 3.39
Iteration: 29 | deviance=3.491e+05 | nb_theta: 3.39
Iteration: 30 | deviance=3.491e+05 | nb_theta: 3.39
Iteration: 31 | deviance=3.491e+05 | nb_theta: 3.39
Error: Poor model fit (final deviance higher than initial deviance). Try modifying control parameters to improve optimization.
It's also kind of bemusing that the deviance is steadily increasing... I would have expected it to decrease.
from scry.
I haven't forgotten about this, sorry for the delay. One clarification- do you install all the packages from bioc development or from their respective github repositories?
from scry.
BioC-devel. Of course, I could use the GitHub repos locally, but that just kicks the can down the road as the OSCA book will only build using packages from the official BioC-devel channel.
from scry.
OK I finally was able to reproduce this bug. The bug occurs for a variety of random seeds, what matters is that scry must be installed from Bioconductor development branch. When I installed scry from github, the bug does not occur for any random seed. Also, it doesn't occur if I run if I call glmpca directly from the cran package without the scry wrapper. For example, replace the last two lines in Aaron's original code with
library(glmpca) # from CRAN
m<-assay(sce.8qc[hvgs.8qc, ], "counts")
fit<-glmpca(m, L=20, fam="nb", sz=sizeFactors(sce.8qc), verbose=TRUE)
What seems to be happening is the convergence happens at about 3.5e5 deviance for all initializations, but the Bioc development version weirdly produces an initialization having a lower deviance (2.0e5). Both the scry github version and the glmpca CRAN version initialize with deviance about 7.3e5. Another wrinkle is with scry github and glmpca CRAN, the nb_theta is initialized to 100, whereas with the bioc devel version it initializes to 1.
from scry.
By the way it is possible for the deviance to increase slightly sometimes at the end of the optimization due to the momentum aspect of the optimizer. Also, the deviance is only defined conditional on a particular nb_theta value so it's possible to have a lower deviance but a worse set of latent factors. This further underscores the need to improve the estimation of nb_theta in glmpca.
from scry.
OK I have now reproduced this bug in all implementations by setting nb_theta=1. @LTLA as a quick fix please try re-running your example but set nb_theta=100. It's still a bug because it shouldn't be defaulting to such a low nb_theta initialization.
from scry.
confirmed by checking source code that bioc-devel branch has an old implementation (bioc-release is actually further along). This old default has nb_theta=1. The new version has nb_theta=100. Once we update the bioc-devel the problem should go away.
from scry.
Yes, this fixes the problem. Finishes faster too.
Bit surprised that BioC-release is further along than BioC-devel. Probably preaching to the choir here, but I'd be pretty scared to push stuff to BioC-release without it having run the BBS gauntlet in BioC-devel.
from scry.
Related Issues (20)
- Store GLM-PCA factors as LinearEmbeddingMatrix in SingleCellExperiment HOT 2
- Update the URL field on the bioc package page HOT 1
- Assay should default to `counts` in `SummarizedExperiment` objects HOT 1
- Null residuals with HDF5/DelayedArray/Matrix matrices HOT 4
- Experiences from trying to get GLMPCA to run with the book HOT 11
- cache sizeFactors in SingleCellExperiment slot HOT 2
- Benchmark scry using in-memory vs `DelayedArray` objects
- Deviance feature selection- increase speed HOT 3
- Including covariates with nullResiduals() HOT 4
- Negative binomial null residuals
- more than one "batch" HOT 2
- Stability issues and warnings in 10x Genomics Visium data HOT 4
- `NaN` gene deviance scores HOT 1
- Build error from Bioconductor HOT 2
- Support for large matrices HOT 2
- Cleaning up old branches HOT 7
- bigdata vignette not compiling HOT 2
- Merge celltype branch to master? HOT 2
- No genes names when computing devianceFeatureSelection() with batch HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from scry.