jefworks-lab / stdeconvolve Goto Github PK
View Code? Open in Web Editor NEWReference-free cell-type deconvolution of multi-cellular spatially resolved transcriptomics data
Home Page: http://jef.works/STdeconvolve/
Reference-free cell-type deconvolution of multi-cellular spatially resolved transcriptomics data
Home Page: http://jef.works/STdeconvolve/
Ideas for additional tutorials that may be helpful for users:
What is the best way to choose the optimal number of K's or cell-types for a dataset?
Is it just by observing the plot from the below code? How does one know what to set the upper limit to? i.e. in the below example, there could be more than 9 cell types present.
ldas <- fitLDA(t(as.matrix(cd)), Ks = seq(2, 9, by = 1))
Also, my R session often crashes when running the above code.
Thanks for the tool, I am truly impressed by the remarkable findings presented in your work.
I have been working on implementing your simulation for MPOA. However, when attempting to compare the deconvolved cell-types with the ground truth, I've encountered a challenge. The heatmap generated does not exhibit a clearly defined diagonal block, making it difficult to match the deconvolved cell-types with the ground truth cell-types. In this context, I am seeking guidance on the process of matching the deconvolved cell-types and calculating the Root Mean Square Error (RMSE) concerning the ground truth for STdeconvolve.
Furthermore, I have been working through the GSEA tutorial to assign deconvolved cell types to their corresponding ground truth cell types. However, I have encountered challenges, as there are instances where the process does not yield the desired results.
Even when following the tutorial's provided examples, it remains a complex task to effectively match the deconvolved cell types to a specific annotation. I am seeking guidance on how to achieve a more reliable and accurate matching process.
Any insights or suggestions you can provide regarding this issue would be immensely valuable to me.
Thank you once again for your time and assistance.
Hi,
I have tried to run my data using your visium tutorial. I keep getting this error in the initial step. This happens with both your example data and my data. Could you please tell me how to sort this out.
f <- "visiumTutorial/"
if(!file.exists(f)){
dir.create(f)
}
se <- SpatialExperiment::read10xVisium(samples = f,
type = "sparse",
data = "filtered")
Error in names(xyz) <- names(sfs) <- sids :
'names' attribute [1] must be the same length as the vector [0]
Dear Brendan,
I am using stdeconvolve with an 8-sample visium integrated seurat object. All samples were individually normalized with seurat's sctransform algorithm before anchor-based integration (also seurat toolkit).
The data output from sctransform (and consequently also after integration) can be negative and is of data format double. Hence, it cannot be used with stdeconvolve.
I have two questions:
Thank you so much in advance for your time!
Best,
Christoph
Hi @bmill3r , thanks for your excellent work. I saw that the MERFISH MPOB data are split as several different slices by the column called 'Bregma' and there are multimuple values which confused me. They are "-0.04" "-0.09" "-0.14" "-0.19" "-0.24" "-0.29" "0.01" "0.06" "0.11" "0.16" "0.21" "0.26". Could you help me sort these so that I could know the correct order from posterior to anterior? Thanks again for your time.
Hi, @bmill3r
Sorry to trouble you. I'm new to analyzing spatial transcriptomes. I am trying to do it with your tools to deconvolve. I have two questions.First, I see the tutorial in “https://github.com/JEFworks-Lab/STdeconvolve/blob/devel/docs/visium_10x.md”, but I don't quite understand how you annotate spots .After I finished running the following code, I got a graph that splits spots into several topics.Are you annotating the genes that are highly expressed in each topic?How do you present the annotated results of each cell? Which object to add to, can you give an example?
optLDA <- optimalModel(models = ldas, opt = 15)
results <- getBetaTheta(optLDA, perc.filt = 0.05, betaScale = 1000)
deconProp <- results$theta
deconGexp <- results$beta
plt <- vizAllTopics(theta = deconProp,
pos = pos,
r = 45,
lwd = 0,
showLegend = TRUE,
plotTitle = NA) +
ggplot2::guides(fill=ggplot2::guide_legend(ncol=2)) +
ggplot2::geom_rect(data = data.frame(pos),
ggplot2::aes(xmin = min(x)-90, xmax = max(x)+90,
ymin = min(y)-90, ymax = max(y)+90),
fill = NA, color = "black", linetype = "solid", size = 0.5) +
ggplot2::theme(
plot.background = ggplot2::element_blank()
) +
ggplot2::guides(colour = "none")
Second,in the tutorial "https://github.com/JEFworks-Lab/STdeconvolve/blob/devel/docs/visium_10x.md#compare-to-transcriptional-clustering" ,I don't quite understand the relationship between the topics in the diagram above and the cluster in this tutorial.What if the number of topics and clusters doesn't match?Can I determine which topic belongs to which cluster based on best match (highest correlation) of clusters (coms) and topics (decon)?
Hello STdeconvolve devs,
I am looking for a good tool to use in my analysis of a large visium dataset, consisting of brain tissue from 2 different conditions. Ultimately, I am interested in comparing gene expression for a given cell type across these conditions.
I think STdeconvolve would be the perfect tool for this assuming it produces cell type specific counts that can be plugged into DE testing algorithms. Is this a feature of STdeconvolve, and if so, do you have any advice for how I can extract said counts and normalize them for DE testing? I have noticed it implemented in certain ref-based deconvolution algorithms, but I am specifically interested in the reference-free approach of this tool. Any advice would be appreciated!
-Dillon
Hi @bmill3r ,
Thank you for this amazing tool!
I have run the workflow you have suggested and seems like the topics are closely related to the clusters identified by the Seurat workflow (at least for my Visium data). To double check this, it would be great if I can add a new column to the the meta.data
of my Seurat object with the topic that each cell barcode belongs to, or better yet, both the topic and the final cell type annotation for that topic (including NAs) as two new columns to the [email protected]
.
Is there a wrapper somewhere that I missed to get this information summarises from the theta
or beta
matrices?
Many thanks,
Shani.
I am following the Visium tutorial and the plots that are being generated with my Visium data are being flipped around (mirror-image) for each plot. When I read them into the Spatial Experiment object, they are orientated the correct way. How can I correct this? I am following the exact code in the visium tutorial
Dear Brendan,
Sorry to bother you.
I wanted to generate the gene counts and metadata tables of individual cells profiled by MERFISH from the Moffit et al. 2018 raw data. And I followed the code at https://jef.works/STdeconvolve/process_mpoa_data.html.
but when I generate hash tables of simulated spots for each bregma separately by:
FN7_hash <- simulateBregmaSpots(annot.table_,
counts = counts_, # gene counts matrix
patch_size = 100) # size of the simulated pixels in um2 (units of the Centroids)
I get the following error:
Error in simulateBregmaSpots(annot.table_, counts = counts_, patch_size = 100) :
could not find function "simulateBregmaSpots"
I dont't konw how to solve this problem, could you please give me some help?
Thank you so much in advance for your time!
Best,
Yi Liu
Dear all,
I am writing regarding a way of plotting results when having 2 replicates of the same condition. I am comparing healthy condition (n=2) with pathological condition (n=2), meaning 4 brain slices in total. I have run them individually first. I have used a K=9 (identifying 9 topics each). The most upregulated gene gives a name to each Topic and generally, due to the high variability in sample prep, it is hard to identify the same Topic gene label. Interestingly, one topic appeared to be present in both replicates of my pathological condition and not in my healthy one. Now I would like to plot this result but I don't have any good idea to highlight this result. Have you thought how to improve such thing when having biological replicates?
Thank you so much in advance,
Best,
it is a great tools!If i want to use my own data, i would like to know how to make the input annotation? I have seen the issue on how to annotation celltype , but i can't understand the input format of annotation and the finnal output. Actually, in my opinion, the final result should be a dataframe, for example, containing the celltype(col), cell ID(row) and the proportion. Thanks.
Hi,
Thank you very much for this amazing tool!
After running STdeconvolve on human femoral & carotid plaque Visium data, I'm thinking of doing cell-type annotation with Azimuth. Do you have any suggestions on how to implement Azimuth on STdeconvolve outputs?
Thanks,
Daniel Lee
Hello Stdeconvolve developers,
I am interested in using Stdeconvolve to better understand a large spatial dataset I'm working with, and am most excited about its ability to generate cell type specific transcriptomes. However, when I tried it out, I had some difficulty isolating a cell type that interests me most. Would it be possible to tailor the analysis pipeline to focus in on a cell type I care most about, using known markers for that given cell type? I would be grateful for any help with this.
-Dillon
Hello,
I am using Spatial 10X Visium data and I have been able to load my data most of the time, except when your example is set with the mOB or data(mOB) dataset. It works great and is very useful. Is there a way to understand what was done to obtain the mOB included in STdeconvolve or any other repositories as if I was processing it from a Spatial 10X Visium SpaceRanger output? Thank you!
Best regards,
Hi Brendan,
I hope this inquiry finds you well, and thank you very much for kindly answering my questions. :)
I'm also trying to validate my cell type deconvolution results by integrating 3 Visium data before running the deconvolution function. However, when I followed the dataset integration instructions you've suggested (https://github.com/JEFworks-Lab/STdeconvolve/blob/devel/docs/process_bcl_data.md), ran the deconvolution functions (fitLDA, optimalModel, and getBetaTheta), and tried to visualize the deconvolution scene for each sample, I could not recall any spots. Here's the error message I've got: "Plotting scatterpies for 0 pixels with 20 cell-types...this could take a while if the dataset is large". I also looked into your dialogue with AteeqMKhaliq about this (#21) but I still couldn't make anything other than an empty plot. Could you give any pointers on how I may be able to fix this?
Here's the entire code I ran:
###########################################
#1 Data integration steps
se1 <- SpatialExperiment::read10xVisium("/Users/.../Matrix/PIC83", "PIC83", type = "sparse", data = "filtered")
se2 <- SpatialExperiment::read10xVisium("/Users/.../Matrix/PIC84", "PIC84", type = "sparse", data = "filtered")
se3 <- SpatialExperiment::read10xVisium("/Users/.../Matrix/PIC85-1", "PIC85-1", type = "sparse", data = "filtered")
all_counts <- cbind(as.matrix(counts(se1)), as.matrix(counts(se2)), as.matrix(counts(se3)))
all_pos <- rbind(spatialCoords(se1), spatialCoords(se2), spatialCoords(se3))
se1_slice <- rep(1, nrow(spatialCoords(se1)))
se2_slice <- rep(2, nrow(spatialCoords(se2)))
se3_slice <- rep(3, nrow(spatialCoords(se3)))
all_slice <- c(se1_slice, se2_slice, se3_slice)
names(all_slice) <- rownames(all_pos)
allClean <- cleanCounts(counts = all_counts,
min.reads = 10,
min.lib.size = 10,
verbose = TRUE)
all_paths <- list()
all_paths[[1]] <- as.matrix(t(all_counts[, names(all_slice[all_slice == 1])]))
all_paths[[2]] <- as.matrix(t(all_counts[, names(all_slice[all_slice == 2])]))
all_paths[[3]] <- as.matrix(t(all_counts[, names(all_slice[all_slice == 3])]))
all_genes <- lapply(all_paths, function(p){
print(dim(p))
genes <- colnames(p)
print(length(genes))
genes
})
all_genes <- Reduce(intersect, all_genes)
length(all_genes)
all_ODgenes <- lapply(all_paths, function(p) {
dat <- preprocess(p,
extractPos = FALSE,
selected.genes = all_genes,
nTopGenes = 5,
genes.to.remove = NA,
removeAbove = 0.95,
removeBelow = NA,
min.reads = 10,
min.lib.size = 10,
min.detected = 1,
ODgenes = TRUE,
od.genes.alpha = 0.05,
gam.k = 5,
nTopOD = 500)
colnames(dat$corpus)
})
unionAllGenes <- Reduce(union, all_ODgenes)
length(unionAllGenes)
allCorpus <- preprocess(t(as.matrix(all_counts)),
extractPos = FALSE,
selected.genes = unionAllGenes,
min.reads = 0,
min.lib.size = 0)
allCorpus$pos <- all_pos[rownames(allCorpus$corpus), ]
all_slice_filt <- all_slice[names(all_slice) %in% rownames(allCorpus$pos)]
length(all_slice_filt)
allCorpus$slice <- all_slice_filt
dim(allCorpus$corpus)
allCorpus$slm
dim(allCorpus$pos)
length(allCorpus$slice)
###########################################
#2 Running the LDA model
ldas <- fitLDA(allCorpus$corpus, Ks = seq(20, 20, by = 1),
perc.rare.thresh = 0.05,
plot=TRUE,
verbose=TRUE)
optLDAS <- optimalModel(models = ldas, opt = 20)
results <- getBetaTheta(optLDAS, perc.filt = 0.05, betaScale = 1000)
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
ensembl_gene_mapping <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"), mart = ensembl)
convert_ensembl_to_symbol <- function(ensembl_id, mapping) {
gene_symbol <- mapping$external_gene_name[match(ensembl_id, mapping$ensembl_gene_id)]
return(gene_symbol)
}
colnames(results$beta) <- sapply(colnames(results$beta), convert_ensembl_to_symbol, mapping = ensembl_gene_mapping)
any_na <- any(is.na(colnames(results$beta)))
topGenes(results$beta, n=10)
###########################################
#3 Deconvolution Visualization
theta1 <- results$theta[rownames(allCorpus$pos)[allCorpus$slice == 1],]
pos_int1 <- allCorpus$pos[rownames(allCorpus$pos)[allCorpus$slice == 1],]
colnames(pos_int1) <- c("y", "x")
vizAllTopics(theta = theta1,
pos = pos_int1,
r = 5,
lwd = 0,
showLegend = TRUE,
plotTitle = NA) +
ggplot2::guides(fill = ggplot2::guide_legend(ncol = 1)) +
ggplot2::geom_rect(data = data.frame(pos_int1),
ggplot2::aes(xmin = min(x) - 90, xmax = max(x) + 90,
ymin = min(y) - 90, ymax = max(y) + 90),
fill = NA, color = "black", linetype = "solid", size = 0.5) +
ggplot2::theme(
plot.background = ggplot2::element_blank()
) +
ggplot2::guides(colour = "none")
###########################################
Here are the outputs I get from running "dim(theta1) and dim(pos_int1)":
[1] 1644 20
[1] 1644 2
Also, here are few first-line overview of theta1 and pos_int1 when I run head functions:
head(theta1):
1 2 3 4
AAACAAGTATCTCCCA-1 0.09521523 0.06524648 0.000000 0.0000000
AAACAGAGCGACTCCT-1 0.00000000 0.00000000 0.000000 0.0000000
AAACATTTCCCGGATT-1 0.00000000 0.00000000 0.000000 0.3612481
AAACCCGAACGAAATC-1 0.05972998 0.00000000 0.000000 0.0000000
AAACCTAAGCAGCCGG-1 0.00000000 0.00000000 0.000000 0.0000000
AAACCTCATGAAGTTG-1 0.00000000 0.00000000 0.254858 0.0000000
5 6 7 8 9 10
AAACAAGTATCTCCCA-1 0.0000000 0.00000000 0 0 0.0000000 0.2516840
AAACAGAGCGACTCCT-1 0.0000000 0.00000000 0 0 0.1006190 0.7903896
AAACATTTCCCGGATT-1 0.0000000 0.00000000 0 0 0.0000000 0.4040613
AAACCCGAACGAAATC-1 0.0000000 0.06014447 0 0 0.4313043 0.0000000
AAACCTAAGCAGCCGG-1 0.3908903 0.00000000 0 0 0.0000000 0.0000000
AAACCTCATGAAGTTG-1 0.0000000 0.00000000 0 0 0.0000000 0.0000000
11 12 13 14 15 16
AAACAAGTATCTCCCA-1 0.4222455 0 0 0.1656088 0.0000000 0.0000000
AAACAGAGCGACTCCT-1 0.0000000 0 0 0.0000000 0.0000000 0.0000000
AAACATTTCCCGGATT-1 0.0000000 0 0 0.0000000 0.2346906 0.0000000
AAACCCGAACGAAATC-1 0.2131955 0 0 0.0000000 0.0000000 0.0858006
AAACCTAAGCAGCCGG-1 0.1892510 0 0 0.0000000 0.0000000 0.2062755
AAACCTCATGAAGTTG-1 0.4748901 0 0 0.0000000 0.0000000 0.0000000
17 18 19 20
AAACAAGTATCTCCCA-1 0.0000000 0 0.0000000 0
AAACAGAGCGACTCCT-1 0.1089915 0 0.0000000 0
AAACATTTCCCGGATT-1 0.0000000 0 0.0000000 0
AAACCCGAACGAAATC-1 0.0000000 0 0.1498251 0
AAACCTAAGCAGCCGG-1 0.0000000 0 0.2135833 0
AAACCTCATGAAGTTG-1 0.0000000 0 0.2702520 0
head(pos_int1):
y x
AAACAAGTATCTCCCA-1 9041 14153
AAACAGAGCGACTCCT-1 16203 13154
AAACATTTCCCGGATT-1 6843 13606
AAACCCGAACGAAATC-1 10054 15630
AAACCTAAGCAGCCGG-1 6027 12012
AAACCTCATGAAGTTG-1 11521 4619
I know this is a wildly lengthy post, but I couldn't figure out what the issue is after spending almost the entire day debugging, so I would really appreciate your help!!
Thank you very much,
Daniel
Hi,@bmill3r,
I recently read your paper "Reference-free cell type deconvolution of multi-cellular pixel-resolution spatially resolved transcriptomics data" and found it very insightful.
I tried to replicate the results of the MOB dataset using the code shared on GitHub and got good results. But I have a question: the transcriptional profiles of different layers of MOB were provided on github. How did you calculate these transcriptional profiles? Where can I find the corresponding single-cell data set about layer transcriptional profiles?
Hello,
Hope you are all doing well. I've had success running MERINGUE and implementing the "getting started" tutorial for STdeconvolve with our data. I had no issues loading in our counts (cd) matrix (large dgCMatrix) until the pre-processing step in the "additional features" tutorial:
corpus1 <- preprocess(t(cd),
alignFile = NA, # if there is a file to adjust pixel coordinates this can be included.
extractPos = FALSE, # optional argument
selected.genes = NA, #
nTopGenes = 3, # remove the top 3 expressed genes (genes with most counts) in dataset
genes.to.remove = c("^Trmt"), # ex: remove tRNA methyltransferase genes (gene names that begin with "Trmt")
removeAbove = 0.95, # remove genes present in 95% or more of pixels
removeBelow = 0.05, # remove genes present in 5% or less of pixels
min.reads = 10, # minimum number of reads a gene must have across pixels
min.lib.size = 100, # minimum number of reads a pixel must have to keep (before gene filtering)
min.detected = 1, # minimum number of pixels a gene needs to have been detected in
ODgenes = TRUE, # feature select for over dispersed genes
nTopOD = 100, # number of top over dispersed genes to use, otherwise use all that pass filters if `NA`
od.genes.alpha = 0.05, # alpha param for over dispersed genes
gam.k = 5, # gam param for over dispersed genes
verbose = TRUE)
I get the error:
Error in t.default(cd) : argument is not a matrix
Was wondering if you had any insight as to why it does not recognize the cd counts matrix considering that it was working for the other analyses-- can share what our matrix looks like if need be.
Thanks in advance!
Hi,
Sorry to bother you, but I am stuck again at one point here, I am currently having a combined object with ~130,000 cells and when I run the restrictCorpus I am getting the following error
## filter for features in less than 100% of pixels but more than 5% of pixels
pdac.mat <- restrictCorpus(pdach1.mat, removeAbove = 1.0, removeBelow = 0.05)
and the error is as follows
Error in asMethod(object) :
Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 102
Calls: restrictCorpus ... is.data.frame -> as.matrix -> as.matrix.Matrix -> as -> asMethod
Execution halted
Please let me know how best I can handle this error.
Thanks again, and have a great day.
Regards,
AMK
Hi,
I have a question related to STdeconvolve and if it's possible to fit the model on multiple samples at the same time? I have in total 16 samples that are all similar/equal to one another. Where I would like to run STdeconvolve and fit the model on all 16 samples. I've created a script for that to run it on the spatial raw data but I was wondering if the model can be used on 16 samples at the same time and still detect differences in cell type compositions between samples if there are any? The hypothesis is that we do not have cell type differences between the samples. However, we would like to know where which cell type is located within our samples.
Thank you,
Joy
Hi there, thanks for developing STdeconvolve. Is there a way to define a particular area on the slide and analyze only that, e.g. in case the slide has two sections of tissue or within the same section?
I tried different ways to define the topic cell type but not mentioned in the algorithm I guess.Wanted to ask you guys If we can actually say what a topic here instead of groups.
@bmill3r Would it be possible to plot the output of vizAllTopics
as individual scatterplot circles instead of scatterpie circles? For example with geom_point()
or some other way. Could I edit the vizAllTopics
function to do this? And if so where can I access the code to do so?
Thanks.
I'm trying to annotate cell types of visium 10x data using the annotateCellTypesGSEA function. This is a minimal script to reproduce the error
library(STdeconvolve)
library(SpatialExperiment)
se <- SpatialExperiment::read10xVisium(samples = "./116_run",
type = "sparse",
data = "filtered")
se
# this is the genes x barcode sparse count matrix
cd <- se@assays@data@listData$counts
pos <- SpatialExperiment::spatialCoords(se)
# change column names to x and y
# for this dataset, we will visualize barcodes using "pxl_col_in_fullres" = "y" coordinates, and "pxl_row_in_fullres" = "x" coordinates
colnames(pos) <- c("y", "x")
counts <- cleanCounts(cd, min.lib.size = 100, min.reads = 10)
corpus <- restrictCorpus(counts, removeAbove=1.0, removeBelow = 0.05, nTopOD = 1000)
#ldas <- fitLDA(t(as.matrix(corpus)), Ks = seq(2, 15, by = 1),
ldas <- fitLDA(t(as.matrix(corpus)), Ks = 10,
perc.rare.thresh = 0.05,
plot=TRUE,
verbose=TRUE)
optLDA <- optimalModel(models = ldas, opt = "min")
results <- getBetaTheta(optLDA, perc.filt = 0.05, betaScale = 1000)
deconProp <- results$theta
deconGexp <- results$beta
gset <- list(
test_type1 = c("FLG", "FLG2"),
test_type2 = c("KTR25", "KTR71")
)
celltype_annotations <- annotateCellTypesGSEA(beta = results$beta, gset = gset, qval = 0.05)
The error itself is
initial: [1e+02 - Error in stats::ecdf(as.numeric(rvl$p/s.pm)) :
'x' must have 1 or more non-missing values
Calls: annotateCellTypesGSEA -> <Anonymous> -> bulk.gsea -> <Anonymous>
Execution halted
I can provide the dataset by email if needed
Hi Brendan,
Sorry for bothering you again, I have deconvolved the cell types and I have compared them in terms of clustering the barcodes transcriptionally as detailed in 10X tutorials. Now I have 18 beautiful clusters, is there a way I can add or convert or create a 2D t-SNE embedding into a Seurat object? I wanted to do the downstream analysis between these clusters. Sorry if I am asking too many questions, but trust me you are my last hope. Thanks again for creating such a beautiful tool.
Best wishes,
Regards,
AK
Hi all!
I performed my ST analysis from 10X using Seurat framework. Then I discovered this amazing tool to deepen in my data without using Single Cell. Would have any option to combine Seurat objects with STdeconvolve?
Thanks in advance.
Hi,
This is a great software!
I am currently using 10x visium data for analysis and want to define cell types, but when running "vizAllTopics", I get an error:
“Error: 'transparentCol' is not an exported object from 'namespace:STdeconvolve'”
This is my code:
plt <- vizAllTopics(theta = deconProp, pos = pos, groups = rep("0", dim(deconProp)[1]), group_cols = c("0" = STdeconvolve::transparentCol("black", percent = 100)), r = 45, lwd = 0, showLegend = TRUE, plotTitle = NA)
Hi there,
Thank you for developing such a useful and well-documented tool.
While processing my data, I have noticed inconsistencies in the transcriptional clustering.
Running this exact same block of code (adapted from the vignette) back-to-back on the same sample has resulted in several different figures, some with different numbers of clusters. See figures below.
pcs.info <- stats::prcomp(t(log10(as.matrix(counts)+1)), center=TRUE)
nPcs <- 8
pcs <- pcs.info$x[,1:nPcs]
emb <- Rtsne::Rtsne(pcs,
is_distance=FALSE,
perplexity=30,
num_threads=1,
verbose=FALSE)$Y
rownames(emb) <- rownames(pcs)
colnames(emb) <- c("x", "y")
k <- 25
com <- MERINGUE::getClusters(pcs, k, weight=TRUE, method = igraph::cluster_louvain)
tempCom <- com
dat <- data.frame("emb1" = ppos[,"x"],
"emb2" = ppos[,"y"],
"Cluster" = tempCom)
plt <- ggplot2::ggplot(data = dat) +
ggplot2::geom_point(ggplot2::aes(x = emb1, y = emb2,
color = Cluster), size = 1.2) +
ggplot2::scale_color_manual(values = rainbow(n = 15)) +
# ggplot2::scale_y_continuous(expand = c(0, 0), limits = c( min(dat$emb2)-1, max(dat$emb2)+1)) +
# ggplot2::scale_x_continuous(expand = c(0, 0), limits = c( min(dat$emb1)-1, max(dat$emb1)+1) ) +
ggplot2::labs(title = "",
x = "x",
y = "y") +
ggplot2::theme_classic() +
ggplot2::theme(axis.text.x = ggplot2::element_text(size=15, color = "black"),
axis.text.y = ggplot2::element_text(size=15, color = "black"),
axis.title.y = ggplot2::element_text(size=15),
axis.title.x = ggplot2::element_text(size=15),
axis.ticks.x = ggplot2::element_blank(),
plot.title = ggplot2::element_text(size=15),
legend.text = ggplot2::element_text(size = 12, colour = "black"),
legend.title = ggplot2::element_text(size = 15, colour = "black", angle = 0, hjust = 0.5),
panel.background = ggplot2::element_blank(),
plot.background = ggplot2::element_blank(),
panel.grid.major.y = ggplot2::element_blank(),
axis.line = ggplot2::element_line(size = 1, colour = "black")
# legend.position="none"
) +
ggplot2::guides(colour = ggplot2::guide_legend(override.aes = list(size=2), ncol = 2)
) +
ggplot2::coord_equal()
plt
I know the tSNE/UMAP plots are often vary slightly in their arrangement, but I haven't encountered a situation where the number of clusters change. Can you clarify if this is normal, or a bug, or something I'm missing/doing incorrectly? (I apologize in advance if this is an issue I should have opened in the Meringue GitHub.)
Thank you!
Hi,
Very useful software!
I just wonder how to use it with the 10 x visium spaceranger output?
Do you have a brief intro for that?
Thank you.
Dear Dr. Miller (@bmill3r ),
I am working on spatial transcriptomics data that I created using the 10X visium kit.
I used this tool to deconvolute the cell types in my tissue and it works like a charm.
Now I am trying to superimpose the scatterpies I got on top of the H&E images.
I saw in the vizAllTopics function that you added an "overlay" argument but I didn't succeeded to place the scatterpies and the image on top of each other.
Can I please get your assistance on how to do so?
Thank you so much for creating STdeconvolve,
Best regards,
Peleg
Hi, Thanks for developing reference free Deconvolve method, I thoroughly enjoy running it on the example data. Going forward, I have an integrated spatial Seurat object, we have 9 spatial samples(images ) as one object. My question is how best I can use this method on our dataset. Should I run STdeconvolve on individual samples or can I run this together? My codes are as follows.
` extract the counts matrix
mat <- Matrix((st@assays$Spatial@counts), sparse = TRUE)
remove poor genes and pixels
mat <- cleanCounts(mat, min.lib.size = 100)
filter for features in less than 100% of pixels but more than 5% of pixels
pdach1.mat <- restrictCorpus(mat, removeAbove = 1.0, removeBelow = 0.05)
pdf("pdach1.perplexity.pdf")
ldas = fitLDA(mat, Ks = seq(2, 9, by=1), plot=T, verbose=TRUE)
dev.off()
optLDA <- optimalModel(models = ldas, opt = "kneed")
results <- getBetaTheta(optLDA, perc.filt = 0.05, betaScale = 1000)
deconvolution resultsattach(frame)
deconProp =results$theta
write.table(deconProp, "All_results_decon.txt", quote=F)
predicted transcriptional profiles of the deconvolved cell-types as the beta matrix
deconGexp = results$beta
#SAMPLE 1
sample1.pos <- GetTissueCoordinates(st@images$sample1)
change column names to x and y
colnames(sample1.pos) <- c('x','y')
`
Please let me know how best I can approach this issue.
Thank you for your help.
Regards,
AK
Dear STdeconvolve developers,
I am very interested in using STdeconvolve to explore my dataset, I know this maybe a silly question.
But how to load Slide-seq data to the workflow of STdeconvolve?
I am familiar with R and Python, maybe point me to some documents or tell me the strategy would be fine.
Thanks!
Thank you so much for creating this tool for deconvolution without scRNAseq data.
I am working with an extremely rare pediatric tumor where the samples are few and far between and there are basically no publicly available scRNAseq reference dataset.
I am currently running my samples through Visium platform for SRT and using Giottosuite to ultimate analyze my data. The output is a Giotto object. I have read through the documentation using 10x data but I wasn't sure how to utilize Giotto object in the code for setting up to perform STdeconvolve. Can I please get some assistance.
Thank you so much I cannot wait to use STdeconvolve
Hi,
I've been following the 10x Visium protocol and it seems to work well, but before i annotate the cell type, i would love to do some sanity check of the log2fc genes expression. I am trying to pull all the log2fc gene expression but after following the instruction, it only allows me to view top 6 gene after applying the following command
"log2topgene2 <- lapply(1:20, function(i) {
head(sort(log2(deconGexp[i,]/colMeans(deconGexp[-i,])), decreasing=TRUE))
})"
or
"ps <- lapply(colnames(deconProp), function(celltype) {
celltype <- as.numeric(celltype)
highgexp <- names(which(deconGexp[celltype,] > 3))
log2fc <- sort(log2(deconGexp[celltype,highgexp]/colMeans(deconGexp[-celltype,highgexp])), decreasing=TRUE)
markers <- names(log2fc)[1] ## label just the top gene"
Is there a way not limit the gene and maybe display top 200 genes?
As I'm using STdeconvolve
on new datasets, here are some enhancements that I believe will help improve the user experience. This is a running list. Please feel free to add and check off as needed.
fitLDA
would benefit from a progress bar that shows if verbose=TRUE
. See https://github.com/r-lib/progressvizAllTopics
should check if groups
is a factor and cast if not (otherwise throws errors)vizAllTopics
would benefit from automatically setting an appropriate r
based on the scale of pos
unless users override with a specific choice (ex. max(0.5, max(pos)/nrow(pos)*10
or something)restrictCorpus
should limit the number of genes to some top most variable set (similar to how veloviz
does it) in the event there are too many features (ex > 1000 by default or some other user-specified number)spatial_data <- read10xVisium(
samples='data/sample',
sample_id='10X_Mouse_Brain',
type='HDF5',
data = 'filtered',
images='hires',
load=FALSE
)
spatial_data
class: SpatialExperiment
dim: 32285 2702
metadata(0):
assays(1): counts
rownames(32285): ENSMUSG00000051951 ENSMUSG00000089699 ...
ENSMUSG00000095019 ENSMUSG00000095041
rowData names(1): symbol
colnames(2702): AAACAAGTATCTCCCA-1 AAACAATCTACTAGCA-1 ...
TTGTTTCCATACAACT-1 TTGTTTGTGTAAATTC-1
colData names(4): in_tissue array_row array_col sample_id
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
imgData names(4): sample_id image_id data scaleFactor
x <- spatial_data@assays@data@listData$counts
counts <- cleanCounts(x, min.lib.size = 100, min.reads = 10)
Error in base::rowSums(x, na.rm = na.rm, dims = dims, ...): 'x' must be an array of at least two dimensions
Traceback:
counts <- Matrix::Matrix(counts, sparse = TRUE)
which will convert count matrix of SpatialExperiment into a sparse matrix with shape (87234070 x 1). And when apply filter metrics to that matrix, the result will be 0 cells and 0 genes.Hi all,
I don't have single cell data. I want to use this tool to determine the cell type of my 10x visium data. I would like to get the top5 marker genes in each cluster instead of the top one. What should I do? Thanks!
Thanks for your great work. I'm wondering in the MERFISH benchmarking, how do you align the cell types of scRNA-seq which is used as a reference to the cell types of cells in MERFISH and where do you obtain the scRNA-seq data. Could you please explain this detailedly to me? Thanks.
Hi bmill3r,
Thank you very much for this exciting package. It is very useful, especially when retrospectively analyzing archival tissue with Visium FFPE.
Since you feature the BayesSpace package in the Nat Com Paper (https://github.com/edward130603/BayesSpace), I was wondering if an upstream BayesSpace resolution enhancement would improve the performance of the STdeconvolve algorithm. I could imagine that STdeconvolve would in principle benefit from a higher (although predicted) resolution of the underlying gene expression matrix.
Do you think that this approach would be statistically sound or would it violate the underlying STdeconvolve assumptions?
Thank you very much for time in advance and thanks again for your great work.
Hi,
I am trying to reproduce the coronal mouse brain section tutorial https://jef.works/STdeconvolve/visium_10x.html.
If I use exactly the same code as the one shown in that tutorial, I get some different results mainly because of the filtering step:
library(STdeconvolve)
library(MERINGUE)
library(MUDAN)
f <- "visiumTutorial/"
if(!file.exists(f)){
dir.create(f)
}
if(!file.exists(paste0(f, "V1_Adult_Mouse_Brain_filtered_feature_bc_matrix.tar.gz"))){
tar_gz_file <- "http://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_filtered_feature_bc_matrix.tar.gz"
download.file(tar_gz_file,
destfile = paste0(f, "V1_Adult_Mouse_Brain_filtered_feature_bc_matrix.tar.gz"),
method = "auto")
}
untar(tarfile = paste0(f, "V1_Adult_Mouse_Brain_filtered_feature_bc_matrix.tar.gz"),
exdir = f)
if(!file.exists(paste0(f, "V1_Adult_Mouse_Brain_spatial.tar.gz"))){
spatial_imaging_data <- "http://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_spatial.tar.gz"
download.file(spatial_imaging_data,
destfile = paste0(f, "V1_Adult_Mouse_Brain_spatial.tar.gz"),
method = "auto")
}
untar(tarfile = paste0(f, "V1_Adult_Mouse_Brain_spatial.tar.gz"),
exdir = f)
se <- SpatialExperiment::read10xVisium(samples = f,
type = "sparse",
data = "filtered")
se
cd <- se@assays@data@listData$counts
pos <- SpatialExperiment::spatialCoords(se)
colnames(pos) <- c("y", "x")
counts <- cleanCounts(cd, min.lib.size = 100, min.reads = 10)
The above line outputs: > colnames(pos) <- c("y", "x")
counts <- cleanCounts(cd, min.lib.size = 100, min.reads = 10)
Filtering matrix with 2702 cells and 32285 genes ...
Resulting matrix has 26 cells and 1596 genes
Further to this,
corpus <- restrictCorpus(counts, removeAbove=1.0, removeBelow = 0.05, nTopOD = 1000)
outputs:
Removing 23 genes present in 100% or more of pixels...
1573 genes remaining...
Removing 0 genes present in 5% or less of pixels...
1573 genes remaining...
Restricting to overdispersed genes with alpha = 0.05...
Calculating variance fit ...
Using gam with k=5...
20 overdispersed genes ...
Using top 1000 overdispersed genes.
number of top overdispersed genes available: 20
However, in your tutorial the same lines of code produce:
Can you please check this and help me understand why there is mismatching? Is it just a typo in your tutorial? If yes can you please let me know the correct filter you applied?
One more question!
Is it possible to run stDeconvolve on the whole dataset - i.e. apply no filtering at all ?
Thanks!!
Hi @bmill3r ,
Thank you for developing this tool. Deconvolution and annotation have worked well for me thus far. I am now looking to attempt to do CNV analysis on the deconvolved cell types (I have already done them on the non-deconvolved samples) as a form of comparison and achieving more granularity. I am struggling with creating the initial infercnv Object. I'm not sure your familiarity with the package, so I'll briefly detail what is needed.
In order to create an infercnv object, I need a raw_counts_matrix which is the matrix of genes (rows) vs. cells (columns) containing the raw counts. If I am correct, I can use the corpus for my raw counts matrix so that isn't an issue.
You then need an annotations_file which for me is typically a matrix of the barcodes (rows) vs Seurat clusters (columns).
The goal would be to have the annotations file be the barcodes corresponding to each cell type. I know the results$theta are proportions of each cell type for each spot, so I am having difficulty trying to circumvent that issue either by making each barcode correspond to the cell type of the highest proportion, or another idea if you have one.
I greatly appreciate any assistance you can give.
Thanks,
Rob
Formal class 'Seurat' [package "SeuratObject"] with 13 slots
..@ assays :List of 2
.. ..$ RNA:Formal class 'Assay' [package "SeuratObject"] with 8 slots
.. ..$ SCT:Formal class 'SCTAssay' [package "Seurat"] with 9 slots
..@ meta.data :'data.frame': 11714 obs. of 22 variables:
.. ..$ orig.ident : Factor w/ 1 level "SeuratProject": 1 1 1 1 1 1 1 1 1 1 ...
.. ..$ nCount_RNA : num [1:11714] 4083 3920 8126 8507 4338 ...
.. ..$ nFeature_RNA : int [1:11714] 1429 1466 2578 2524 1752 1650 3310 4130 2476 2823 ...
.. ..$ slide : chr [1:11714] "D17" "D17" "D17" "D17" ...
.. ..$ treatment : chr [1:11714] "PBS" "PBS" "PBS" "PBS" ...
.. ..$ id : int [1:11714] 27 29 31 33 35 37 39 41 43 45 ...
.. ..$ labels : chr [1:11714] "Default" "Default" "Default" "Default" ...
.. ..$ sample_id : chr [1:11714] "section_1" "section_1" "section_1" "section_1" ...
.. ..$ slide1 : Factor w/ 6 levels "D17","D19","D54",..: 1 1 1 1 1 1 1 1 1 1 ...
.. ..$ percent.mito : num [1:11714] 29.5 27.6 21.7 21.7 20.1 ...
.. ..$ percent.ribo : num [1:11714] 8.55 8.55 9.61 10.09 9.84 ...
.. ..$ nCount_SCT : num [1:11714] 11606 11413 11185 11262 11202 ...
.. ..$ nFeature_SCT : int [1:11714] 1991 2066 2589 2533 2202 2120 3310 3857 2486 2823 ...
.. ..$ SCT_snn_res.0.1: Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
.. ..$ SCT_snn_res.0.2: Factor w/ 8 levels "0","1","2","3",..: 1 1 1 1 1 1 1 6 1 1 ...
.. ..$ SCT_snn_res.0.3: Factor w/ 10 levels "0","1","2","3",..: 1 1 5 5 1 5 5 7 5 5 ...
.. ..$ SCT_snn_res.0.4: Factor w/ 11 levels "0","1","2","3",..: 1 1 3 3 1 3 3 6 3 3 ...
.. ..$ SCT_snn_res.0.6: Factor w/ 13 levels "0","1","2","3",..: 1 1 2 2 1 2 2 5 2 2 ...
.. ..$ SCT_snn_res.0.8: Factor w/ 16 levels "0","1","2","3",..: 1 1 3 7 1 7 3 9 7 3 ...
.. ..$ SCT_snn_res.1 : Factor w/ 18 levels "0","1","2","3",..: 1 1 2 16 1 16 2 4 16 2 ...
.. ..$ SCT_snn_res.1.2: Factor w/ 21 levels "0","1","2","3",..: 1 1 2 7 1 7 2 21 7 2 ...
.. ..$ seurat_clusters: Factor w/ 8 levels "0","1","2","3",..: 1 1 1 1 1 1 1 6 1 1 ...
..@ active.assay: chr "SCT"
..@ active.ident: Factor w/ 8 levels "0","1","2","3",..: 1 1 1 1 1 1 1 6 1 1 ...
.. ..- attr(*, "names")= chr [1:11714] "CTGCACCTGGAACCGC-1_1" "GTTCTTCCCTCGATGT-1_1" "AACGATAGAAGGGCCG-1_1" "AGGCCCATTGTACAGG-1_1" ...
..@ graphs :List of 2
.. ..$ SCT_nn :Formal class 'Graph' [package "SeuratObject"] with 7 slots
.. ..$ SCT_snn:Formal class 'Graph' [package "SeuratObject"] with 7 slots
..@ neighbors : list()
..@ reductions :List of 4
.. ..$ pca :Formal class 'DimReduc' [package "SeuratObject"] with 9 slots
.. ..$ tsne :Formal class 'DimReduc' [package "SeuratObject"] with 9 slots
.. ..$ umap :Formal class 'DimReduc' [package "SeuratObject"] with 9 slots
.. ..$ umap.3d:Formal class 'DimReduc' [package "SeuratObject"] with 9 slots
..@ images : list()
..@ project.name: chr "SeuratProject"
..@ misc : list()
..@ version :Classes 'package_version', 'numeric_version' hidden list of 1
.. ..$ : int [1:3] 4 0 2
..@ commands :List of 6
.. ..$ SCTransform.RNA :Formal class 'SeuratCommand' [package "SeuratObject"] with 5 slots
.. ..$ RunPCA.SCT :Formal class 'SeuratCommand' [package "SeuratObject"] with 5 slots
.. ..$ FindNeighbors.SCT.pca:Formal class 'SeuratCommand' [package "SeuratObject"] with 5 slots
.. ..$ FindClusters :Formal class 'SeuratCommand' [package "SeuratObject"] with 5 slots
.. ..$ RunTSNE :Formal class 'SeuratCommand' [package "SeuratObject"] with 5 slots
.. ..$ RunUMAP.SCT.pca :Formal class 'SeuratCommand' [package "SeuratObject"] with 5 slots
..@ tools :List of 1
.. ..$ Staffli:Formal class 'Staffli' [package "STutility"] with 12 slots
I wanted to load a filtered data from a spatial 10X experiment by:
se <- SpatialExperiment::read10xVisium(samples = "outs/filtered_feature_bc_matrix.h5", type = "sparse", data = "filtered")
However, I get en error:
Error in SpatialExperiment::read10xVisium(samples = "outs/filtered_feature_bc_matrix.h5", : No matching files found for 'images=c(“lowres”)
I tried to change the code to:
se <- SpatialExperiment::read10xVisium(samples = "outs/filtered_feature_bc_matrix.h5", type = "sparse", data = "filtered", images = c("lowres"), load = TRUE)
but didn't work. What would I do?
Thanks
Dear Team,
Thank you for your amazing package!
I wanted to ask you if it's possible to run STdeconvolve on multiple sections at the same time (I have done this by following the instructions from breast cancer sections. I have in total 16 similar sections obtained from 10x Visium that I put through. However, I'm trying to plot the topics on the ST data by the following code:
plt <- vizAllTopics(theta = deconProp,
pos = pos,
r = 45,
lwd = 0,
showLegend = TRUE,
plotTitle = NA) +
ggplot2::guides(fill=ggplot2::guide_legend(ncol=2)) +
ggplot2::geom_rect(data = data.frame(pos),
ggplot2::aes(xmin = min(x)-90, xmax = max(x)+90,
ymin = min(y)-90, ymax = max(y)+90),
fill = NA, color = "black", linetype = "solid", size = 0.5) +
ggplot2::theme(
plot.background = ggplot2::element_blank()
) +
ggplot2::guides(colour = "none")
However, this results into 1 plot for all 16 samples. Is there a possibility to plot all 16 samples separately?
With kind regards,
Joy Otten
I attempted to visualize some of the results but when I apply the commands, I cannot see the deconvolved cell types properly or distinctly. Also, I should mention that:
1- I get no error. Just the visualized images do not seem to be proper ones.
2- I do follow this link for my analysis: https://github.com/JEFworks-Lab/STdeconvolve/blob/devel/docs/visium_10x.md
I hope I am on the right way.
Dear Dr. Miller (@bmill3r ),
Sorry to bother you, I am trying to run through the pseudo synthetic ST data (MPOA) analysis using STDeconvolve MPOA tutorial, but cannot find the cell volume data for animal FN7, datasets 171021_FN7_2_M22_M26 and 171023_FN7_1_M22_M26 in order to get discrete transcipt counts for each of the genes in order to continue with the LDA modelling. Could you perhaps share them?
Best regards,
paulmorio
Hi there, and thank you so much for developing such a great package! I have gone through the tutorial with my own data and everything is working well except for the groups
parameter of the vizAllTopics()
function. I have two different columns of meta.data that I would like to color code the scatterpies by: 1) cluster and 2) tissue region (like shown in this tutorial). However, when I attempt to color code my scatterpies by either of these meta.data values, the resulting plot is wrong and it looks like the annotations are being assigned to the completely wrong spots (this is especially obvious when looking at my tissue region annotations, which should be grouped together throughout the plot but are instead scattered like confetti). The resulting Topics that I get back look correct, and seem to be identifying known cell types in the correct spots. It's just the groups
annotation that is not working. If this is a user issue, do you have any thoughts or advice on where I might need to tweak the code? I have followed everything according to the tutorials thus far.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.