Multi-subject Single Cell Deconvolution

Home Page: https://github.com/xuranw/MuSiC

License: GNU General Public License v3.0

single-cell-rna-seq statistical-genetics

MuSiC

MuSiC is an analysis toolkit for single-cell RNA-Seq experiments. To use this package, you will need the R statistical computing environment (version 3.0 or later) and several packages available through Bioconductor and CRAN.

Update (01/15/2024)

We're excited to announce two significant enhancements to the MuSiC toolkit:

1. Integration with R Devcontainer and Docker for Codespace Environments:

We have now enabled the setup of an R development container, specifically tailored for MuSiC, in a Docker environment. This allows users to seamlessly edit and run MuSiC in a cloud-based development environment such as GitHub Codespaces.

With this update, users can easily access a pre-configured R environment with all necessary dependencies and settings, ensuring a consistent and reproducible coding experience regardless of the local machine setup.

2. Enhanced Code Documentation:

To improve user experience and code readability, we have added comprehensive comments and annotations throughout the MuSiC codebase.

These annotations provide clear explanations for each line of code and function, assisting users in understanding the underlying logic and facilitating easier modifications or customizations to the toolkit.

These updates are part of our ongoing efforts to make MuSiC more accessible and user-friendly for researchers and developers in the RNA-Seq community. We believe that these enhancements will significantly streamline the workflow for both new and experienced users of MuSiC.

Update (09/26/2022)

MuSiC (v1.0.0) now supports the SingleCellExperiment class as single cell reference!
Please see the updated Tutorial for guidance!

Update (09/26/2022)

MuSiC2 is available! You can use MuSiC2 for cell type deconvolution for multi-condition bulk RNA-seq data.
MuSiC2 functions can be accessed either with the latest version of MuSiC (v1.0.0) or by installing from this GitHub repo of Dr. Jiaxin Fan.

The original release of MuSiC is a deconvolution method that utilizes cross-subject scRNA-seq to estimate cell type proportions in bulk RNA-seq data.

MuSiC2 is an iterative algorithm aiming to improve cell type deconvolution for bulk RNA-seq data using scRNA-seq data as reference when the bulk data are generated from samples with multiple clinical conditions where at least one condition is different from the scRNA-seq reference.

How to cite `MuSiC`

Please cite the following publications:

Bulk tissue cell type deconvolution with multi-subject single-cell expression reference
X. Wang, J. Park, K. Susztak, N.R. Zhang, M. Li
Nature Communications. 2019 Jan 22 https://doi.org/10.1038/s41467-018-08023-x

MuSiC2: cell type deconvolution for multi-condition bulk RNA-seq data
J. Fan, Y. Lyu, Q. Zhang, X. Wang, R. Xiao, M. Li
Briefings in Bioinformatics. 2022 https://doi.org/10.1093/bib/bbac430

Installation

Both MuSiC and MuSiC2 functions are available in one package.

# install devtools if necessary
install.packages('devtools')

# install the MuSiC package
devtools::install_github('xuranw/MuSiC')

# load
library(MuSiC)
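Several of the issues further down report installation failures caused by dependencies that are not on CRAN (Biobase from Bioconductor, xbioc from GitHub). The following is a minimal sketch for checking what is missing before installing MuSiC. The helper name `missing_pkgs` is hypothetical, and the dependency list is assumed from those issue reports rather than taken from the package's DESCRIPTION file.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Dependencies named in the issue reports (an assumed, not official, list).
needed <- c("devtools", "Biobase", "nnls", "ggplot2")
print(missing_pkgs(needed))

# Install whatever is reported as missing before installing MuSiC, e.g.:
# install.packages("BiocManager")
# BiocManager::install("Biobase")
# devtools::install_github("renozao/xbioc")
# devtools::install_github("xuranw/MuSiC")
```

The install commands are left commented out because they need network access and depend on the local R version.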

Special Steps for GitHub Codespaces

What is Codespaces?

Codespaces is a feature provided by GitHub offering a cloud-based, integrated development environment (IDE). This IDE allows developers to write, run, and debug code directly within GitHub. Essentially, Codespaces delivers a complete, configurable development environment accessible anywhere via a web browser or supported code editors like Visual Studio Code.

How to Use Codespaces

1. Fork the Repository

Click on the “Fork” button in the upper right corner of the repository page to create a copy of the repository in your GitHub account.

2. Create a new Codespace

Navigate to the repository (either the original one or your fork). Look for the "Code" dropdown button near the top of the page and click on it to see various options for working with the repository.
Click on Codespaces, then click on the PLUS (+) sign to create a new codespace.
If this is your first time creating a codespace for this repository, initial setup might take a while.

3. Using the Codespace

Once the Codespace is created, it will open a VS Code-like editor in your browser, complete with a terminal, code editor, and debugger.
You can now write, edit, run, and debug code directly in your browser.

4. Open RStudio

To open RStudio Server, click the Forwarded Ports "Radio" icon at the bottom of the VS Code Online window.
In the Ports tab, hover over the "Local Address" column for the RStudio row and click the Open in Browser "World" icon.
This will launch RStudio Server in a new window. Log in with the username and password rstudio/rstudio.
- NOTE: Sometimes, the RStudio window may fail to open with a timeout error. If this happens, try again, or restart the Codespace.
In RStudio, use the File menu to open the /music folder and then browse to open the file test.R.

5. Test with the `test.R` File

Now, play around with the test.R file. Click on "Run" several times until you see the plot.

More Information

Please see Tutorials for MuSiC and MuSiC2.

music's Issues

additions to vignette

Thanks for making a great package!

I realized I had to library(xbioc) in order to run music_basis(). It would be helpful to add this to the vignette.

The vignette loads IEmarkers.RData but this does not appear to be available from https://github.com/xuranw/MuSiC/tree/master/vignettes/data. Would it be possible to add it?

music_prop not available

Hi,
Thanks for providing this great package,
I just found a small issue in the NAMESPACE file of the package. The music_prop function is not exported, which raises the following error message: object 'music_prop' not found.

A workaround for the user is to write MuSiC:::music_prop instead, but this might not be evident to occasional R users.
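The difference between an exported function (reachable as `pkg::fun`) and an internal one (reachable only as `pkg:::fun`) can be checked from R itself. This is a small sketch using the base `stats` package so that it runs without MuSiC installed; the helper names `is_exported` and `is_internal_only` are hypothetical.

```r
# TRUE if `fun` is listed among the exports of the package's namespace.
is_exported <- function(fun, pkg) {
  fun %in% getNamespaceExports(pkg)
}

# TRUE if `fun` exists inside the namespace but is not exported,
# i.e. it can only be reached as pkg:::fun.
is_internal_only <- function(fun, pkg) {
  !is_exported(fun, pkg) &&
    exists(fun, envir = asNamespace(pkg), inherits = FALSE)
}

print(is_exported("median", "stats"))       # median is exported by stats
print(is_internal_only("median", "stats"))

# With MuSiC installed, the same check would be (not run here):
# is_exported("music_prop", "MuSiC")
```

If the second helper returns TRUE for a function, adding the function to the package's NAMESPACE exports is the fix on the maintainer's side.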

Tutorial: IEmarkers.Rdata

Hello,

Sadly, loading the RData file results in two character lists rather than one R object.

Best

Scaling Single Cell for the bulk deconvolution

I'm having issues scaling the single cell to the bulk experiment with my own data. Using your simulation function I can simulate a bulk set from the single-cell data, and then use that to deconvolute that using other single-cell data sets. However, when I then try to use that to deconvolute real bulk data sets, the model performs poorly. I hypothesize this is due to scaling, but I am unsure.

Thanks

Pre-processing

Hi,

I would like to know what should be the pre-processing steps to apply on bulk and scRNA data before applying MuSiC. I could not find anywhere in the documentation or in github how the preprocessing is done. It's a bit strange.

For scRNA, how many cells are needed for good performance? I have pooled scRNA data, i.e. I pooled all the cells of the same type and quantified the gene expression, so I have 1 sample per cell type. Does MuSiC work with that?

Benchmark evaluation tutorial error with plot_grid

I am running the MuSiC tutorial for Benchmark evaluation. I followed all the steps without problems, and everything appears to run properly, but the final step, which plots the figures stored in "abs.diff.fig" and "prop.comp.fig", generates an error with plot_grid.

Here is everything R says about the error:

plot_grid(prop.comp.fig, abs.diff.fig, labels = "auto", rel_widths = c(4,3))
Error: Aesthetics must be either length 1 or the same as the data (9): label
Run rlang::last_error() to see where the error occurred.
rlang::last_error()
<error/rlang_error>
Aesthetics must be either length 1 or the same as the data (9): label
Backtrace:

cowplot::plot_grid(...)
cowplot::align_plots(...)
base::lapply(...)
cowplot:::FUN(X[[i]], ...)
cowplot:::as_gtable.default(x)
cowplot:::as_grob.ggplot(plot)
ggplot2::ggplotGrob(plot)
ggplot2:::ggplot_build.ggplot(x)
ggplot2:::by_layer(function(l, d) l$compute_geom_2(d))
ggplot2:::f(l = layers[[i]], d = data[[i]])
l$compute_geom_2(d)
ggplot2:::f(..., self = self)
self$geom$use_defaults(data, self$aes_params, modifiers)
ggplot2:::f(..., self = self)
ggplot2:::check_aesthetics(params[aes_params], nrow(data))
Run rlang::last_trace() to see the full context.

rlang::last_trace()
<error/rlang_error>
Aesthetics must be either length 1 or the same as the data (9): label
Backtrace:
█
└─cowplot::plot_grid(...)
  └─cowplot::align_plots(...)
    └─base::lapply(...)
      └─cowplot:::FUN(X[[i]], ...)
        ├─cowplot::as_gtable(x)
        └─cowplot:::as_gtable.default(x)
          ├─cowplot::as_grob(plot)
          └─cowplot:::as_grob.ggplot(plot)
            └─ggplot2::ggplotGrob(plot)
              ├─ggplot2::ggplot_gtable(ggplot_build(x))
              ├─ggplot2::ggplot_build(x)
              └─ggplot2:::ggplot_build.ggplot(x)
                └─ggplot2:::by_layer(function(l, d) l$compute_geom_2(d))
                  └─ggplot2:::f(l = layers[[i]], d = data[[i]])
                    └─l$compute_geom_2(d)
                      └─ggplot2:::f(..., self = self)
                        └─self$geom$use_defaults(data, self$aes_params, modifiers)
                          └─ggplot2:::f(..., self = self)
                            └─ggplot2:::check_aesthetics(params[aes_params], nrow(data))

Thank you very much for your help!
David

Incorrect usage of break

When installing the package, I see notes about incorrect usage of break:

Note: break used in wrong context: no loop is visible

This occurs in analysis.R and in utils.R. The note is accurate in this instance: break is seemingly used in place of stop() in one location, and in unreachable code in another (since nothing after the return call will ever be run).
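As a short illustration of the distinction raised here: `break` is only meaningful inside a loop, while `stop()` aborts a function with an error. The two functions below are hypothetical examples, not code from the package; the error message is borrowed from another issue on this page.

```r
# To abort a function with an error message, use stop(), not break.
check_cell_types <- function(n.ct) {
  if (n.ct <= 1) {
    stop("Not enough valid cell type!")
  }
  n.ct
}

# break used in its proper context: leaving a loop early.
first_above <- function(x, threshold) {
  hit <- NA
  for (v in x) {
    if (v > threshold) {
      hit <- v
      break
    }
  }
  hit
}

print(first_above(c(1, 5, 9), threshold = 4))
```

Any statement placed after `return()` or `stop()` in the same branch is never reached, so it can simply be deleted.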

The datasets in 'https://xuranw.github.io/MuSiC/data/' are not available or corrupted

plot_grid

Heatmap for informative and non-informative genes

Hello.

I am trying to use MuSIC and want to plot the heatmap of informative and non-informative genes. Can you guide me on how to extract that matrix?

Thank you,
Kalpit

problem with installation

Dear Xuran,

I first tried to install the package in the suggested way without success:

devtools::install_github('xuranw/MuSiC')
Error in read.dcf(path) :
Found continuation line starting ' plyr, ...' at begin of record.

Next I have managed to manually download the package and install it:
install.packages("C:/Users/nivs/Downloads/MuSiC-master.zip", repos = NULL, type = "win.binary")

following the suggestion from other issues, I have tried to also update the dependencies:
setwd("C:/Rpackages/MuSiC-master/")

devtools::check()
Updating MuSiC documentation
Writing NAMESPACE
Loading MuSiC
Loading required package: nnls
Loading required package: ggplot2
Writing NAMESPACE
-- Building ----------------------------------------------------------- MuSiC --
Setting env vars:

CFLAGS : -Wall -pedantic
CXXFLAGS : -Wall -pedantic
CXX11FLAGS: -Wall -pedantic

√ checking for file 'C:\Rpackages\MuSiC-master/DESCRIPTION' (553ms)

preparing 'MuSiC': (1.1s)
√ checking DESCRIPTION meta-information ...
installing the package to build vignettes (676ms)
E creating vignettes (34.2s)
Quitting from lines 136-156 (vignette.Rmd)
Error: processing vignette 'vignette.Rmd' failed with diagnostics:
Could not find CIBERSORT source code at '\offline.il.cgen.biz/home/nivs/Documents/R-data/bseqsc'. Please ensure you correctly configured bseqsc. See ?bseqsc_config.
Execution halted
Error in processx::run(bin, args = real_cmdargs, stdout_line_callback = real_callback(stdout), :
System command error

it seems that there are some issues with the vignette

finally, I was able to load the package, but ran into another problem:

library(MuSiC)

Download EMTAB single cell dataset from Github

Mousesub.eset = readRDS("C:/Rpackages/MuSiC-master/vignettes/data/Mousesubeset.rds")
Mousesub.basis = music_basis(Mousesub.eset, clusters = 'cellType', samples = 'sampleID', select.ct = c("Endo", "Podo", "PT", "LOH", "DCT", "CD-PC", "CD-IC", "Fib", "Macro", "Neutro","B lymph", "T lymph", "NK"))
Error in sampleNames(x) : could not find function "sampleNames"

Any idea?

Thanks,

Niv

May I obtain your permission ?

Dear xuranw,
I am new to bio-software, and I would like to modify the MuSiC source code for calculating cell infiltration to get a simple function for a website. I will cite your article and show your license. Is this OK? If I want to modify your code, what do I need to do?
I would greatly appreciate it if you could spend some of your time teaching me. Thank you.
Best regards.

Can't install package

Hi,

I can't install the package. Error:

 devtools::install_github('xuranw/MuSiC')
Downloading GitHub repo xuranw/MuSiC@master
Skipping 1 packages not available: Biobase
Downloading GitHub repo renozao/xbioc@master
These packages have more recent versions available.
It is recommended to update all of them.
Which would you like to update?

1: All                                       
2: CRAN packages only                        
3: None                                      
4: pkgmaker (0.31.1 -> ac95c24f3...) [GitHub]

Enter one or more numbers, or an empty line to skip updates:
1
pkgmaker (0.31.1 -> ac95c24f3...) [GitHub]
Downloading GitHub repo renozao/pkgmaker@develop
✓  checking for file ‘/private/var/folders/3t/lc9m5zv966934dq4d6gf6l60g_dj6h/T/Rtmp1OVDRI/remotes85b01129e8d2/renozao-pkgmaker-ac95c24/DESCRIPTION’ ...
─  preparing ‘pkgmaker’:
✓  checking DESCRIPTION meta-information ...
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
   Removed empty directory ‘pkgmaker/vignettes’
─  building ‘pkgmaker_0.31.tar.gz’
   Warning: invalid uid value replaced by that for user 'nobody'
   Warning: invalid gid value replaced by that for user 'nobody'
   
* installing *source* package ‘pkgmaker’ ...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
Setting package specific options: package:pkgmaker:logger (1 default option(s))
Creating meta registry in package 'pkgmaker' ... OK
Creating registry 'extra_handler' in package 'pkgmaker' ... OK
Creating registry 'extra_action' in package 'pkgmaker' ... OK
Registering extra handler 'install.packages' [function] ... OK
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (pkgmaker)
✓  checking for file ‘/private/var/folders/3t/lc9m5zv966934dq4d6gf6l60g_dj6h/T/Rtmp1OVDRI/remotes85b05a137826/renozao-xbioc-b4f512c/DESCRIPTION’ ...
─  preparing ‘xbioc’:
✓  checking DESCRIPTION meta-information ...
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  looking to see if a ‘data/datalist’ file should be added
─  building ‘xbioc_0.1.18.tar.gz’
   Warning: invalid uid value replaced by that for user 'nobody'
   Warning: invalid gid value replaced by that for user 'nobody'
   
* installing *source* package ‘xbioc’ ...
** using staged installation
** R
** data
** inst
** byte-compile and prepare package for lazy loading
Error: (converted from warning) package ‘S4Vectors’ was built under R version 3.6.3
Execution halted
ERROR: lazy loading failed for package ‘xbioc’
* removing ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library/xbioc’
Error: Failed to install 'MuSiC' from GitHub:
  Failed to install 'xbioc' from GitHub:
  (converted from warning) installation of package ‘/var/folders/3t/lc9m5zv966934dq4d6gf6l60g_dj6h/T//Rtmp1OVDRI/file85b046b54149/xbioc_0.1.18.tar.gz’ had non-zero exit status

Getting cell type counts from bulk data for differential expression and GSEA

I read through the paper and went through the tutorial and I am having trouble figuring out how to best use the outputs. From the outputs is it possible to get cell type specific counts for each gene for the bulk rna-seq samples? Then take this information to do a differential expression analysis using edgeR and then throw it into GSEA?

requirement on cell number per cell type

Hi,

Thanks for the package. Is there a minimum number of cells required per cell type in the single-cell dataset used?

Best wishes
Nurun

Dependency on xbioc package

It is worth pointing out in the README and vignettes that one must install the xbioc package to be able to install MuSiC, since this package is not on Bioc nor on CRAN.

https://github.com/renozao/xbioc

README File Outdated

This work is not published yet ... bioRxiv.

It is now in Nature Communications!

Could MuSiC be used for microarray data and compared inter/intra samples?

Hi xuran,

Thank you for your excellent work. I was wondering whether MuSiC could be used for microarray data, and whether the results of MuSiC can be compared across samples (inter-sample) and across cell types (intra-sample).

Best
Yi Han

music_prop.cluster

Hi,
I am running the code that is available in the Tutorial, but I get the following error when I run music_prop.cluster:

Est.mouse.bulk = music_prop.cluster(bulk.eset = Mouse.bulk.eset, sc.eset = Mousesub.eset, group.markers = Immune.marker, group = 'clusterType', clusters = 'cellType', samples = 'sampleID', clusters.type = clusters.type)

Error in music_prop.cluster(bulk.eset = Mouse.bulk.eset, sc.eset = Mousesub.eset, :
Cluster number is not matching!

Question on music.iter.ct, music_prop function

First of all, thank you for your wonderful work on data analysis.
While I was skimming through your R code, I found a part of the function "music.iter.ct" in the "analysis.R" file whose flow I had difficulty understanding.
The following is the part of the music.iter.ct function that puzzled me:

common.gene = intersect(names(Y), rownames(D))
common.gene = intersect(common.gene, colnames(Sigma.ct))
if(length(common.gene)< 0.1*min(length(Y), nrow(D), ncol(Sigma.ct))){
stop('Not enough common genes!')
}
Y = Y[match(common.gene, names(Y))];
D = D[match(common.gene, rownames(D)), ]
Sigma.ct = Sigma.ct[, match(common.gene, colnames(Sigma))]

In the final line, which updates Sigma.ct to include only common genes, the variable Sigma is used, but I could not find any variable named Sigma elsewhere inside the music.iter.ct function. I was wondering whether I missed something or whether it is just a typo.
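For reference, this is a sketch of how the quoted subsetting would read if the last line indexed `Sigma.ct` by its own column names. It assumes the reference to `Sigma` is indeed a typo for `Sigma.ct`, which the maintainers have not confirmed here; the wrapper function `subset_common_genes` and the toy data are hypothetical.

```r
# Restrict Y, D and Sigma.ct to the genes they have in common.
subset_common_genes <- function(Y, D, Sigma.ct) {
  common.gene <- intersect(names(Y), rownames(D))
  common.gene <- intersect(common.gene, colnames(Sigma.ct))
  if (length(common.gene) < 0.1 * min(length(Y), nrow(D), ncol(Sigma.ct))) {
    stop("Not enough common genes!")
  }
  list(
    Y = Y[match(common.gene, names(Y))],
    D = D[match(common.gene, rownames(D)), , drop = FALSE],
    # Index by the columns of Sigma.ct itself, not an undefined `Sigma`.
    Sigma.ct = Sigma.ct[, match(common.gene, colnames(Sigma.ct)), drop = FALSE]
  )
}

# Toy data: genes g2 and g3 are shared by all three inputs.
Y <- c(g1 = 1, g2 = 2, g3 = 3)
D <- matrix(1:6, nrow = 3, dimnames = list(c("g2", "g3", "g4"), c("A", "B")))
Sigma.ct <- matrix(0.1, nrow = 4, ncol = 3,
                   dimnames = list(NULL, c("g3", "g2", "g5")))

res <- subset_common_genes(Y, D, Sigma.ct)
print(names(res$Y))
print(colnames(res$Sigma.ct))
```

With the original line, R would look for an object called `Sigma` in the enclosing environments and fail (or silently pick up an unrelated object) if none is defined inside the function.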

+) found similar issue on music_prop function
following are code with issues :

m.sc = match(cm.gene, rownames(sc.basis$Disgn.mtx)); m.bulk = match(cm.gene, bulk.gene)
D1 = sc.basis$Disgn.mtx[m.sc, ]; M.S = colMeans(sc.basis$S, na.rm = T);
Yjg = relative.ab(exprs(bulk.eset)[m.bulk, ]); N.bulk = ncol(bulk.eset);
if(ct.cov){
Sigma.ct = sc.basis$Sigma.ct[, m.sc];
if(sum(Yjg[, i] == 0) > 0){
  D1.temp = D1[Yjg[, i]!=0, ];
  Yjg.temp = Yjg[Yjg[, i]!=0, i];
  Sigma.ct.temp = Sigma.ct[, Yjg[,i]!=0];
  if(verbose) message(paste(colnames(Yjg)[i], 'has common genes', sum(Yjg[, i] != 0), '...') )
}else{
  D1.temp = D1;
  Yjg.temp = Yjg[, i];
  Sigma.ct.temp = Sigma.ct;
  if(verbose) message(paste(colnames(Yjg)[i], 'has common genes', sum(Yjg[, i] != 0), '...'))
}

Unfortunately, there is no predefined variable "i" in scope.
However, the "parallel" code in the else clause (the case ct.cov = FALSE) contains a for loop using the variable "i". I believe a for-loop declaration is missing. Thank you.
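The per-sample step in the quoted excerpt can be sketched with an explicit loop over bulk samples. This is only an illustration of the loop the excerpt appears to lack, on toy data; the helper `drop_zero_genes` is hypothetical and the sketch does not reproduce the rest of music_prop.

```r
# Hypothetical helper: for bulk sample i, keep only genes whose relative
# abundance in that sample is non-zero.
drop_zero_genes <- function(D1, Yjg, Sigma.ct, i) {
  keep <- Yjg[, i] != 0
  list(
    D1.temp = D1[keep, , drop = FALSE],
    Yjg.temp = Yjg[keep, i],
    Sigma.ct.temp = Sigma.ct[, keep, drop = FALSE]
  )
}

# Toy inputs: 4 genes, 2 cell types, 3 bulk samples (one column per sample).
genes <- paste0("g", 1:4)
D1 <- matrix(1:8, nrow = 4, dimnames = list(genes, c("A", "B")))
Yjg <- matrix(c(0.5,  0,    0.3,  0.2,
                0.25, 0.25, 0.25, 0.25,
                0,    0,    0.6,  0.4),
              nrow = 4, dimnames = list(genes, c("s1", "s2", "s3")))
Sigma.ct <- matrix(0.1, nrow = 4, ncol = 4, dimnames = list(NULL, genes))
N.bulk <- ncol(Yjg)

# Loop over samples so that `i` is defined for every use.
n.used <- integer(N.bulk)
for (i in seq_len(N.bulk)) {
  tmp <- drop_zero_genes(D1, Yjg, Sigma.ct, i)
  n.used[i] <- length(tmp$Yjg.temp)
}
print(n.used)
```

`seq_len(N.bulk)` is used instead of `1:N.bulk` so that the loop body is skipped when there are no samples.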

bugs with weight of cross cell type co-variance

the code in function music_prop:

  if(ct.cov){
    Sigma.ct = sc.basis$Sigma.ct[, m.sc];

    if(sum(Yjg[, i] == 0) > 0){
      D1.temp = D1[Yjg[, i]!=0, ];
      Yjg.temp = Yjg[Yjg[, i]!=0, i];
      Sigma.ct.temp = Sigma.ct[, Yjg[,i]!=0];
      ...
      ...

Is a for loop missing here, such as for(i in 1:N.bulk){}?

Where is the tutorial

Hey,
In your Nature article about the MuSiC package, it is stated that there is a tutorial on how to use this package. I can't seem to find it anywhere; could you add a link? Much appreciated.

Something strange with weight of cross cell type co-variance

Error in pVar(x, clusters) : could not find function "pVar"

Hi,
when I execute the command Est.prop.Xin <- music_prop(bulk.eset = XinT2D.construct.full$Bulk.counts, sc.eset = EMTAB.eset, clusters = 'cellType', samples = 'sampleID', select.ct = c('alpha', 'beta', 'delta', 'gamma'))
an error occurs: Error in pVar(x, clusters) : could not find function "pVar"

Is a dependency missing? Can you tell me how to fix the error?

How to build custom single cell dataset

Hi,

I find this method really cool and promising but I am having issues trying to implement it to my data.

Can you provide a vignette (or section of one) describing how to go from expression matrix to the necessary input file for MuSiC? Or perhaps you can construct the single cell reference files from the Tabula muris or MCA datasets?

Too few common genes

Hi there,

I'm very interested in using this exciting deconvolution tool for my bulk RNA-seq data, but keep running into the error message from music_prop of too few common genes from three different single cell datasets.

I tried pre-processing each dataset by including only genes found in both datasets using the merge function in R, yet I still receive this message. I don't understand how this can be the case if their gene lists are exactly the same.

Thanks so much in advance.
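One way to narrow this down is to compare the gene identifiers that the two objects actually expose. The sketch below is a diagnostic only: the helper `gene_overlap` is hypothetical, and the two causes it probes (Ensembl version suffixes and letter-case differences) are guesses at common culprits, not a confirmed explanation for this report.

```r
# Count shared gene IDs before and after a crude normalisation that
# strips a trailing ".<version>" and ignores letter case.
gene_overlap <- function(bulk.genes, sc.genes) {
  strip <- function(x) toupper(sub("\\.[0-9]+$", "", x))
  c(
    exact = length(intersect(bulk.genes, sc.genes)),
    normalized = length(intersect(strip(bulk.genes), strip(sc.genes)))
  )
}

# Toy identifiers; in practice these would be the row names of the bulk
# and single-cell expression matrices.
bulk.genes <- c("ENSG00000141510.11", "ENSG00000012048.20", "Actb")
sc.genes <- c("ENSG00000141510", "ENSG00000012048", "ACTB")

overlap <- gene_overlap(bulk.genes, sc.genes)
print(overlap)
```

A large gap between the two counts suggests an identifier-format mismatch. Upper-casing is used here purely for diagnosis, since it would conflate mouse and human gene symbols.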

Prop_comp_multi : Aesthetics must be either length 1 or the same as the data (9): label

Dear xuranw:
Thanks for your great package! I'm working through the MuSiC tutorial. When I run the Prop_comp_multi and Abs_diff_multi commands, I get the following errors:

Prop_comp_multi(prop.real = data.matrix(XinT2D.construct.full$prop.real),
prop.est = list(data.matrix(Est.prop.Xin$Est.prop.weighted),
data.matrix(Est.prop.Xin$Est.prop.allgene)),
method.name = c('MuSiC', 'NNLS'),
title = 'Heatmap of Real and Est. Prop' )
Error: Aesthetics must be either length 1 or the same as the data (9): label

Abs_diff_multi(prop.real = data.matrix(XinT2D.construct.full$prop.real),
prop.est = list(data.matrix(Est.prop.Xin$Est.prop.weighted),
data.matrix(Est.prop.Xin$Est.prop.allgene)),
method.name = c('MuSiC', 'NNLS'),
title = 'Abs.Diff between Real and Est. Prop' )
Error: Aesthetics must be either length 1 or the same as the data (4): label

Could you give me some advice?
thanks!

Getting top 100 transcripts by weight for determining cell proportions?

Hi Xuran,
Thank you very much for the MuSiC package you have provided. I have been working with it and have another question regarding finding the most influential genes in determining cell type proportions. Can we use the Weight.gene matrix and sum across rows (transcripts) to get the transcripts with the highest aggregate weight, and consider them to be the most influential transcripts in determining cell type proportions? We would like to compare the MuSiC method with another method that other members of our group have used (a version of NNLS) and compare which transcripts were selected as being differentially expressed in different immune cell types.
Any help is appreciated.
Thanks, T.J.
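The computation described in this question can be sketched as follows. It assumes that Weight.gene is a numeric matrix with one row per transcript and one column per bulk sample, possibly containing NA for unused genes; that layout is an assumption, and the helper `top_weighted_genes` is hypothetical. Whether aggregate weight is a sound measure of influence is the open question of the post and is not settled by the code.

```r
# Sum weights across samples for each transcript and return the n largest.
top_weighted_genes <- function(W, n = 100) {
  total <- rowSums(W, na.rm = TRUE)
  head(sort(total, decreasing = TRUE), n)
}

# Toy weight matrix: 3 transcripts x 2 samples.
W <- matrix(c(0.1, 0.9, 0.5,
              0.2, NA,  0.5),
            nrow = 3,
            dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

top2 <- top_weighted_genes(W, n = 2)
print(names(top2))
```

With real output, `W` would be replaced by the weight matrix returned by music_prop.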

Does bulk RNA-seq data batch matter?

If my bulk RNA-seq data were obtained at different times, does this matter?

Confusing steps in tutorial of recursive MuSiC algorithm?

Hello Xuran,
Thank you very much for creating this package, we have found it helpful and are still trying to figure out some issues. I have been trying to follow the recursive algorithm explanation:
[https://xuranw.github.io/MuSiC/articles/MuSiC.html#estimation-of-cell-type-proportions-with-pre-grouping-of-cell-types]

I have been having trouble with the IEmarkers object, when I downloaded the IEmarkers.RData object, I found an "Immune.marker" and "Epith.marker" object. I assumed I needed to turn them into a list, which I did with IEmarkers = list(C3 = Epith.marker, C4 = Immune.marker)
However I get an error:

Error in music_prop.cluster(bulk.eset = mouse.bulk, sc.eset = mouse.sc, : Cluster number is not matching!

when typing in the command you provided. I also see that in the tutorial, the command:
Est.mouse.bulk = music_prop.cluster(bulk.eset = Mouse.bulk.eset, sc.eset = Mousesub.eset, group.markers = IEmarkers, clusters = 'cellType', group = 'clusterType', samples = 'sampleID', clusters.type = clusters.type) uses 'group', but the options from the Documentation have the name of the argument as 'groups', is that the one we should be using? Any help is appreciated.

Thanks, T.J.
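One possible reading of "Cluster number is not matching!" is that the marker list needs one entry per cluster in clusters.type, including clusters that have no markers. That reading is an assumption, since the check inside music_prop.cluster is not quoted in this thread. Under that assumption, the list could be padded as in this sketch; the helper `align_group_markers`, the cluster assignments and the gene names are all toy placeholders.

```r
# Build a marker list with one (possibly NULL) entry per cluster,
# in the same order and with the same names as clusters.type.
align_group_markers <- function(markers, clusters.type) {
  stopifnot(all(names(markers) %in% names(clusters.type)))
  out <- setNames(vector("list", length(clusters.type)), names(clusters.type))
  out[names(markers)] <- markers
  out
}

# Toy cluster definition and marker sets (placeholders, not tutorial data).
clusters.type <- list(C1 = "Neutro", C2 = "Podo",
                      C3 = c("Endo", "PT"), C4 = c("Macro", "NK"))
Epith.marker <- c("geneA", "geneB")
Immune.marker <- c("geneC")

IEmarkers <- align_group_markers(
  list(C3 = Epith.marker, C4 = Immune.marker),
  clusters.type
)
print(lengths(IEmarkers))
```

The result has the same length and names as clusters.type, with NULL for C1 and C2.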

QC steps and also an unassigned variable in tutorial code

Hi Xu Ran
Many thanks for developing MuSiC! I really like the idea of giving each gene a weight instead of establishing an overall cutoff w signature matrix.

I am currently quite new to learning deconvolution techniques and R as well and I hope you can help my understanding on it.
1. For the QC of scRNA-seq data, I was wondering how MuSiC filters out the doublets/multiplets/dying cells and outliers that might give falsely high gene expression.

2-I read over your code the part where it says..

> GSE50244.EMTAB.prop
I couldn't find the assignment of this variable. I'm guessing the GSE50244 set was concatenated with
the EMTAB set, but the intermediate steps are not clear to me.

Thank you

Problems with sample identification

Hi. I have encountered this problem in my code. what is going on?

tmp <- music_prop(bulk.eset = rd, sc.eset = ad, clusters = 'cellType',
+            samples = 'sampleID', verbose = F)
Error in music_prop(bulk.eset = rd, sc.eset = ad, clusters = "cellType",  : 
  Too few common genes!

Error in colMeans(S, na.rm = TRUE) : 'x' must be an array of at least two dimensions

Hi,

I wonder what sampleID is in your analysis. Does it refer to the batches? In my sc data, I only have one batch (replica). So, I set the sampleID to 1 for all cells. However, I got the following error:
Error in colMeans(S, na.rm = TRUE) : 'x' must be an array of at least two dimensions

If I set sampleID=cell_ID, then the deconvolution works but I am not sure if this is right.

Thanks for your help
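The error message itself is what base R reports when `colMeans` receives a plain vector instead of a matrix, which happens easily when a matrix is reduced to a single row or column. The sketch below reproduces that behaviour on toy data. Whether this is what happens inside MuSiC when there is only one sampleID is an assumption, not something confirmed in this thread.

```r
# Toy matrix: 2 subjects (rows) x 2 cell types (columns).
S <- matrix(c(10, 20, 30, 40), nrow = 2,
            dimnames = list(c("s1", "s2"), c("alpha", "beta")))

# Selecting a single subject drops the matrix to a vector ...
one.subject <- S[1, ]
msg <- tryCatch(colMeans(one.subject, na.rm = TRUE),
                error = function(e) conditionMessage(e))
print(msg)

# ... whereas drop = FALSE keeps a 1 x 2 matrix that colMeans accepts.
kept <- S[1, , drop = FALSE]
print(colMeans(kept, na.rm = TRUE))
```

If the data really come from a single subject, the cross-subject variance that the method relies on cannot be estimated, so treating each cell as its own sample changes what is being measured.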

Error: Not enough cell types! when do the cell type proportion estimates

Hi,

I have a question on cell type proportion estimates. I have a single cell RNA-seq reference data like the below toy example,
Genes CD4 Mono Ery
ACE 49 1 0
ALG9 401 74 234
ANKRD18A 332 69 0
AQP1 14 0 8342
CELF6 40 17 0
CFB 206 100 14

When I do music_prop(bulk.eset = bulk.est, sc.eset = scRNA.est, clusters = 'cellType', samples = 'sampleID', select.ct = c('CD4', 'Mono','Ery'), verbose = F). It throws out an error as

Error in music_prop(bulk.eset = bulk.est, sc.eset = scRNA.est, clusters = "cellType", :
Not enough valid cell type!

In order to help you understand, I attach the info of scRNA.est below,

I am wondering whether MuSiC needs at least two replicates for inferring the cell type proportion.

Thanks a lot!
Elaine

music_prop.cluster:Error in if (sum(abs(p.weight.new - p.weight)) < eps) { : missing value where TRUE/FALSE needed

Hi Xuran,

Thank you for writing this package! I was trying to run the Tutorial of MuSiC, when I run the music_prop.cluster function, I get the following error:

Est.mouse.bulk = music_prop.cluster(bulk.eset = Mouse.bulk.eset, sc.eset = Mousesub.eset,group.markers = IEmarkers, clusters =
'cellType',group = 'clusterType', samples = 'sampleID',clusters.type = clusters.type)

Error in if (sum(abs(p.weight.new - p.weight)) < eps) { :
missing value where TRUE/FALSE needed

would you give me some suggestions?
Thank you!

can't compile MuSiC

Hi,

Installation failed with the following message:

Downloading GitHub repo xuranw/MuSiC@master
Skipping 2 packages not available: Biobase, bseqsc
Installing 2 packages: bseqsc, nnls
Installing packages into ‘/Volumes/Documents/Users/akihoji/Library/R/3.x/library’
(as ‘lib’ is unspecified)
Error: (converted from warning) package ‘bseqsc’ is not available (for R version 3.5.1)

The culprit is one of the dependencies, bseqsc. I tried to install it manually with

install_github('hutuqiu/BSeQC')

but I get the following error:

Error: HTTP error 404.

Any workaround for this ?

music_prop.cluster: Error in nnls(D.weight, Y.weight) NA/NaN/Inf...

Hi Xuran,

Thank you for writing this package! I was trying to run music_prop.cluster on some single cell data, as per the tutorial and using the bulk.construct function as well. I've run the same data on music_prop and get a proper result. Unfortunately, when I go to run it after identifying clusters and getting a list of the variably expressed genes for each group, I end up with the error message: Error in nnls(D.weight, Y.weight): NA/NaN/Inf in foreign function call (arg1):

Would you have any suggestions as to how to get around this?

Thank you!
Orion
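Since nnls rejects inputs containing NA, NaN or Inf, a first step is to find out which input carries them. This is a generic diagnostic sketch; the helper `count_nonfinite` is hypothetical, and the toy objects merely reuse the argument names from the error message.

```r
# Count non-finite entries (NA, NaN, Inf, -Inf) in each named input.
count_nonfinite <- function(...) {
  inputs <- list(...)
  vapply(inputs, function(x) sum(!is.finite(x)), integer(1))
}

# Toy stand-ins for the weighted design matrix and response.
D.weight <- matrix(c(1, 2, NaN, 4), nrow = 2)
Y.weight <- c(0.5, Inf)

bad <- count_nonfinite(D.weight = D.weight, Y.weight = Y.weight)
print(bad)
```

Non-finite values often originate from dividing by zero, for example when a cell type has zero total counts for a sample, though that cause is a guess for this particular report.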

May I ask some puzzled questions ?

dear Dr.xuranw,

I have studied your paper recently and I think it's a great tool to estimate cell proportions. But some questions puzzled me, and I wonder if I can ask you:
1. When starting from scRNA-seq data from multiple subjects, are the immune cells (e.g. B cells, CD4+ cells, etc.) from peripheral blood mononuclear cells the same as tumour-infiltrating immune cells?
2. As shown in the overview of the MuSiC framework, both the cross-subject mean and the cross-subject variance are calculated for these genes in each cell type. Why are informative genes selected using only the cross-subject variance? Can genes with low cross-subject variance alone identify different cell types?

Best wishes,
huitingxiao

Unable to build vignette

I tried to run the vignette, but ran into a few issues.

The RDS files are linked in the live vignette (on github.io) to files on the github.io site; this works but requires a tiny workaround rather than simply the bare URL:

readRDSFromWeb <- function(ref) {
  readRDS(gzcon(url(ref)))
}

You can do a similar trick with RData files.
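For RData files, one alternative that avoids managing connections by hand is to download to a temporary file and `load` it into a fresh environment. This is a sketch, not the code from the pull request; the function name `loadRDataFromWeb` is hypothetical, and the demonstration uses a local `file://` URL on a Unix-like path so that it runs offline.

```r
# Download an .RData file and return its objects as a named list.
loadRDataFromWeb <- function(ref) {
  tmp <- tempfile(fileext = ".RData")
  on.exit(unlink(tmp))
  utils::download.file(ref, tmp, mode = "wb", quiet = TRUE)
  env <- new.env()
  obj.names <- load(tmp, envir = env)
  mget(obj.names, envir = env)
}

# Offline demonstration: save two toy objects, then read them back
# through a file:// URL (toy contents, not the real marker lists).
Epith.marker <- c("geneA", "geneB")
Immune.marker <- c("geneC")
src <- tempfile(fileext = ".RData")
save(Epith.marker, Immune.marker, file = src)

objs <- loadRDataFromWeb(paste0("file://", normalizePath(src)))
print(names(objs))
```

Returning a list makes it explicit when an RData file contains several objects, as reported for IEmarkers.RData in other issues on this page.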

Building the vignette with devtools::build_vignettes() also did not work. If you add %\VignetteEngine{knitr::rmarkdown} to the header of the vignette, it will work fine. I also found that the R chunks were not run, merely displayed verbatim. Using ```{r} rather than ``` r fixed this for me.

An unusual issue relates to the inclusion of CIBERSORT in the bseqsc package. My suggestion would be to compute the bseqsc results and save them inside the package as data, since CIBERSORT has an (in my view) ridiculously restrictive policy. Then, you could attempt to run the bseqsc code in the vignette, and fall back to using pre-computed values if this fails. Currently I will have to wait 3-5 days to (hopefully) gain access to CIBERSORT.

There was also a minor typo in the vignette.

I have fixed all of these in my PR with the exception of the CIBERSORT issue.

Error in colMeans(S, na.rm = TRUE): 'x' must be an array of at least two dimensions

I'm trying to run the music_prop command and have been getting the following error:
Error in colMeans(S, na.rm = TRUE): 'x' must be an array of at least two dimensions
Traceback:

music_prop(bulk.eset = nabecBulk, sc.eset = nabecSet, clusters = "Celltype",
. samples = "sample", verbose = F)
music_basis(sc.eset, non.zero = TRUE, markers = sc.markers, clusters = clusters,
. samples = samples, select.ct = select.ct, cell_size = cell_size,
. ct.cov = ct.cov, verbose = verbose)
colMeans(S, na.rm = TRUE)
stop("'x' must be an array of at least two dimensions")

Here is the full write out of the command for reference:
Est.prop.nabec = music_prop(bulk.eset = nabecBulk, sc.eset = nabecSet, clusters = 'Celltype',
samples = 'sample', select.ct = c('Neuron', 'Oligodendrocyte', 'OPC',
'Astrocyte','Microglia'), verbose = F)

and the expression set of the single cell data:
ExpressionSet (storageMode: lockedEnvironment)
assayData: 27009 features, 3000 samples
element names: exprs
protocolData: none
phenoData
sampleNames: AAACCCAAGACAACAT-1 AAACCCAAGATGAATC-1 ...
ATCAGGTGTTTAGAGA-1 (3000 total)
varLabels: Celltype cluster sample
varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation:

and bulk data:
ExpressionSet (storageMode: lockedEnvironment)
assayData: 59032 features, 311 samples
element names: exprs
protocolData: none
phenoData: none
featureData: none
experimentData: use 'experimentData(object)'
Annotation:

Any idea what might be causing the problem?
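
The thread does not establish the cause, so the following is only a diagnostic sketch under assumptions. Two things seem worth ruling out: requested cell types in `select.ct` that do not match the labels in the `Celltype` column, and groups that contain a single cell, because base R drops a one-column matrix subset to a plain vector and `colMeans` then fails with exactly this message. The `pheno` table below is invented toy data standing in for the phenotype data of `nabecSet`.

```r
# Toy phenotype table standing in for the single-cell phenoData; values invented.
pheno <- data.frame(
  Celltype = c("Neuron", "Neuron", "Astrocyte", "Microglia"),
  sample   = c("s1", "s2", "s1", "s1"),
  stringsAsFactors = FALSE
)
select.ct <- c("Neuron", "Oligodendrocyte", "OPC", "Astrocyte", "Microglia")

# Requested cell types that never occur in the cluster labels.
missing_ct <- setdiff(select.ct, unique(pheno$Celltype))

# Cells per cell type and subject; zeros and ones deserve a closer look.
cells_per_group <- table(pheno$Celltype, pheno$sample)

# Why a single column matters: default subsetting drops to a plain vector.
m <- matrix(1:6, nrow = 3)
dropped <- tryCatch(colMeans(m[, 1]), error = function(e) conditionMessage(e))
kept    <- colMeans(m[, 1, drop = FALSE])
```

Here `dropped` holds the error message, while `kept` is computed normally because `drop = FALSE` preserves the matrix shape.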

Tag a release?

Can you tag a release? That makes it a bit easier to keep the bioconda version of this updated.

Loading Test Files and Number of Single Cell

About the CIBERSORT used in your paper

Hi, Xuranw
Thanks for your great package.
I just have a question about your paper. Since you compared CIBERSORT with MuSiC in the paper, which signature matrix did you use for CIBERSORT?
Looking forward to your response
Thanks
Fei

how to select group.marker

Hi Xuran,

In the tutorial section "Estimation of cell type proportions with pre-grouping of cell types", I wonder how you selected genes for group.marker. It mentions 'intra-cluster differentially expressed genes', but I wonder if you have a recommended procedure for identifying these DEGs.

Thanks,
Yuping

group.marker selection criteria

Hello Xuran! Thanks for your great package!

I've been able to run the MuSiC cell type estimation analysis on my data of interest (brain). However, several cell types are transcriptionally very closely related to one another, yet functionally clearly distinct. Because of this, I want to run music_prop.clusters on my data in order to obtain more reliable results.

However, as noted in #15, how to select differentially expressed genes among these groups using the output of music_basis is explained neither in the vignettes nor in the paper itself. So how does one properly build one's own group.marker list from the music_basis output?

From your experience as the package creator, what cutoff should be used to select genes from the design matrix, as an example?

How to perform deconvolution with RPKM values

Hi Xuran,

I tried to apply MuSiC to bulk RNA-seq data with RPKM as the input. According to your paper (Discussion), "MuSiC can utilize RPKM if estimates of cell type-specific total RNA abundance can be provided." I am wondering how I can incorporate cell-type-specific total RNA abundance into your function. Or can I use RPKM directly as the input and run music_prop to do the deconvolution? Does my reference single-cell RNA-seq data also need to be in RPKM?
Thanks!

Best,
Ming

object 'GSE50244.EMTAB.prop' not found

Hello,

I am running the tutorial of MuSiC to check that everything runs properly before moving to my own data, but while running the function:
m.prop.GSE50244 = rbind(melt(GSE50244.EMTAB.prop$Est.prop.weighted),
melt(GSE50244.EMTAB.prop$Est.prop.allgene), melt(Est.prop.bseq),
melt(data.matrix(Est.prop.cibersort)))

R prints the error:
Error in melt(GSE50244.EMTAB.prop$Est.prop.weighted) :
object 'GSE50244.EMTAB.prop' not found

I have tried to figure it out on my own. For example, I found that the link in
"load(gzcon(url('https://xuranw.github.io/MuSiC/data/GSE50244CIBERSORT.RData')))"
is not right (Error 404), so I changed it to the one I think could be the right one: 'https://github.com/xuranw/MuSiC/tree/master/vignettes/data/GSE50244CIBERSORT.RData'

I have been looking around for the object 'GSE50244.EMTAB.prop' but haven't been able to find it.

Thanks for your help,
David
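
A side note on the replacement URL: a `github.com/.../tree/...` address returns an HTML page rather than the file itself, so `load()` cannot read it. GitHub serves file contents from `raw.githubusercontent.com`, and the sketch below converts a `tree` or `blob` URL to that form. `github_raw_url` is a hypothetical helper, and whether the file actually exists at this path in the repository, or defines `GSE50244.EMTAB.prop`, is not confirmed here.

```r
# Hypothetical helper: turn a github.com tree/blob page URL into a raw-file URL.
github_raw_url <- function(page_url) {
  pattern <- "^https://github\\.com/([^/]+)/([^/]+)/(?:tree|blob)/(.+)$"
  if (!grepl(pattern, page_url, perl = TRUE)) {
    stop("Not a github.com tree/blob URL: ", page_url)
  }
  # Keep owner, repository and branch/path; drop the tree/blob segment.
  sub(pattern, "https://raw.githubusercontent.com/\\1/\\2/\\3",
      page_url, perl = TRUE)
}

raw_url <- github_raw_url(
  "https://github.com/xuranw/MuSiC/tree/master/vignettes/data/GSE50244CIBERSORT.RData"
)
```

The resulting `raw_url` can then be passed to `load(gzcon(url(raw_url)))` in the same way as the original tutorial link.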

unable to load

Hi Xuran
I was unable to load the package, with this error after trying to install from github

devtools::install_github('xuranw/MuSiC')
Error in read.dcf(path) :
Found continuation line starting ' plyr, ...' at begin of record.
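
`read.dcf()` raises this message when an indented continuation line follows a blank line, for instance a blank line inside the `Imports` field of a DESCRIPTION file. That this is what happened in the repository at the time is an assumption. The sketch below reproduces the behaviour on temporary files with invented contents.

```r
# Two toy DESCRIPTION files: the second has a blank line inside Imports.
good <- c("Package: demo", "Imports: nnls,", "    plyr")
bad  <- c("Package: demo", "Imports: nnls,", "", "    plyr")

# Write the lines to a temporary file and parse them, capturing any error.
read_description <- function(lines) {
  path <- tempfile()
  on.exit(unlink(path))
  writeLines(lines, path)
  tryCatch(read.dcf(path), error = function(e) e)
}

ok_result  <- read_description(good)   # parsed character matrix
bad_result <- read_description(bad)    # error condition object

# Simple check: positions of blank lines in the file contents.
blank_lines <- which(!nzchar(trimws(bad)))
```

Running the same blank-line check on a local clone's DESCRIPTION would show quickly whether this explanation applies.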

Cell types not found

Hello,

Firstly, thanks for this analysis tool. So far I find it pretty intuitive to use & helpful.

I stumbled across something I thought a bit odd and was hoping you might be able to help me.

I used a mixture of 4 to 5 sc-RNAseq datasets with cell-types 'T-cell', 'Fibroblast', 'Macrophage', 'Endothelial', 'CAF' & 'Epithelial' and the transcriptomes are pretty similar across the datasets.
I made ExpressionSet objects for my sc-RNAseq datasets and for 2 bulk tissue RNA datasets, one of which I got using the TCGAbiolinks package in R (I mention this because it's the odd one).

I use the following
Est.1 <- music_prop(bulk.eset = bt_data, sc.eset = sc_data, clusters = 'Cell-type',samples = "SampleID", verbose = T)

and everything has proportions as expected. (1st plot)

jitter_estproportions_wShih.pdf

But I try with the second bt_data set, and I lose all of my T-cells?
Est.2 <- music_prop(bulk.eset = bt_data_2nd, sc.eset = sc_data, clusters = 'Cell-type',samples = "SampleID", verbose = T)

jitter_noTcells.pdf

I checked the bt_data_2nd matrix and there are definitely T-cell markers present. If I remove one of the datasets from the sc_data and rerun
Est.3 <- music_prop(bulk.eset = bt_data_2nd, sc.eset = sc_data.minus1, clusters = 'Cell-type', samples = "SampleID", verbose = T)

The NNLS seems to find T-cells, but not MuSiC.
jitter_NNLS_tcells.pdf

My sc-RNAseq datasets are usually processed as Seurat objects, so I pulled T-cell markers across all sc-RNAseq datasets and they're definitely in the bt_data_2nd (TCGA bulk RNAseq dataframe). So I don't understand why I am getting flat zeroes for T-cells.

bt_data (the one that had all cell-types after deconvolution)
ExpressionSet (storageMode: lockedEnvironment)
assayData: 13104 features, 548 samples
element names: exprs
protocolData: none
phenoData
sampleNames: TCGA.20.0987 TCGA.23.1031 ... TCGA.13.1819 (548 total)
varLabels: EPCAM PTPRC ... VWF (7 total)
varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation:

bt_data_2nd (the one that has no T-cells, apparently)
ExpressionSet (storageMode: lockedEnvironment)
assayData: 56537 features, 229 samples
element names: exprs
protocolData: none
phenoData
sampleNames: TCGA-04-1331-01A-01R-1569-13 TCGA-04-1332-01A-01R-1564-13 ...
TCGA-WR-A838-01A-12R-A406-31 (229 total)
varLabels: Sample.ID Definition ... sampleNames (5 total)
varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation:

my sc-RNAseq datasets
ExpressionSet (storageMode: lockedEnvironment)
assayData: 22390 features, 38789 samples
element names: exprs
protocolData: none
phenoData
sampleNames: E27_Peri_AAACCCAAGACGCCAA E27_Peri_AAACCCAAGAGTCAGC ...
Shih_ctcaatgtcggcaccttc (38789 total)
varLabels: Cell-type SampleID
varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation:

Just to show that my second bulk-RNAseq dataset does indeed include T-cell markers.
Using Seurat, I intersected the marker genes of each cell type across all my sc-RNAseq datasets, leaving me with one vector per cell type containing the marker genes that overlap across all the sc-RNAseq datasets.

A quick glance shows these genes are present and have expression values (T-cell markers):
      TCGA-61-1724-01A-01R-1568-13 TCGA-61-1736-01B-01R-1568-13
IL32                          3266                        14180
PTPRC                          465                         1027
NKG7                           141                          365
HCST                           583                          464
      TCGA-61-1738-01A-01R-1567-13 TCGA-61-1741-01A-02R-1567-13
IL32                          2443                        18459
PTPRC                          755                         1139
NKG7                           180                          317
HCST                           784                          248
      TCGA-61-1918-01A-01R-1568-13 TCGA-61-1919-01A-01R-1568-13
IL32                          3651                        14568
PTPRC                         1164                         4614
NKG7                            76                         2053
HCST                           187                          294
      TCGA-61-2101-01A-01R-1568-13 TCGA-61-2102-01A-01R-1568-13
IL32                          4881                         6006
PTPRC                         2574                         1441
NKG7                           870                          362
HCST                           482                          291
      TCGA-61-2109-01A-01R-1568-13 TCGA-61-2110-01A-01R-1568-13
IL32                          5623                         2866
PTPRC                         1422                          725
NKG7                           580                          717
HCST                           354                          769
      TCGA-61-2113-01A-01R-1568-13 TCGA-VG-A8LO-01A-11R-A406-31
IL32                           572                         2747
PTPRC                          147                          320
NKG7                           176                          497
HCST                           124                          489

Much the same for the marker genes of the other cell types.
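
The marker check described above can be sketched in base R as follows. The per-dataset marker lists are invented stand-ins for Seurat marker results (the extra genes `CD3E` and `CCL5` are illustrative only), and the toy bulk matrix reuses the counts of the first two samples shown above.

```r
# Invented marker lists standing in for per-dataset Seurat marker results.
tcell_markers <- list(
  dataset1 = c("IL32", "PTPRC", "NKG7", "HCST", "CD3E"),
  dataset2 = c("PTPRC", "IL32", "HCST", "NKG7"),
  dataset3 = c("NKG7", "IL32", "PTPRC", "HCST", "CCL5")
)

# Markers shared by every single-cell dataset.
shared <- Reduce(intersect, tcell_markers)

# Toy bulk matrix (genes x samples) with counts from the first two samples.
bulk <- matrix(
  c(3266, 465, 141, 583, 14180, 1027, 365, 464),
  nrow = 4,
  dimnames = list(c("IL32", "PTPRC", "NKG7", "HCST"),
                  c("TCGA-61-1724-01A-01R-1568-13",
                    "TCGA-61-1736-01B-01R-1568-13"))
)

# Which shared markers are in the bulk data, which are missing, which are all zero.
present  <- shared[shared %in% rownames(bulk)]
absent   <- setdiff(shared, rownames(bulk))
all_zero <- present[rowSums(bulk[present, , drop = FALSE]) == 0]
```

Empty `absent` and `all_zero` vectors confirm only that the markers are measured in the bulk data; they do not by themselves explain why the estimated T-cell proportion is zero.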

I'd appreciate any help or suggestions as to why I might be getting these results.

sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.5 (Nitrogen)

Matrix products: default
BLAS: /gpfs/igmmfs01/software/pkg/el7/apps/R/3.6.0/lib64/R/lib/libRblas.so
LAPACK: /gpfs/igmmfs01/software/pkg/el7/apps/R/3.6.0/lib64/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods
[9] base

other attached packages:
[1] reshape2_1.4.3 xbioc_0.1.17 AnnotationDbi_1.46.1 IRanges_2.18.2
[5] S4Vectors_0.22.1 Seurat_3.1.0 MuSiC_0.1.1 ggplot2_3.2.1
[9] nnls_1.4 Biobase_2.44.0 BiocGenerics_0.30.0

loaded via a namespace (and not attached):
[1] Rtsne_0.15 colorspace_1.4-1 ggridges_0.5.1 rstudioapi_0.10
[5] leiden_0.3.1 listenv_0.7.0 npsurv_0.4-0 MatrixModels_0.4-1
[9] bit64_0.9-7 ggrepel_0.8.1 codetools_0.2-16 splines_3.6.0
[13] R.methodsS3_1.7.1 lsei_1.2-0 zeallot_0.1.0 jsonlite_1.6
[17] mcmc_0.9-6 ica_1.0-2 cluster_2.1.0 png_0.1-7
[21] R.oo_1.22.0 uwot_0.1.3 sctransform_0.2.0 BiocManager_1.30.4
[25] compiler_3.6.0 httr_1.4.1 backports_1.1.4 assertthat_0.2.1
[29] Matrix_1.2-17 lazyeval_0.2.2 htmltools_0.3.6 quantreg_5.51
[33] tools_3.6.0 rsvd_1.0.2 igraph_1.2.4.1 coda_0.19-3
[37] gtable_0.3.0 glue_1.3.1 RANN_2.6.1 dplyr_0.8.3
[41] Rcpp_1.0.2 vctrs_0.2.0 gdata_2.18.0 ape_5.3
[45] nlme_3.1-140 gbRd_0.4-11 lmtest_0.9-37 stringr_1.4.0
[49] globals_0.12.4 lifecycle_0.1.0 irlba_2.3.3 gtools_3.8.1
[53] future_1.14.0 MASS_7.3-51.4 zoo_1.8-6 scales_1.0.0
[57] SparseM_1.77 RColorBrewer_1.1-2 yaml_2.2.0 memoise_1.1.0
[61] reticulate_1.13 pbapply_1.4-1 gridExtra_2.3 pkgmaker_0.28
[65] stringi_1.4.3 RSQLite_2.1.2 caTools_1.17.1.2 bibtex_0.4.2
[69] Rdpack_0.11-0 SDMTools_1.1-221.1 rlang_0.4.0 pkgconfig_2.0.2
[73] bitops_1.0-6 lattice_0.20-38 ROCR_1.0-7 purrr_0.3.2
[77] labeling_0.3 htmlwidgets_1.3 bit_1.1-14 cowplot_1.0.0
[81] tidyselect_0.2.5 RcppAnnoy_0.0.12 plyr_1.8.4 magrittr_1.5
[85] R6_2.4.0 gplots_3.0.1.1 DBI_1.0.0 pillar_1.4.2
[89] withr_2.1.2 fitdistrplus_1.0-14 survival_2.44-1.1 tibble_2.1.3
[93] future.apply_1.3.0 tsne_0.1-3 crayon_1.3.4 KernSmooth_2.23-15
[97] plotly_4.9.0 grid_3.6.0 data.table_1.12.2 blob_1.2.0
[101] metap_1.1 digest_0.6.20 xtable_1.8-4 tidyr_1.0.0
[105] MCMCpack_1.4-4 R.utils_2.9.0 RcppParallel_4.4.3 munsell_0.5.0
[109] registry_0.5-1 viridisLite_0.3.0

Error in music_prop(bulk.eset = GSE50244.bulk.eset, sc.eset = EMTAB.eset, ct.cov = TRUE, : object 'i' not found

Thank you for writing this great package. One small issue: I tried running the example in the tutorial, but with ct.cov = TRUE. I get the error object 'i' not found. I believe the error comes from line 163 in utils.R, where the for loop defining i is missing. See line 201 in the same file for comparison.
