jinmiaochenlab / batch-effect-removal-benchmarking

A benchmark of batch-effect correction methods for single-cell RNA sequencing data

R 21.31% Jupyter Notebook 78.54% Python 0.15%

batch-effect-removal-benchmarking's Issues

problem with lfs

When I try to run git lfs fetch/pull, I get this error:
batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
error: failed to fetch some objects from 'https://github.com/JinmiaoChenLab/Batch-effect-removal-benchmarking.git/info/lfs'

May I get a Google Drive link to the datasets?

Some questions related to dimension choosing for seurat v3

call_seurat3 <- function(batch_list, batch_label, celltype_label, npcs = 20,
plotout_dir = "", saveout_dir = "",
outfilename_prefix = "",
visualize = T, save_obj = T)

Sorry to disturb you, but I wonder why npcs is set to 20 in this step when running Seurat v3, given that the default value for this step is 30. Thanks.

Some questions about pbmc 5' dataset in the paper

Hello, in the original benchmark paper I cannot find the source publication for the human PBMC 5' dataset (I know the 3' part comes from Zheng et al.). Where can I find the source of the 5' dataset? Thanks.

Question about choosing metrics to evaluate BBKNN

Sorry to disturb you again. I notice that the paper does not use any metric to evaluate BBKNN. I guess this is because BBKNN does not modify the original count matrix. However, it does affect the result after UMAP dimension reduction. Could I therefore use LISI and kBET, computed on the UMAP output, to evaluate this method? Thanks a lot.

Bulk RNA seq recommendation

What would you recommend trying for "big data" from bulk RNA-seq rather than single-cell data?

Problem with your evaluation of Scanorama on recovery of DEGs

In your script (https://github.com/JinmiaoChenLab/Batch-effect-removal-benchmarking/blob/master/Script/simulation/02_run/run_scanorama.ipynb)

you used corrected_adata.var_names = adata.var_names to update the gene names in the "corrected_adata" object, which stores the integration results from Scanorama. "adata" is the object as it was before being passed to Scanorama.

However, after reading the source code of Scanorama (https://github.com/brianhie/scanorama/blob/master/scanorama/scanorama.py, Line 316, function merge_datasets), I found that Scanorama sorts the gene names passed to it, which means:

Given your input gene names adata.var_names=('Gene1', 'Gene2', …, 'Gene5000') and data matrix adata.X=[x1, x2, …, x5000], Scanorama reorders both the gene names and the data matrix, giving corrected_adata.var_names=('Gene1', 'Gene10', 'Gene100', …, 'Gene999') and corrected_adata.X = [x1, x10, x100,…,x999]. The returned gene names and data matrix are both in this altered order.

Thus, after running your code corrected_adata.var_names = adata.var_names, you get:

corrected_adata.var_names=('Gene1', 'Gene2', …, 'Gene5000')
corrected_adata.X = [x1, x10, x100,…,x999]

The gene names are therefore mismatched with the data, so the subsequent evaluation of differentially expressed genes is wrong. After correcting this bug, I found that Scanorama achieved state-of-the-art performance on DEG recovery.
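
To make the mismatch described in this issue concrete, here is a minimal sketch in plain Python. It does not call Scanorama; it only assumes, as the issue reports, that a tool returns genes in lexicographically sorted order. The gene names, values, and variable names are all hypothetical.

```python
# Illustrative only: mimic a tool that returns columns in sorted gene order.
genes = [f"Gene{i}" for i in range(1, 13)]  # original order: Gene1 .. Gene12
values = {g: float(i) for i, g in enumerate(genes, start=1)}  # Gene_i holds i

# Lexicographic sorting places Gene10 before Gene2.
sorted_genes = sorted(genes)
sorted_values = [values[g] for g in sorted_genes]

# Buggy relabeling: original names attached to the sorted columns.
mislabeled = dict(zip(genes, sorted_values))

# Safer: keep the names returned with the data, then reorder columns back.
position = {g: j for j, g in enumerate(sorted_genes)}
restored_values = [sorted_values[position[g]] for g in genes]

print(sorted_genes[:4])
print(mislabeled["Gene2"], restored_values[1])
```

In this toy case the label Gene2 ends up attached to the value that belongs to Gene10, which is the kind of silent mismatch the issue describes. Reordering by the returned names, rather than overwriting them, restores the original column order.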

Some questions about dataset8, the h5ad file

Sorry to disturb you. Since Python cannot read rds files directly, I cannot generate the h5ad file using Colab, and loading this dataset on my own laptop crashes the machine. Could you please give me some suggestions on how to get the h5ad file mentioned in your script? Or is there a link where I can download this file? Thanks.

How can I use a .loom file with the LIGER method?

I ran into problems when trying to load data from a loom file into the pipeline provided in this code. Could you please give me some suggestions?

In addition, when you use IMAP to visualize your results, is the output the same every time you run it on the same dataset, or are there slight changes between runs? Thanks.

I am currently trying to run the methods mentioned in the paper on other platforms (e.g. Harmony in Python), but I get different cLISI/ASW/iLISI values. Is that reasonable? Thanks.

Issue accessing datasets through git LFS

Hello,

Thank you for the amazing work on providing the benchmarking scripts and datasets!

We are currently experiencing some issues accessing the datasets from LFS, please see attached error message:

fetch: Fetching reference refs/heads/master
batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
error: failed to fetch some objects from 'https://github.com/JinmiaoChenLab/Batch-effect-removal-benchmarking.git/info/lfs'

Thank you for your kind help!

Cheers,
Chloe

A question about the batch information in dataset1

Hello,

I want to use the datasets from your benchmarking paper to evaluate an algorithm. I notice that in dataset 1 you separate the DC cells into two batches according to plate ID (P7, P8, P9, P10 as batch 1, and P3, P4, P13, P14 as batch 2).
I wonder why the batches were defined this way.

Thanks a lot.

true up and down genes missing from simulated data

Hello,

I was hoping to use your simulated data, but wanted to also have a look at what the true up and down regulated genes are. I saw there was a file created by your splatter script, but the files are missing from the simulation data directories.

Thanks!

A question about input file of kBET

Hello,
Thanks for your wonderful work!
I have a question about the input to the kBET algorithm.
I noticed that the input to kBET is the PCA embedding matrix of the integrated object, rather than the cell-by-feature matrix.
So I tested the following 3 inputs:

1. Cell-by-feature matrix of the integrated data: seurat_V3_直接用细胞.png.pdf (the Chinese part of the filename means "using the cells directly")
2. PCA embedding matrix of the integrated data: seurat_V3_intergrated_PCA.pdf
3. PCA embedding matrix of the raw data: serat_v3_sct.pdf

The results look better when the PCA embedding is used as input.
Why is this?

Dataset 4 missing

Hi,

Thanks for the extremely useful benchmark! I'm trying to reproduce some of the results, and found that the dataset4 files are git lfs pointers instead of the actual files.
I installed git lfs and tried to fetch the files, but the error message says

fetch: Fetching reference refs/heads/master
batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
error: failed to fetch some objects from 'https://github.com/jykr/Batch-effect-removal-benchmarking.git/info/lfs'

Can you try uploading the data again? Thanks a lot!
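
For anyone hitting the same situation, a minimal sketch for checking which files in a local clone are still Git LFS pointers rather than real data. It relies only on the fact that an LFS pointer file begins with the line `version https://git-lfs.github.com/spec/v1`; the function names are hypothetical and the sketch does not download anything.

```python
from pathlib import Path

# First bytes of every Git LFS pointer file, per the LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file looks like an LFS pointer, not real data."""
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_lfs_pointers(root):
    """List files under root that are still LFS pointers (skips .git)."""
    root = Path(root)
    found = []
    for path in root.rglob("*"):
        if ".git" in path.relative_to(root).parts or not path.is_file():
            continue
        if is_lfs_pointer(path):
            found.append(str(path))
    return sorted(found)
```

Calling `find_lfs_pointers(".")` from the root of the clone returns the paths that still need to be fetched, which can help confirm whether a dataset was actually downloaded.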

How to record memory usage

Hello,

I was hoping to evaluate the performance of my algorithm, but I am unsure how to record memory usage. I didn't find a description of the tool used for recording memory usage in your article. Could you please tell me which tools you used?

Thanks!
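
This thread does not say which tool the authors used. As one possible approach for Python code only, here is a minimal sketch using the standard-library `tracemalloc` module. It reports the peak memory allocated through Python's allocator while a function runs, so it can miss memory allocated by native libraries and it does not apply to R scripts. `measure_peak_memory` and `build_list` are hypothetical names for illustration.

```python
import tracemalloc


def measure_peak_memory(func, *args, **kwargs):
    """Run func and return (result, peak traced bytes) for that call."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak_bytes


def build_list(n):
    """Stand-in workload; replace with the method being benchmarked."""
    return list(range(n))


result, peak = measure_peak_memory(build_list, 100_000)
print(f"peak traced memory: {peak / 1e6:.2f} MB")
```

For whole-process measurements that include native code, an operating-system level tool would be needed instead; which one fits depends on the platform.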

no call_harmony function

Hi,

There is only a call_harmony_2 function in call_harmony.R.
When I change call_harmony to call_harmony_2 in run_harmony_01.R, I get the error below:

b_seurat <- RunHarmony(object = b_seurat, batch_label, theta = theta_harmony, plot_convergence = TRUE, 
                  nclust = numclust, max.iter.cluster = max_iter_cluster)
Error in UseMethod("RunHarmony") : 
no applicable method for 'RunHarmony' applied to an object of class "seurat"