
batch-effect-removal-benchmarking's People

Contributors

git-xiaomeng, jinmiaochen, koksiong-ang, lys-nicole, marionchvr

batch-effect-removal-benchmarking's Issues

Some questions related to dimension choosing for seurat v3

call_seurat3 <- function(batch_list, batch_label, celltype_label, npcs = 20,
plotout_dir = "", saveout_dir = "",
outfilename_prefix = "",
visualize = T, save_obj = T)

Sorry to disturb you, but I wonder why npcs is set to 20 in this step to run Seurat v3, given that the default value for this step is 30. Thanks.

Some questions about pbmc 5' dataset in the paper

Hello, I cannot find the original paper for the human PBMC 5' dataset in the original benchmark paper (I know the 3' part comes from Zheng et al.). Where can I find the source of the 5' dataset? Thanks.

Question about choosing metrics to evaluate BBKNN

Sorry to disturb you again. I notice that in this paper you do not use any metric to evaluate the effect of BBKNN. I guess this is because BBKNN does not modify the original count matrix. However, it does affect the result after UMAP dimension reduction. Could I therefore use the LISI and kBET scores to evaluate this method? Thanks a lot.
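To make the idea of scoring a 2D embedding concrete, here is a minimal sketch of a LISI-style score computed directly on embedding coordinates. It is a simplification, not the LISI implementation used in the paper: every neighbour counts equally instead of being weighted by a perplexity-based kernel, and the function name `simple_lisi` and the toy coordinates are hypothetical.

```python
import math
from collections import Counter


def simple_lisi(points, labels, k=2):
    """Unweighted inverse Simpson's index of batch labels among each
    point's k nearest neighbours (simplified stand-in for LISI)."""
    scores = []
    for i, p in enumerate(points):
        # Distances from point i to every other point in the embedding.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbour_labels = [labels[j] for _, j in dists[:k]]
        n = len(neighbour_labels)
        counts = Counter(neighbour_labels)
        # Simpson's index: probability that two random neighbours share a label.
        simpson = sum((c / n) ** 2 for c in counts.values())
        scores.append(1.0 / simpson)
    return scores


# Toy "UMAP" coordinates: two batches interleaved on a unit square.
mixed_points = [(0, 0), (1, 0), (0, 1), (1, 1)]
mixed_batches = ["A", "B", "A", "B"]

# Toy coordinates: two batches in well-separated clusters.
split_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
split_batches = ["A", "A", "A", "B", "B", "B"]

mixed_scores = simple_lisi(mixed_points, mixed_batches, k=2)
split_scores = simple_lisi(split_points, split_batches, k=2)
```

With two batches, scores near 2 indicate neighbourhoods that mix both batches and scores near 1 indicate neighbourhoods dominated by a single batch.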

Problems about your evaluation for Scanorama on recovery of DEGs

In your script (https://github.com/JinmiaoChenLab/Batch-effect-removal-benchmarking/blob/master/Script/simulation/02_run/run_scanorama.ipynb)

you used corrected_adata.var_names = adata.var_names to update the gene names in the "corrected_adata" object, which stores the integration results from Scanorama; "adata" is the object before it is input to Scanorama.

However, after reading the source code of Scanorama (https://github.com/brianhie/scanorama/blob/master/scanorama/scanorama.py, Line 316, function merge_datasets), I found that Scanorama will sort the gene names input to it, which means:

Given your input gene names adata.var_names=('Gene1', 'Gene2', …, 'Gene5000') and data matrix adata.X=[x1, x2, …, x5000], Scanorama will reorder both the gene names and the data matrix, giving corrected_adata.var_names=('Gene1', 'Gene10', 'Gene100', …, 'Gene999') and corrected_adata.X = [x1, x10, x100,…,x999]. The returned gene names and data matrix are both in this altered order.

Thus, after running your code corrected_adata.var_names = adata.var_names, you will get:

  1. corrected_adata.var_names=('Gene1', 'Gene2', …, 'Gene5000')
  2. corrected_adata.X = [x1, x10, x100,…,x999]

Obviously, the gene names no longer match the data, so the subsequent evaluation of differentially expressed genes will be completely wrong. After correcting this bug, I found that Scanorama achieved state-of-the-art performance on DEG recovery.
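The mismatch can be reproduced on a toy example without calling Scanorama at all. The sketch below only assumes the sorting behaviour described above; the helper `sort_genes_with_values` and the four toy genes are hypothetical and stand in for the real AnnData objects.

```python
def sort_genes_with_values(genes, values):
    """Sort gene names lexicographically and reorder values the same way,
    mimicking the reordering described in this issue (assumption)."""
    order = sorted(range(len(genes)), key=lambda i: genes[i])
    return [genes[i] for i in order], [values[i] for i in order]


# Toy input: input_values[i] belongs to input_genes[i].
input_genes = ["Gene1", "Gene2", "Gene10", "Gene100"]
input_values = [1.0, 2.0, 10.0, 100.0]

sorted_genes, sorted_values = sort_genes_with_values(input_genes, input_values)

# Buggy relabelling: the original names are pasted onto the reordered values.
buggy = dict(zip(input_genes, sorted_values))

# Safer relabelling: keep the names returned alongside the values, then
# reindex back to the input order if that order is needed downstream.
fixed = dict(zip(sorted_genes, sorted_values))
restored_values = [fixed[g] for g in input_genes]

print(sorted_genes)
print(buggy["Gene2"], fixed["Gene2"])
```

In the buggy mapping, 'Gene2' is paired with the value that belongs to 'Gene10'; reindexing by the returned names restores the original pairing.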

Some questions about dataset8, the h5ad file

Sorry to disturb you. Since Python cannot read rds files, I cannot generate the h5ad file using Colab, and loading this dataset on my own laptop makes it crash. Could you please give me some suggestions on how to get the h5ad file mentioned in your script? Or is there a link where I can download this file? Thanks.

How could I use .loom file in Liger method?

I run into problems when I try to load data from a loom file into the pipeline you provide in this code. Could you please give me some suggestions?

In addition, when you use IMAP to visualize your result, is the output the same every time you run the same dataset, or are there slight changes between runs? Thanks.

I am currently trying to run the methods mentioned in this paper on other platforms (e.g. harmony in Python), but I get different cLISI/ASW/iLISI scores. Is that reasonable? Thanks.

Issue accessing datasets through git LFS

Hello,

Thank you for the amazing work on providing the benchmarking scripts and datasets!

We are currently experiencing some issues accessing the datasets from LFS, please see attached error message:

fetch: Fetching reference refs/heads/master
batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
error: failed to fetch some objects from 'https://github.com/JinmiaoChenLab/Batch-effect-removal-benchmarking.git/info/lfs'

Thank you for your kind help!

Cheers,
Chloe

Some question about the batch information in dataset1

Hello,

I want to use the dataset from your benchmarking paper to evaluate an algorithm. I notice that in dataset 1 you separate the DC cells into two batches according to plate ID (P7, P8, P9, P10 as batch 1, and P3, P4, P13, P14 as batch 2).
I wonder why the batches were defined like this.

Thanks a lot.

true up and down genes missing from simulated data

Hello,

I was hoping to use your simulated data, but wanted to also have a look at what the true up and down regulated genes are. I saw there was a file created by your splatter script, but the files are missing from the simulation data directories.

Thanks!

A question about input file of kBET

Hello,
Thanks for your wonderful work!
I have a question about the input file of the kBET algorithm.
I noticed that the input to kBET is the PCA embedding matrix of the integrated object instead of the cell_feature matrix.
So I tested the following 3 input files.

  1. cell_feature matrix of integrated data
    seurat_V3_直接用细胞.png.pdf (file name translates to "using cells directly")

  2. PCA embedding matrix of integrated data
    seurat_V3_intergrated_PCA.pdf

  3. PCA embedding matrix of raw data
    serat_v3_sct.pdf

The results look better when the PCA embedding is used as input.
Why is this?

Dataset 4 missing

Hi,

Thanks for the extremely useful benchmark! I'm trying to reproduce some of the results, and found that the dataset4 files are git lfs pointers instead of the actual files.
I tried to install git lfs and fetch the files, but the error message says

fetch: Fetching reference refs/heads/master
batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
error: failed to fetch some objects from 'https://github.com/jykr/Batch-effect-removal-benchmarking.git/info/lfs'

Can you try uploading the data again? Thanks a lot!
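For anyone checking whether a downloaded file is real data or only a pointer, a small sketch follows. It relies on the fact that a Git LFS pointer is a short text file whose first line is `version https://git-lfs.github.com/spec/v1`; the function names and the example path are hypothetical.

```python
LFS_SPEC_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def looks_like_lfs_pointer(data):
    """Return True if the given bytes start like a Git LFS pointer file."""
    return data.startswith(LFS_SPEC_PREFIX)


def file_is_lfs_pointer(path):
    """Read only the first bytes of a file and test them for the LFS header."""
    with open(path, "rb") as fh:
        head = fh.read(len(LFS_SPEC_PREFIX))
    return looks_like_lfs_pointer(head)


# Example pointer content (oid shortened to a placeholder for illustration).
example_pointer = (
    b"version https://git-lfs.github.com/spec/v1\n"
    b"oid sha256:<hash>\n"
    b"size 12345\n"
)
print(looks_like_lfs_pointer(example_pointer))
```

Calling `file_is_lfs_pointer("path/to/dataset4_file")` on each expected data file gives a quick list of the files that still need to be fetched.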

How to record memory usage

Hello,

I was hoping to evaluate the performance of my algorithm, but I am confused about how to record memory usage. I didn't find a description of the tool used for recording memory usage in your article. Could you please tell me which tools you used?

Thanks!
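The thread does not say which tool the authors used. As one possible approach for Python-based methods, the sketch below records peak memory with the standard-library `tracemalloc` module; the wrapper name `peak_memory_mib` is hypothetical, and `tracemalloc` only sees allocations made through Python's allocator, so memory used by native libraries or by R processes is not captured.

```python
import tracemalloc


def peak_memory_mib(func, *args, **kwargs):
    """Run func and return (result, peak traced memory in MiB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / (1024 * 1024)


def build_large_list():
    # Stand-in workload: a list with one million entries.
    return [0] * 1_000_000


result, peak_mib = peak_memory_mib(build_large_list)
print(f"peak traced memory: {peak_mib:.1f} MiB")
```

For whole-process measurements, including native code, an external tool such as `/usr/bin/time -v` on Linux reports the maximum resident set size instead.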

no call_harmony function

Hi,

There is only a call_harmony_2 function in call_harmony.R.
When I change call_harmony to call_harmony_2 in run_harmony_01.R, I get the error below:

b_seurat <- RunHarmony(object = b_seurat, batch_label, theta = theta_harmony, plot_convergence = TRUE, 
                  nclust = numclust, max.iter.cluster = max_iter_cluster)
Error in UseMethod("RunHarmony") : 
no applicable method for 'RunHarmony' applied to an object of class "seurat"
