chris-mcginnis-ucsf / multi-seq Goto Github PK

View Code? Open in Web Editor NEW

56.0 56.0 10.0 2.23 MB

R implementation of MULTI-seq sample classification workflow

R 100.00%

multi-seq's People

Contributors

Stargazers

Watchers

Forkers

mattjones315 farrellja adiamb dnarna909 castaway1990 romunov gartner-lab y111111111 szhang8

multi-seq's Issues

function MULTIseq.preProcess refers to a non-existing argument

The call to MULTIseq.preProcess() function refers to argument chemistry but in the latest version of the deMULTIplex package, this argument doesn't exist. Can you elaborate on which version of the aforementioned package you're using?

readTable <- MULTIseq.preProcess(
  R1 = "../fastq/Mix1-Barcode/R1_paired.fastq.gz",
  R2 = "../fastq/Mix1-Barcode/R2_paired.fastq.gz",
  cellIDs = mix1_keep,
  chemistry = "V2"
)

This is a screenshot of the documentation from version 1.0.2.

MULTIseqDemux error

Hi Chris,

I am using the function MULTIseqDemux() within Seurat (v.3.2.2) to demulplex the HTO object.
The command which threw the error:
hto_obj = MULTIseqDemux(hto_obj, assay = "HTO", autoThresh=TRUE)

ERROR:
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
Iteration 1
Using quantile 0.1
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
Iteration 2
Using quantile 0.1
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
Error in if (n_bc_calls[i] == 0) { :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
5: In min(x) : no non-missing arguments to min; returning Inf
6: In max(x) : no non-missing arguments to max; returning -Inf

I also attached you the hto_obj file here.

hto_obj.zip

Error when reading indrops3 data

Hello Chris!

Thanks for developing and supporting Multi-seq!

I'm trying to load and analyze indrops3 data: https://singlecellcore.hms.harvard.edu/resources
I emulated 10X reads by merging 4 indrops3 reads into two, and loading them with:

readTable <- MULTIseq.preProcess(R1 = "data/06_indrop3_to_10x/multiseq10x_1.fq.gz", 
                                 R2 = "data/06_indrop3_to_10x/multiseq10x_2.fq.gz", 
                                 cellIDs = cell_ids$cell_barcode, 
                                 cell = c(1, 16),
                                 umi = c(17, 22), 
                                 tag = c(1, 8))

It errors with:

#[1] "Reading in R1..."
#[1] "Reading in R2..."
#[1] "Assembling read table..."
#Error in .Call2("C_solve_user_SEW", refwidths, start, end, width, translate.negative.coord,  : 
#  solving row 12080: 'allow.nonnarrowing' is FALSE and the supplied end (8) is > refwidth

Do you have any suggestions on how to proceed?

Sergey

Barcode match

Hi Chris,
first of all thank you for your package, I'm currently using MULTI-Seq for my master thesis and it has been an important tool for my analysis. Maybe is a dumb question but I started wondering how does deMULTIplex keep track of which barcode is which?
For example, I have 6 barcodes named "Bar2", "Bar3", "Bar5", "Bar6", "Bar7", "Bar9" and the sequences are read in this order, after MULTIseq.align the column names are "Bar1", "Bar2", "Bar3", "Bar4", "Bar5", "Bar6". Will be "Bar1" correspond to "Bar2" in my annotation? Or is "Bar1" the first barcode it reads in readTable, therefore the first found in the fastq file?
I couldn't completely understand it from the code and the tutorial.
Thanks a lot!
Alessia

MULTISeqDemux function error: "No threshold found for CMO302

Hi
I am working on analyzing multiplexing (cellplex and TotalSeqA) data. I used HTOdemux for demultiplexing samples and it worked fine for both cellplex and totalseqA data however when I used MULTISeqDemux function for demultiplexing cellplex data it is giving me error saying "No threshold found for CMO302"

MULTISeqDemux function worked on TotalSeqA data though. Can you please help me understand what's going on?

Thanks!

Issues with the MULTIseq.preProcess step

Hi Chris,

I'm trying to use your deMULTIplex tool to demultiplex my hashed single cell data but it seems to keep terminating at the MULTIseq.preProcess step. I realized that the issue is with the ShortRead function readFastq( ). I think my fastq may just be too big (it has ~13mil reads) as readFastq() runs fine when I split my fastq into 4 subfiles (of ~3mil reads each). Do you think this is the real issue and if yes, how do I go about creating one readTable, incorporating all of the information in the full fastq, without having to create 4 seprate readTables and merging them after.

Thanks,
Paulina

Error when preProcessing HMEC example data

Hi Chris,

I am testing the pipeline with the HMEC example data, but I get the following error:

Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 550, 705

See my code bellow:

cell.id<- read.table("cellIDs.txt")
cell.id.vec <- cell.id$x

## Pre-process MULTI-seq sample barcode FASTQs
readTable <- MULTIseq.preProcess(R1 = 'ACAGTG_S3_L001_R1_001.fastq.gz', R2 = 'ACAGTG_S3_L001_R2_001.fastq.gz', cellIDs = cell.id.vec, cell=c(1,16), umi=c(17,28), tag=c(1,8))

Am I missing something?

Thank you so much for sharing the LMOs,

Cheers,
Yolanda

cells being called HTO1_HTO1?

When I run multiseq demux I sometimes get cells that have the label HTO1_HTO1 and are labeled as doublets. What part of the algorithm is doing that?

Thank you,
Ben

Install fails -- extra space in manual files

Hey Chris,

Just tried re-installing the package. Installation fails. If I'm interpreting the error correctly, it seems to not like the space in the arguments section of the manual files:

* installing *source* package ‘deMULTIplex’ ...
** using staged installation
** R
** byte-compile and prepare package for lazy loading
** help
Error : (converted from warning) bad markup (extra space?) at MULTIseq.preProcess.Rd:12:12
ERROR: installing Rd objects failed for package ‘deMULTIplex’
* removing ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library/deMULTIplex’
Error in i.p(...) : 
  (converted from warning) installation of package ‘/var/folders/y9/j0gdhnsd1mn1lj2gsfqr532h0000gp/T//Rtmploqg3m/fileab5fe3b277d/deMULTIplex_1.0.2.tar.gz’ had non-zero exit status

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.4

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1         rstudioapi_0.10    knitr_1.22         magrittr_1.5       usethis_1.5.0      devtools_2.0.2    
 [7] pkgload_1.0.2      R6_2.4.0           rlang_0.3.4        tools_3.6.0        pkgbuild_1.0.3     xfun_0.6          
[13] sessioninfo_1.1.1  cli_1.1.0          withr_2.1.2        remotes_2.0.4      assertthat_0.2.1   digest_0.6.18     
[19] rprojroot_1.3-2    crayon_1.3.4       processx_3.3.0     BiocManager_1.30.4 callr_3.2.0        fs_1.3.1          
[25] ps_1.3.0           curl_3.3           glue_1.3.1         memoise_1.1.0      compiler_3.6.0     desc_1.2.0        
[31] backports_1.1.4    prettyunits_1.0.2

removal of duplicated umis

Hi @chris-mcginnis-ucsf ,

Thanks for developing this great package.

I am trying to understand the preprocessing step.
In MULTIseq.Align.Suite.R code, there is step called Remove reads representing duplicated UMIs on a cell-by-cell basis

MULTI-seq/R/MULTIseq.Align.Suite.R

Lines 50 to 52 in 7341133

    
           ## Remove duplicated UMIs for reads with reference-matching tags 
        
           umis <- umis[tag.ind] 
        
           umi.ind <- which(duplicated(umis) == FALSE)

I did not get through with it.
1.This duplicated UMIs is detected in same Cell?
2. Are Multiple UMIs(duplicated) used for calculating barcode counts matrix?

Or maybe I did not understand the code.

Thanks!

tutorial processing multi-seq raw fastqs?

Hello Chris,
We just used the multi-seq 10x for sc-RNAseq in 32 samples, I received back 16 triplet files (R1, R2 & I) from the UCSF core and I have a list of barcodes 1 through 32. Is there a guide starting out from this point? I apologize if this is not the right channel to ask for help, I am starting out with 10x analysis and multi-seq
many thanks for this work
Aditya

Multi-seq for 5' VDJ 10x

Hello Chris,

This might not be the right place to ask this question so I apologize in advance for disturbing you.
I watched your seminar uploaded in Merck website discussing about Multi-seq technique and I have also read your nice paper in Nature. I was wondering if the products sold by Merck (including barcoded oligos) are compatible with 10x 5' system. It is not clear for me which should be the design of the barcoded oligo to achieve cDNA feature barcode amplification.

Again, my apologies for asking this here,

Guillem

including "negative" population helpful for algorithm?

Hi,

I am using your great package through the MULTIseqDemux function in Seurat. Depending on the settings a varying part of each cluster is assigned to "negative" (see HTO-tSNE below):

Visually judged, the negatives could also be assigned to their neighboring singlet cluster.
I was wondering what your thoughts on including real negative cells (=empty drops) when running MULTIseqDemux() are? Would it make the classification of negatives more precise? Empty drops could be retrieved from the cellranger raw matrix (raw_feature_bc_matrix) as opposed to the filtered matrix.

Thanks!

Best wishes
Tilo

Demux and cell numbers

Hi,
Thank you very much for this method for cell hashing demux.
I am doing tcr sequencing with cell hashing . and it gives separate fastqs for HTOs and TCR (I have used cellranger and Illumina). I am writing down the method I have used to demux it using MULTI-seq, please let me know I am in the right track.

I did a VDJ assembly using cell ranger to get the clonotypes
generated count matrix for HTO data using cell ranger.
Used Seurat and MULTISeqDemux function to de-convolute the samples. (quantile = 0.9,, autoThresh = TRUE, qrange = seq(from = 0.1, to = 0.9, by = 0.05),)
4.downloaded the barcodes for negative, doublets and Htos using WhichCell function in Seurat.
Using the merge function in R, assigned the clonotypes (matching hydrogel barcodes) for the demuxed barcodes.

Now the queries are,

do the parameters used for demux is apt for my case?
I had around 1316 cells after deconvolution. but when I tried to match with hydrogel barcodes in tcr data, only 337 are matched. The unassigned includes from HTOs .what could be the reason? could it be due to the 1 mismatch in the barcode? I assume any of my step is not doing the hammimg distance 1 correction for barcodes?

Any feedback in this regards will be appreciated.

Thanks,
Claire

MULTIseq.preProcess

I'm using MULTI-seq on hashing data. I'm getting the following error when I tried to use MULTIseq.preProcess function. Is there an issue?

Error in .Call2("solve_user_SEW", refwidths, start, end, width, translate.negative.coord, :
solving row 1: 'allow.nonnarrowing' is FALSE and the supplied end (28) is > refwidth

MULTI-seq and Seurat Integration

Hello @chris-mcginnis-ucsf, can you please provide more details about how to export the sample barcode UMI count matrix to be used as the input for Seurat or CellRanger?

Download of test data fails

Following this link:

Download example dataset for trying deMULTIplex here: https://ucsf.box.com/s/ibo5t6y0a0a3dde68o5xnedjmajsujxk

I see this error:
This shared file or folder link has been removed or is unavailable to you.

tutorial data barcode check result

Hey Chris,

I'm testing out deMULTIplex on the PBMC data, following the tutorial. I ran the following:


library(deMULTIplex)
library(ggplot2)

bar.ref <- read.csv("~/resources/deMULTIplex_ExampleData/LMOlist.csv",header=F)
cell.id.vec <- read.table("~/resources/deMULTIplex_ExampleData/cellIDs.txt")

barcs <- bar.ref$V1
cell.ids <- cell.id.vec$x

readTable <- MULTIseq.preProcess(R1 = '~/resources/deMULTIplex_ExampleData/ACAGTG_S3_L001_R1_001.fastq.gz', R2 = '~/resources/deMULTIplex_ExampleData/ACAGTG_S3_L001_R2_001.fastq.gz', cellIDs = cell.ids, cell=c(1,16), umi=c(17,28), tag=c(1,8))

str(readTable)

## Perform MULTI-seq sample barcode alignment
bar.table <- MULTIseq.align(readTable, cell.ids, barcs)

## Visualize barcode space
bar.tsne <- barTSNE(bar.table[,1:96]) 
## Note: Exclude columns 97:98 (assuming 96 barcodes were used) which provide total barcode UMI counts for each cell. 

for (i in 3:ncol(bar.tsne)) {
    g <- ggplot(bar.tsne, aes(x = TSNE1, y = TSNE2, color = bar.tsne[,i])) +
    geom_point() +
    scale_color_gradient(low = "black", high = "red") +
    ggtitle(colnames(bar.tsne)[i]) +
    theme(legend.position = "none") 
    print(g)
}

And checked the TSNE plots. I'm getting plots like these:

Is this the expected output for this dataset? A lot of the plots look like the "missing" example and I expected this dataset to not have missing barcodes based on your note.

Thanks!
Lauren

Preprocessing Issue with cell.id.vec

In the documentation I could not find how you obtain the cell IDs for the cell id vector? There is a file within the CellRanger package called 3M-february-2018.txt.gz that contains all of the cell barcode sequences but I'm assuming this is not the correct as it is several million cell barcodes long and the example provided in the documentation is a little over 13 thousand.

focusing on 'multiplets'

Hi @chris-mcginnis-ucsf

I am trying analyse a datasets generated similarly as MULTI-seq, but with fewer sample barcodes.

What's different with MULTI-seq, we are focusing on 'multiplets' specifically,
e.g. which cells have valida barcode1 and barcode2, which cells have valida barcode3 and barcode4 and barcode5

I am not an expert of statistics, and I wonder if we can:

Remove "negative" cells by classification and the rest cells are what we want.
OR, simply, I manually set a counts threshold, such as barcode 1>5 & barcode 2 >5 as bar1&bar2 doublets, and so on.

Which way do you think is mathmatically reasonable? And better idea is welcome.

cannot find barcode threshold

Hi @chris-mcginnis-ucsf
I came to an warning that 'cannot find threshold' for certain barcode.
Any suggestion what is reason behind it and how to fix it?

Actually, my dataset is a bit different with multi-seq sample data, I have only five barcodes, and their expression pattern(expression abundance) not like MULTI-seq sample barcode. Ours is much similar to a gene expression.
Any suggestion how to determine valid barcode for this?

MULTI-seq samples using cellrange

Hi:
MULTI-seq is interestig.

I'm trying to process data from PRJNA822077 using MULTI-seq. However, I am confused about the first step mentioned by the article.
Raw sequencing reads from the gene expression libraries were processed using “CellRanger” v3.0.2. The GRCh38 build of the mouse genome was used. Except for explicitly setting --expect-cells = 25,000, default parameters were used for all samples.

Here is my cellrange command line:
cellranger count --id=SRRxxxx --transcriptome=/pathto/mm10/cellranger-index/mm10/ --fastqs=/pathto/fastq/ --localmem=200 --sample=SRRxxxx

However,one of my results shows :2 Median Genes per Cell.

So may I learn which part I go wrong? thanks!

No threshold found for barcode

I am using deMULTIplex on a dataset with six barcodes, of which one (no. 3) happens to be notably more prevalent than the other barcodes. deMULTIplex gives the following note when running classifyCells: "No threshold found for Bar3..." - and as a result, all of those cells are classified as negative. When looking at barcode counts, however, it is immediately obvious that they belong to barcode 3, and the expression of this barcode is 3-4 magnitudes higher than any other barcode. From looking at cells classified in the remaining barcodes (1,2, 4-6), there doesn't appear to be much background noise from barcode 3 either. Is there something I can do to improve classification?

Code and data example for using the sample barcode UMI count matrix as input (We want to try the tool for cell hashing data)

Hi,

MULTI-seq looks very cool!

Could you please supply the example input data files that is included in the manual?

Currently it is very difficult for a non-R programmer to use the tool.

It is mensioned that MULTI-seq is capable of taking UMI matrix as input. Could you please provide a code and data example for this feature?

findReclassCells

I am trying to use findReclassCells function, but getting the following error, even though the row names are unique.

reclass.cells <- findReclassCells(bar.table, names(final.calls)[which(final.calls=="Negative")])
[1] "Normalizing barode data..."
[1] "Pre-allocating data structures..."
Error in .rowNamesDF<-(x, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘AAACCTGCAGCTCCGA-1’, ‘AAACGGGAGTGTACGG-1’, ‘AAACGGGCAATCAGAA-1’, ‘AAACGGGTCGACAGCC-1’, ‘AAAGCAACAAAGCAAT-1’, ‘AAATGCCGTTCAGTAC-1’, ‘AACACGTGTACTCTCC-1’, ‘AACACGTTCCGCATCT-1’, ‘AACCATGTCAGTGCAT-1’, ‘AACCATGTCCGTAGGC-1’

Hide classifyCells Print-out

Hi:

this is an amazing package but each time when I run classifyCells, it gives me tons of print out, like the example I give below. Is there anyway we can hide those printout? Thanks!

Are BClist.Robj and cell.id.vec.Robj files provided?

Hi @chris-mcginnis-ucsf , thank you for providing this package. I am trying to learn it. And I was following the tutorials, and realized that I don't have BClist.Robj and cell.id.vec.Robj files. Wonder if these files are provided anywhere, or how should I generate them?

Thanks

## Define vectors for reference barcode sequences and cell IDs
bar.ref <- load("/path/to/BClist.Robj")
cell.id.vec <- load("/path/to/cell.id.vec.Robj")

deMULTIplex for antibody-based cell hashing

Hello,

I am analysing a 10x scRNA-seq dataset where 6 cell populations were multiplexed by antibody-based cell hashing (with Totalseq-B antibodies). I have a few questions about demultiplexing:

Is it possible to use the deMULTIplex package to demultiplex datasets with antibody-based cell hashing as opposed to lipid-based as described in the paper? I tried running it on the normalised Seurat matrices, comparing it with HTODemux (described here), and found that MULTIseqDemux returns a % of doublets much closer to what I would expect, whilst HTODemux classifies a lot more cells as doublets (almost 3x as much as expected). However I am not sure how to judge which algorithm gives the best results without relying on an expected multiplet rate (from the 10x table). Here's the code I used:

seurat.object.norm <- NormalizeData(seurat.object)
seurat.object.norm <- NormalizeData(seurat.object.norm, assay="Protein", normalization.method="CLR")
seurat.object.norm <- HTODemux(seurat.object.norm, assay="Protein", positive.quantile=0.99)
seurat.object.norm <- MULTIseqDemux(seurat.object.norm, assay="Protein", autoThresh=TRUE)

Does the algorithm identify doublets AND multiplets? Related to this, does it identifies dublets/multiplets based solely on detection of 2 or more HTOs for a single cell barcode (inter-sample doublets)? I would like to also exclude intra-sample doublets (i.e. doublets that share a cell barcode and also one HTO only); could this be achieved by manually filtering cell barcodes having an abnormally high gene count or UMI count (similarly to what is described here), prior to running MULTIseqDemux? Are there algorithms incorporating this step to avoid relying on an arbitrary threshold?

Thank you for your help!

License for this repo?

Annoying question: I've reimplemented this code in python which I would like to make public but first want to make sure that I credit your work and choose a compatible license. This repo doesn't appear to an explicit license, and googling around shows maybe it's under CC0.

Is this code licensed? If you're looking for guidance on choosing a license, choosealicense.com is helpful.

Thanks!

Feature Request: preProcess barcodes as variable instead of hard coded

Hi Chris,

I'd like to use deMULTIplex for 12bp barcodes. Currently, the barcode length is hard coded in MULTIseq.preProcess for the 8bp multi-seq barcodes:

if (chemistry == "V2") {
     readTable <- cbind(as.data.frame(subseq(sread(r1),1,16)),
                       as.data.frame(subseq(sread(r1),17,26)),
                       as.data.frame(subseq(sread(r2),1,8))) #HERE
    colnames(readTable) <- c("Cell","UMI","Sample")
    gc()
    ind <- which(readTable$Cell %in% cellIDs)
    readTable <- readTable[ind, ]
    return(readTable)
  }

To make demultiplex compatible with other multiplexing methods as well as custom barcodes, It would be great if barcode length were coded as a variable so users can put in their own without modifying the source code.

Thanks!
Nathan

reclass.cells function throwing an error

Hey chris, when I try to run the
reclass.cells <- findReclassCells(barTable.full, neg.cells = names(final.calls)[which(final.calls=="Negative")]).
it throws an error
[1] "Normalizing barode data..." [1] "Pre-allocating data structures..." Error in .rowNamesDF<-(x, value = value) : duplicate 'row.names' are not allowed In addition: Warning message: non-unique values when setting 'row.names': .
what am i Missing, if pass a unique command to the neg.cell argument, then it works but then I find duplicated cell.ids where one instance is negative while other is attributed to a sample barcode.

MULTIseq.preprocess function

Hi!
I'm having a memory problem for the MULTIseq.preprocess function. My fastq files are around 4-5GB. When I run the function I run out of memory as it consumes more than 100GB. Is this normal?

Thanks a lot for your help with this!!

Best wishes,
Alina

	## Remove duplicated UMIs for reads with reference-matching tags
	umis <- umis[tag.ind]
	umi.ind <- which(duplicated(umis) == FALSE)

chris-mcginnis-ucsf / multi-seq Goto Github PK

multi-seq's People

Contributors

Stargazers

Watchers

Forkers

multi-seq's Issues

Recommend Projects

Recommend Topics

Recommend Org