chris-mcginnis-ucsf / multi-seq Goto Github PK
View Code? Open in Web Editor NEWR implementation of MULTI-seq sample classification workflow
R implementation of MULTI-seq sample classification workflow
The call to MULTIseq.preProcess()
function refers to argument chemistry
but in the latest version of the deMULTIplex
package, this argument doesn't exist. Can you elaborate on which version of the aforementioned package you're using?
readTable <- MULTIseq.preProcess(
R1 = "../fastq/Mix1-Barcode/R1_paired.fastq.gz",
R2 = "../fastq/Mix1-Barcode/R2_paired.fastq.gz",
cellIDs = mix1_keep,
chemistry = "V2"
)
This is a screenshot of the documentation from version 1.0.2.
Hi Chris,
I am using the function MULTIseqDemux() within Seurat (v.3.2.2) to demulplex the HTO object.
The command which threw the error:
hto_obj = MULTIseqDemux(hto_obj, assay = "HTO", autoThresh=TRUE)
ERROR:
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
No threshold found for HTO3...
Iteration 1
Using quantile 0.1
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
Iteration 2
Using quantile 0.1
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
No threshold found for HTO1...
No threshold found for HTO2...
No threshold found for HTO3...
Error in if (n_bc_calls[i] == 0) { :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
5: In min(x) : no non-missing arguments to min; returning Inf
6: In max(x) : no non-missing arguments to max; returning -Inf
I also attached you the hto_obj file here.
Hello Chris!
Thanks for developing and supporting Multi-seq!
I'm trying to load and analyze indrops3 data: https://singlecellcore.hms.harvard.edu/resources
I emulated 10X reads by merging 4 indrops3 reads into two, and loading them with:
readTable <- MULTIseq.preProcess(R1 = "data/06_indrop3_to_10x/multiseq10x_1.fq.gz",
R2 = "data/06_indrop3_to_10x/multiseq10x_2.fq.gz",
cellIDs = cell_ids$cell_barcode,
cell = c(1, 16),
umi = c(17, 22),
tag = c(1, 8))
It errors with:
#[1] "Reading in R1..."
#[1] "Reading in R2..."
#[1] "Assembling read table..."
#Error in .Call2("C_solve_user_SEW", refwidths, start, end, width, translate.negative.coord, :
# solving row 12080: 'allow.nonnarrowing' is FALSE and the supplied end (8) is > refwidth
Do you have any suggestions on how to proceed?
Sergey
Hi Chris,
first of all thank you for your package, I'm currently using MULTI-Seq for my master thesis and it has been an important tool for my analysis. Maybe is a dumb question but I started wondering how does deMULTIplex keep track of which barcode is which?
For example, I have 6 barcodes named "Bar2", "Bar3", "Bar5", "Bar6", "Bar7", "Bar9" and the sequences are read in this order, after MULTIseq.align
the column names are "Bar1", "Bar2", "Bar3", "Bar4", "Bar5", "Bar6". Will be "Bar1" correspond to "Bar2" in my annotation? Or is "Bar1" the first barcode it reads in readTable
, therefore the first found in the fastq file?
I couldn't completely understand it from the code and the tutorial.
Thanks a lot!
Alessia
Hi
I am working on analyzing multiplexing (cellplex and TotalSeqA) data. I used HTOdemux for demultiplexing samples and it worked fine for both cellplex and totalseqA data however when I used MULTISeqDemux function for demultiplexing cellplex data it is giving me error saying "No threshold found for CMO302"
MULTISeqDemux function worked on TotalSeqA data though. Can you please help me understand what's going on?
Thanks!
Hi Chris,
I'm trying to use your deMULTIplex tool to demultiplex my hashed single cell data but it seems to keep terminating at the MULTIseq.preProcess step. I realized that the issue is with the ShortRead function readFastq( ). I think my fastq may just be too big (it has ~13mil reads) as readFastq() runs fine when I split my fastq into 4 subfiles (of ~3mil reads each). Do you think this is the real issue and if yes, how do I go about creating one readTable, incorporating all of the information in the full fastq, without having to create 4 seprate readTables and merging them after.
Thanks,
Paulina
Hi Chris,
I am testing the pipeline with the HMEC example data, but I get the following error:
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 550, 705
See my code bellow:
cell.id<- read.table("cellIDs.txt")
cell.id.vec <- cell.id$x
## Pre-process MULTI-seq sample barcode FASTQs
readTable <- MULTIseq.preProcess(R1 = 'ACAGTG_S3_L001_R1_001.fastq.gz', R2 = 'ACAGTG_S3_L001_R2_001.fastq.gz', cellIDs = cell.id.vec, cell=c(1,16), umi=c(17,28), tag=c(1,8))
Am I missing something?
Thank you so much for sharing the LMOs,
Cheers,
Yolanda
When I run multiseq demux I sometimes get cells that have the label HTO1_HTO1 and are labeled as doublets. What part of the algorithm is doing that?
Thank you,
Ben
Hey Chris,
Just tried re-installing the package. Installation fails. If I'm interpreting the error correctly, it seems to not like the space in the arguments section of the manual files:
* installing *source* package ‘deMULTIplex’ ...
** using staged installation
** R
** byte-compile and prepare package for lazy loading
** help
Error : (converted from warning) bad markup (extra space?) at MULTIseq.preProcess.Rd:12:12
ERROR: installing Rd objects failed for package ‘deMULTIplex’
* removing ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library/deMULTIplex’
Error in i.p(...) :
(converted from warning) installation of package ‘/var/folders/y9/j0gdhnsd1mn1lj2gsfqr532h0000gp/T//Rtmploqg3m/fileab5fe3b277d/deMULTIplex_1.0.2.tar.gz’ had non-zero exit status
> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.4
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 rstudioapi_0.10 knitr_1.22 magrittr_1.5 usethis_1.5.0 devtools_2.0.2
[7] pkgload_1.0.2 R6_2.4.0 rlang_0.3.4 tools_3.6.0 pkgbuild_1.0.3 xfun_0.6
[13] sessioninfo_1.1.1 cli_1.1.0 withr_2.1.2 remotes_2.0.4 assertthat_0.2.1 digest_0.6.18
[19] rprojroot_1.3-2 crayon_1.3.4 processx_3.3.0 BiocManager_1.30.4 callr_3.2.0 fs_1.3.1
[25] ps_1.3.0 curl_3.3 glue_1.3.1 memoise_1.1.0 compiler_3.6.0 desc_1.2.0
[31] backports_1.1.4 prettyunits_1.0.2
Hi @chris-mcginnis-ucsf ,
Thanks for developing this great package.
I am trying to understand the preprocessing step.
In MULTIseq.Align.Suite.R
code, there is step called Remove reads representing duplicated UMIs on a cell-by-cell basis
MULTI-seq/R/MULTIseq.Align.Suite.R
Lines 50 to 52 in 7341133
I did not get through with it.
1.This duplicated UMIs is detected in same Cell?
2. Are Multiple UMIs(duplicated) used for calculating barcode counts matrix?
Or maybe I did not understand the code.
Thanks!
Hello Chris,
We just used the multi-seq 10x for sc-RNAseq in 32 samples, I received back 16 triplet files (R1, R2 & I) from the UCSF core and I have a list of barcodes 1 through 32. Is there a guide starting out from this point? I apologize if this is not the right channel to ask for help, I am starting out with 10x analysis and multi-seq
many thanks for this work
Aditya
Hello Chris,
This might not be the right place to ask this question so I apologize in advance for disturbing you.
I watched your seminar uploaded in Merck website discussing about Multi-seq technique and I have also read your nice paper in Nature. I was wondering if the products sold by Merck (including barcoded oligos) are compatible with 10x 5' system. It is not clear for me which should be the design of the barcoded oligo to achieve cDNA feature barcode amplification.
Again, my apologies for asking this here,
Guillem
Hi,
I am using your great package through the MULTIseqDemux function in Seurat. Depending on the settings a varying part of each cluster is assigned to "negative" (see HTO-tSNE below):
Visually judged, the negatives could also be assigned to their neighboring singlet cluster.
I was wondering what your thoughts on including real negative cells (=empty drops) when running MULTIseqDemux() are? Would it make the classification of negatives more precise? Empty drops could be retrieved from the cellranger raw matrix (raw_feature_bc_matrix) as opposed to the filtered matrix.
Thanks!
Best wishes
Tilo
Hi,
Thank you very much for this method for cell hashing demux.
I am doing tcr sequencing with cell hashing . and it gives separate fastqs for HTOs and TCR (I have used cellranger and Illumina). I am writing down the method I have used to demux it using MULTI-seq, please let me know I am in the right track.
Now the queries are,
Any feedback in this regards will be appreciated.
Thanks,
Claire
I'm using MULTI-seq on hashing data. I'm getting the following error when I tried to use MULTIseq.preProcess function. Is there an issue?
Error in .Call2("solve_user_SEW", refwidths, start, end, width, translate.negative.coord, :
solving row 1: 'allow.nonnarrowing' is FALSE and the supplied end (28) is > refwidth
Hello @chris-mcginnis-ucsf, can you please provide more details about how to export the sample barcode UMI count matrix to be used as the input for Seurat or CellRanger?
Following this link:
Download example dataset for trying deMULTIplex here: https://ucsf.box.com/s/ibo5t6y0a0a3dde68o5xnedjmajsujxk
I see this error:
This shared file or folder link has been removed or is unavailable to you.
Hey Chris,
I'm testing out deMULTIplex on the PBMC data, following the tutorial. I ran the following:
library(deMULTIplex)
library(ggplot2)
bar.ref <- read.csv("~/resources/deMULTIplex_ExampleData/LMOlist.csv",header=F)
cell.id.vec <- read.table("~/resources/deMULTIplex_ExampleData/cellIDs.txt")
barcs <- bar.ref$V1
cell.ids <- cell.id.vec$x
readTable <- MULTIseq.preProcess(R1 = '~/resources/deMULTIplex_ExampleData/ACAGTG_S3_L001_R1_001.fastq.gz', R2 = '~/resources/deMULTIplex_ExampleData/ACAGTG_S3_L001_R2_001.fastq.gz', cellIDs = cell.ids, cell=c(1,16), umi=c(17,28), tag=c(1,8))
str(readTable)
## Perform MULTI-seq sample barcode alignment
bar.table <- MULTIseq.align(readTable, cell.ids, barcs)
## Visualize barcode space
bar.tsne <- barTSNE(bar.table[,1:96])
## Note: Exclude columns 97:98 (assuming 96 barcodes were used) which provide total barcode UMI counts for each cell.
for (i in 3:ncol(bar.tsne)) {
g <- ggplot(bar.tsne, aes(x = TSNE1, y = TSNE2, color = bar.tsne[,i])) +
geom_point() +
scale_color_gradient(low = "black", high = "red") +
ggtitle(colnames(bar.tsne)[i]) +
theme(legend.position = "none")
print(g)
}
And checked the TSNE plots. I'm getting plots like these:
Is this the expected output for this dataset? A lot of the plots look like the "missing" example and I expected this dataset to not have missing barcodes based on your note.
Thanks!
Lauren
In the documentation I could not find how you obtain the cell IDs for the cell id vector? There is a file within the CellRanger package called 3M-february-2018.txt.gz that contains all of the cell barcode sequences but I'm assuming this is not the correct as it is several million cell barcodes long and the example provided in the documentation is a little over 13 thousand.
I am trying analyse a datasets generated similarly as MULTI-seq, but with fewer sample barcodes.
What's different with MULTI-seq, we are focusing on 'multiplets' specifically,
e.g. which cells have valida barcode1 and barcode2, which cells have valida barcode3 and barcode4 and barcode5
I am not an expert of statistics, and I wonder if we can:
Which way do you think is mathmatically reasonable? And better idea is welcome.
Hi @chris-mcginnis-ucsf
I came to an warning that 'cannot find threshold' for certain barcode.
Any suggestion what is reason behind it and how to fix it?
Actually, my dataset is a bit different with multi-seq sample data, I have only five barcodes, and their expression pattern(expression abundance) not like MULTI-seq sample barcode. Ours is much similar to a gene expression.
Any suggestion how to determine valid barcode for this?
Hi:
MULTI-seq is interestig.
I'm trying to process data from PRJNA822077 using MULTI-seq. However, I am confused about the first step mentioned by the article.
Raw sequencing reads from the gene expression libraries were processed using “CellRanger” v3.0.2. The GRCh38 build of the mouse genome was used. Except for explicitly setting --expect-cells = 25,000, default parameters were used for all samples.
Here is my cellrange command line:
cellranger count --id=SRRxxxx --transcriptome=/pathto/mm10/cellranger-index/mm10/ --fastqs=/pathto/fastq/ --localmem=200 --sample=SRRxxxx
However,one of my results shows :2 Median Genes per Cell.
So may I learn which part I go wrong? thanks!
I am using deMULTIplex on a dataset with six barcodes, of which one (no. 3) happens to be notably more prevalent than the other barcodes. deMULTIplex gives the following note when running classifyCells: "No threshold found for Bar3..." - and as a result, all of those cells are classified as negative. When looking at barcode counts, however, it is immediately obvious that they belong to barcode 3, and the expression of this barcode is 3-4 magnitudes higher than any other barcode. From looking at cells classified in the remaining barcodes (1,2, 4-6), there doesn't appear to be much background noise from barcode 3 either. Is there something I can do to improve classification?
Hi,
MULTI-seq looks very cool!
Could you please supply the example input data files that is included in the manual?
Currently it is very difficult for a non-R programmer to use the tool.
It is mensioned that MULTI-seq is capable of taking UMI matrix as input. Could you please provide a code and data example for this feature?
I am trying to use findReclassCells function, but getting the following error, even though the row names are unique.
reclass.cells <- findReclassCells(bar.table, names(final.calls)[which(final.calls=="Negative")])
[1] "Normalizing barode data..."
[1] "Pre-allocating data structures..."
Error in.rowNamesDF<-
(x, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘AAACCTGCAGCTCCGA-1’, ‘AAACGGGAGTGTACGG-1’, ‘AAACGGGCAATCAGAA-1’, ‘AAACGGGTCGACAGCC-1’, ‘AAAGCAACAAAGCAAT-1’, ‘AAATGCCGTTCAGTAC-1’, ‘AACACGTGTACTCTCC-1’, ‘AACACGTTCCGCATCT-1’, ‘AACCATGTCAGTGCAT-1’, ‘AACCATGTCCGTAGGC-1’
Hi:
this is an amazing package but each time when I run classifyCells, it gives me tons of print out, like the example I give below. Is there anyway we can hide those printout? Thanks!
Hi @chris-mcginnis-ucsf , thank you for providing this package. I am trying to learn it. And I was following the tutorials, and realized that I don't have BClist.Robj and cell.id.vec.Robj files. Wonder if these files are provided anywhere, or how should I generate them?
Thanks
## Define vectors for reference barcode sequences and cell IDs
bar.ref <- load("/path/to/BClist.Robj")
cell.id.vec <- load("/path/to/cell.id.vec.Robj")
Hello,
I am analysing a 10x scRNA-seq dataset where 6 cell populations were multiplexed by antibody-based cell hashing (with Totalseq-B antibodies). I have a few questions about demultiplexing:
seurat.object.norm <- NormalizeData(seurat.object)
seurat.object.norm <- NormalizeData(seurat.object.norm, assay="Protein", normalization.method="CLR")
seurat.object.norm <- HTODemux(seurat.object.norm, assay="Protein", positive.quantile=0.99)
seurat.object.norm <- MULTIseqDemux(seurat.object.norm, assay="Protein", autoThresh=TRUE)
Thank you for your help!
Annoying question: I've reimplemented this code in python which I would like to make public but first want to make sure that I credit your work and choose a compatible license. This repo doesn't appear to an explicit license, and googling around shows maybe it's under CC0.
Is this code licensed? If you're looking for guidance on choosing a license, choosealicense.com is helpful.
Thanks!
Hi Chris,
I'd like to use deMULTIplex for 12bp barcodes. Currently, the barcode length is hard coded in MULTIseq.preProcess for the 8bp multi-seq barcodes:
if (chemistry == "V2") {
readTable <- cbind(as.data.frame(subseq(sread(r1),1,16)),
as.data.frame(subseq(sread(r1),17,26)),
as.data.frame(subseq(sread(r2),1,8))) #HERE
colnames(readTable) <- c("Cell","UMI","Sample")
gc()
ind <- which(readTable$Cell %in% cellIDs)
readTable <- readTable[ind, ]
return(readTable)
}
To make demultiplex compatible with other multiplexing methods as well as custom barcodes, It would be great if barcode length were coded as a variable so users can put in their own without modifying the source code.
Thanks!
Nathan
Hey chris, when I try to run the
reclass.cells <- findReclassCells(barTable.full, neg.cells = names(final.calls)[which(final.calls=="Negative")])
.
it throws an error
[1] "Normalizing barode data..." [1] "Pre-allocating data structures..." Error in
.rowNamesDF<-(x, value = value) : duplicate 'row.names' are not allowed In addition: Warning message: non-unique values when setting 'row.names':
.
what am i Missing, if pass a unique command to the neg.cell argument, then it works but then I find duplicated cell.ids where one instance is negative while other is attributed to a sample barcode.
Hi!
I'm having a memory problem for the MULTIseq.preprocess function. My fastq files are around 4-5GB. When I run the function I run out of memory as it consumes more than 100GB. Is this normal?
Thanks a lot for your help with this!!
Best wishes,
Alina
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.