Comments (3)
Hi Yijia, we're glad you like our tool! I personally don't have experience with sci-Plex and the readTags() function was not designed to handle such a different barcode structure, but it may be possible. My understanding is that the RT and hairpin barcode together make up the "cell barcode". Your "sample barcode", i.e. the hashing oligo used in the experiment, should be in R2 then? If so, try running readTags() like this:
readTable <- readTags(
barcode.type = "MULTIseq", # this is to make sure it extracts both the cell barcode and UMI from R1 - a quirk of how I wrote it
assay = NULL,
filter.cells = NULL,
fastq.A = "/path/to/R1.fastq.gz",
fastq.B = "/path/to/R2.fastq.gz",
pos.cbc = c(1,34),
pos.sbc = c(start,end), # input the appropriate indices of your hash barcode in R2 here
pos.umi = c(1,34)
)
If this succeeds (it will probably throw an error that barcode is not correct) and your readTable is populated with actual sequences (Cell and UMI should be identical full length reads), then I would follow this up with a combination of gsub()
commands to extract the correct sequence for the cell barcodes and UMIs:
readTable$Cell <- gsub("CAGAGC.{8}", "", readTable$Cell) # this should remove the UMI and constant region CAGAGC and leave the concatenated hairpin and 10bp RT barcode
readTable$UMI <- gsub(".*?CAGAGC(.{8}).*", "\\1", readTable$UMI) # this should keep just the UMI
I assume your R1 reads will not be variable length (i.e. all 34 bases) even if the content you wish to extract within is variable (33-34 bases). This means there will be an extra base appended to the cell barcodes that used a 9-base hairpin but I don't think this should cause any problems as long as you're aware of it.
As an aside, since you'll likely want to pair your sample demultiplexing results with gene expression analysis of your sci-RNA-seq library, you'll probably need to confirm how the cell barcodes are generated in that analysis pipeline. For example, if the RT and hairpin sequences are concatenated in the opposite order or the 9-base hairpins have that extra base trimmed, you'll need to make adjustments so that once you have your deMULTIplex2 results, your sample assignments can be transferred over to your other analysis.
Let me know if that helps!
from demultiplex2.
Thank you very much for your detailed explanation and prompt response!! Your patient guidance has been incredibly helpful, and I truly appreciate the time and effort you've taken to assist me!!!
Attached is the script customized for sci-Plex, building upon your function:
readTags <- function (fastq.A = NA, fastq.B = NA,
pos.cbc, pos.sbc, pos.umi,
filter.cells = NULL, ...)
{
t0 <- Sys.time()
if (length(fastq.A) < 1 | length(fastq.B) < 1) {
return(message("Error - one or more FASTQ files not found"))
}
else if (length(fastq.A) > 1 | length(fastq.B) > 1) {
return(message("Error - too many files with match names found"))
}
cat("### Building Read Table ###", fill = T)
cat("Loading first set of reads...", fill = T)
r <- readFastq(fastq.A)
gc(verbose = F)
read_table <- data.frame(Cell = subseq(sread(r),pos.cbc[1], pos.cbc[2]),
UMI = subseq(sread(r), pos.umi[1], pos.umi[2]))
read_table$Cell <- gsub("CAGAGC.{8}", "_", read_table$Cell) # this should remove the UMI and constant region CAGAGC and leave the concatenated hairpin and 10bp RT barcode
read_table$UMI <- gsub(".*?CAGAGC(.{8}).*", "\\1", read_table$UMI) # this should keep just the UMI
hp_check <- substr(x = read_table$Cell,start = 10,stop=10)
# 34bp :9{6+8}10+T ;10{6+8}10
# to be consistant with star_solo_output cell_id :[9 or 10 bp] hp_rtbc
# if hairpin=9bp ,remove the last nucleutide of cell bc
tmp <- read_table$Cell[hp_check== "_"]
read_table$Cell[hp_check== "_"]<- substr(x = tmp,
start = 1,
stop = nchar(tmp)-1)
tmp<-NULL
cat("Finished processing first set of reads; unloading from memory",
fill = T)
r <- NULL
gc(verbose = F)
cat("Loading second set of reads...", fill = T)
r <- readFastq(fastq.B)
gc(verbose = F)
read_table$Sample <- as.character(subseq(sread(r), pos.sbc[1],pos.sbc[2]))
cat("Finished processing second set of reads; unloading from memory",
fill = T)
r <- NULL
gc(verbose = F)
cat("Finished parsing ", nrow(read_table), " read pairs",
sep = "", fill = T)
if (is.null(filter.cells)) {
cat("Keeping all reads because filter.cells = NULL",
fill = T)
}
else if (!is.null(filter.cells)) {
cat("Filtering for ", length(filter.cells), " provided cell barcodes...",
sep = "", fill = T)
ind <- which(read_table$Cell %in% filter.cells)
read_table <- read_table[ind, ]
}
read_table <- read_table[, c("Cell", "Sample",
"UMI")]
cat("Finished building read table", fill = T)
cat("\n")
cat("Finished in", round(difftime(Sys.time(), t0, units = "mins")[[1]],
1), "minutes", sep = " ")
cat("\n")
return(read_table)
}
}
readTable <- readTags(
#filter.cells = NULL,
fastq.A = R1,
fastq.B = R2,
pos.cbc = c(1,34),
pos.sbc = c(1,10), # input the appropriate indices of your hash barcode in R2 here
pos.umi = c(1,34)
)
dim(all_read_table)
keep <- (nchar(all_read_table$Cell) %in% c(23,24))
sum(keep)
all_read_table <- all_read_table[sum(keep) ,]
#dir.create(exp)
write.table(all_read_table,file = paste0(exp,"/",exp,"_01_read_seq_table.txt"),
quote = F,sep = "\t",row.names = F)
## Step2 : Aign the readtable to the reference bc =====
tag_mtx <- alignTags(all_read_table, ref_sg_seq)
tag_mtx[1:4,1:4] # cell_bc * sg
summary(rowSums(tag_mtx))
summary(colSums(tag_mtx))
# filter low quality BC
cell_ids <- Matrix::rowSums(tag_mtx) >= 5 # umi_detected_in_a_cell: filter cells
tag_used <- Matrix::colSums(tag_mtx) >= 20 # umi_detected_in_a_sg {expected cell/num of sg: 100}
sum(cell_ids) # {~ expected cell number}
sum(tag_used)
tag_mtx <- tag_mtx[cell_ids,tag_used]
tag_df<- tag_mtx%>%as.matrix()%>%as.data.frame()%>%
mutate(CBC=rownames(.))%>%select(CBC,everything())
write.table(tag_df,file = paste0(exp,"/",exp,"_02_sgRNA_CBC_mtx.txt"),
quote = F,sep = "\t",row.names = F)
# Step3 : Assign the sample identity to cell [ho -> cbc] ====
res <- demultiplexTags(tag_mtx,
plot.path = paste0(exp),
plot.name = exp,
plot.diagnostics = T)
table(res$final_assign)
assign <- as.data.frame(res$assign_table)%>%
mutate(CBC=rownames(.))
write.table(assign,file = paste0(exp,"/",exp,"_03_final_assign.txt"),
quote = F,sep = "\t",row.names = F)
tagHist(tag_mtx = tag_mtx,
minUMI = 10,
plotnUMI = F)
ggsave(paste0(exp,"/","tagHist.pdf"))
pdf(paste0(exp,"/","tagCallHeatmap.pdf"))
tagCallHeatmap(tag_mtx = tag_mtx,
calls = res$final_assign,
log=F)
dev.off()
# Error occur im the situation that
# one of the calltypes has only 1 cell
# the demultiplexed hash oligo is less than the refenrence hash oligo
# calltypes <- unique(calls[duplicated(calls)])
# tmp <- tmp[, order]
from demultiplex2.
Thanks Yijia for sharing your code with us!
We have never benchmarked deMULTIplex2 on sci-Plex data and we do not know how well the model fits the data. So please check the diagnostic plots output by deMULTIplex2. Let us know if anything weird shows up (does the distribution plot look similar to that in the paper? does the model fitting fail for some of your samples?).
from demultiplex2.
Related Issues (4)
- Error installing HOT 2
- Demo dataset HOT 1
- Fitting failed HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from demultiplex2.