Hi, For sci-Plex , the cell barcode position is no

Apply deMULTIplex2 for sci-Plex? about demultiplex2 HOT 3 CLOSED

YijiaChi commented on July 19, 2024

Apply deMULTIplex2 for sci-Plex?

from demultiplex2.

Comments (3)

dannyconrad commented on July 19, 2024

Hi Yijia, we're glad you like our tool! I personally don't have experience with sci-Plex and the readTags() function was not designed to handle such a different barcode structure, but it may be possible. My understanding is that the RT and hairpin barcode together make up the "cell barcode". Your "sample barcode", i.e. the hashing oligo used in the experiment, should be in R2 then? If so, try running readTags() like this:

readTable <- readTags(
  barcode.type = "MULTIseq", # this is to make sure it extracts both the cell barcode and UMI from R1 - a quirk of how I wrote it
  assay = NULL,
  filter.cells = NULL,
  fastq.A = "/path/to/R1.fastq.gz",
  fastq.B = "/path/to/R2.fastq.gz",
  pos.cbc = c(1,34),
  pos.sbc = c(start,end), # input the appropriate indices of your hash barcode in R2 here
  pos.umi = c(1,34)
)

If this succeeds (it will probably throw an error that barcode is not correct) and your readTable is populated with actual sequences (Cell and UMI should be identical full length reads), then I would follow this up with a combination of gsub() commands to extract the correct sequence for the cell barcodes and UMIs:

readTable$Cell <- gsub("CAGAGC.{8}", "", readTable$Cell) # this should remove the UMI and constant region CAGAGC and leave the concatenated hairpin and 10bp RT barcode
readTable$UMI <- gsub(".*?CAGAGC(.{8}).*", "\\1", readTable$UMI) # this should keep just the UMI

I assume your R1 reads will not be variable length (i.e. all 34 bases) even if the content you wish to extract within is variable (33-34 bases). This means there will be an extra base appended to the cell barcodes that used a 9-base hairpin but I don't think this should cause any problems as long as you're aware of it.

As an aside, since you'll likely want to pair your sample demultiplexing results with gene expression analysis of your sci-RNA-seq library, you'll probably need to confirm how the cell barcodes are generated in that analysis pipeline. For example, if the RT and hairpin sequences are concatenated in the opposite order or the 9-base hairpins have that extra base trimmed, you'll need to make adjustments so that once you have your deMULTIplex2 results, your sample assignments can be transferred over to your other analysis.

Let me know if that helps!

from demultiplex2.

YijiaChi commented on July 19, 2024

Thank you very much for your detailed explanation and prompt response!! Your patient guidance has been incredibly helpful, and I truly appreciate the time and effort you've taken to assist me!!!

Attached is the script customized for sci-Plex, building upon your function:

readTags <- function (fastq.A = NA, fastq.B = NA, 
                      pos.cbc,  pos.sbc, pos.umi,
                      filter.cells = NULL, ...) 
{
  t0 <- Sys.time()
  
  if (length(fastq.A) < 1 | length(fastq.B) < 1) {
    return(message("Error - one or more FASTQ files not found"))
  }
  else if (length(fastq.A) > 1 | length(fastq.B) > 1) {
    return(message("Error - too many files with match names found"))
  }
  
  cat("### Building Read Table ###", fill = T)
  
  cat("Loading first set of reads...", fill = T)
  r <- readFastq(fastq.A)
  gc(verbose = F)
  
  read_table <- data.frame(Cell = subseq(sread(r),pos.cbc[1], pos.cbc[2]), 
                           UMI = subseq(sread(r), pos.umi[1], pos.umi[2]))
  
  read_table$Cell <- gsub("CAGAGC.{8}", "_", read_table$Cell) # this should remove the UMI and constant region CAGAGC and leave the concatenated hairpin and 10bp RT barcode
  read_table$UMI <- gsub(".*?CAGAGC(.{8}).*", "\\1", read_table$UMI) # this should keep just the UMI
  
  hp_check <- substr(x = read_table$Cell,start = 10,stop=10)
  
  # 34bp ：9{6+8}10+T ;10{6+8}10
  # to be consistant with star_solo_output cell_id ：[9 or 10 bp] hp_rtbc
  # if hairpin=9bp ,remove the last nucleutide of cell bc
  
  tmp <- read_table$Cell[hp_check== "_"]
  read_table$Cell[hp_check== "_"]<- substr(x = tmp,
                                           start = 1,
                                           stop = nchar(tmp)-1)
  tmp<-NULL
  
  cat("Finished processing first set of reads; unloading from memory", 
      fill = T)
  r <- NULL
  gc(verbose = F)
  cat("Loading second set of reads...", fill = T)
  r <- readFastq(fastq.B)
  gc(verbose = F)
  
  read_table$Sample <- as.character(subseq(sread(r), pos.sbc[1],pos.sbc[2]))
  
  cat("Finished processing second set of reads; unloading from memory", 
      fill = T)
  r <- NULL
  gc(verbose = F)
  cat("Finished parsing ", nrow(read_table), " read pairs", 
      sep = "", fill = T)
  if (is.null(filter.cells)) {
    cat("Keeping all reads because filter.cells = NULL", 
        fill = T)
  }
  else if (!is.null(filter.cells)) {
    cat("Filtering for ", length(filter.cells), " provided cell barcodes...", 
        sep = "", fill = T)
    ind <- which(read_table$Cell %in% filter.cells)
    read_table <- read_table[ind, ]
  }
  read_table <- read_table[, c("Cell", "Sample", 
                               "UMI")]
  cat("Finished building read table", fill = T)
  cat("\n")
  cat("Finished in", round(difftime(Sys.time(), t0, units = "mins")[[1]], 
                           1), "minutes", sep = " ")
  cat("\n")
  return(read_table)
}


}

readTable <- readTags(
  #filter.cells = NULL,
  fastq.A = R1,
  fastq.B = R2,
  pos.cbc = c(1,34),
  pos.sbc = c(1,10), # input the appropriate indices of your hash barcode in R2 here
  pos.umi = c(1,34)
)

dim(all_read_table)
keep <- (nchar(all_read_table$Cell) %in%  c(23,24))
sum(keep)
all_read_table <- all_read_table[sum(keep) ,]
#dir.create(exp)
write.table(all_read_table,file = paste0(exp,"/",exp,"_01_read_seq_table.txt"),
            quote = F,sep = "\t",row.names = F)


## Step2 : Aign the readtable to the reference bc =====

tag_mtx <- alignTags(all_read_table, ref_sg_seq)

tag_mtx[1:4,1:4]  # cell_bc * sg

summary(rowSums(tag_mtx))
summary(colSums(tag_mtx))

# filter low quality BC
cell_ids <- Matrix::rowSums(tag_mtx) >= 5  # umi_detected_in_a_cell: filter cells
tag_used <- Matrix::colSums(tag_mtx) >= 20   # umi_detected_in_a_sg  {expected cell/num of sg: 100}

sum(cell_ids)  # {~ expected cell number}
sum(tag_used)

tag_mtx <- tag_mtx[cell_ids,tag_used]

tag_df<- tag_mtx%>%as.matrix()%>%as.data.frame()%>%
  mutate(CBC=rownames(.))%>%select(CBC,everything())

write.table(tag_df,file = paste0(exp,"/",exp,"_02_sgRNA_CBC_mtx.txt"),
            quote = F,sep = "\t",row.names = F)

# Step3 : Assign the sample identity to cell  [ho -> cbc] ====

res <- demultiplexTags(tag_mtx,
                       plot.path = paste0(exp),
                       plot.name = exp,
                       plot.diagnostics = T)
table(res$final_assign)

assign <- as.data.frame(res$assign_table)%>%
  mutate(CBC=rownames(.))

write.table(assign,file = paste0(exp,"/",exp,"_03_final_assign.txt"),
            quote = F,sep = "\t",row.names = F)

tagHist(tag_mtx = tag_mtx, 
        minUMI = 10, 
        plotnUMI = F)
ggsave(paste0(exp,"/","tagHist.pdf"))

pdf(paste0(exp,"/","tagCallHeatmap.pdf"))
tagCallHeatmap(tag_mtx = tag_mtx, 
               calls = res$final_assign,
               log=F)

dev.off()

# Error occur im the situation that 
        # one of the calltypes has only 1 cell
        # the demultiplexed hash oligo is less than the refenrence hash oligo


# calltypes <- unique(calls[duplicated(calls)])
# tmp <- tmp[, order]

from demultiplex2.

qinzhu commented on July 19, 2024

Thanks Yijia for sharing your code with us!
We have never benchmarked deMULTIplex2 on sci-Plex data and we do not know how well the model fits the data. So please check the diagnostic plots output by deMULTIplex2. Let us know if anything weird shows up (does the distribution plot look similar to that in the paper? does the model fitting fail for some of your samples?).

from demultiplex2.

Apply deMULTIplex2 for sci-Plex? about demultiplex2 HOT 3 CLOSED

Comments (3)

Related Issues (4)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent