Code Monkey home page Code Monkey logo

Comments (3)

dannyconrad avatar dannyconrad commented on July 19, 2024

Hi Yijia, we're glad you like our tool! I personally don't have experience with sci-Plex and the readTags() function was not designed to handle such a different barcode structure, but it may be possible. My understanding is that the RT and hairpin barcode together make up the "cell barcode". Your "sample barcode", i.e. the hashing oligo used in the experiment, should be in R2 then? If so, try running readTags() like this:

readTable <- readTags(
  barcode.type = "MULTIseq", # this is to make sure it extracts both the cell barcode and UMI from R1 - a quirk of how I wrote it
  assay = NULL,
  filter.cells = NULL,
  fastq.A = "/path/to/R1.fastq.gz",
  fastq.B = "/path/to/R2.fastq.gz",
  pos.cbc = c(1,34),
  pos.sbc = c(start,end), # input the appropriate indices of your hash barcode in R2 here
  pos.umi = c(1,34)
)

If this succeeds (it will probably throw an error that barcode is not correct) and your readTable is populated with actual sequences (Cell and UMI should be identical full length reads), then I would follow this up with a combination of gsub() commands to extract the correct sequence for the cell barcodes and UMIs:

readTable$Cell <- gsub("CAGAGC.{8}", "", readTable$Cell) # this should remove the UMI and constant region CAGAGC and leave the concatenated hairpin and 10bp RT barcode
readTable$UMI <- gsub(".*?CAGAGC(.{8}).*", "\\1", readTable$UMI) # this should keep just the UMI

I assume your R1 reads will not be variable length (i.e. all 34 bases) even if the content you wish to extract within is variable (33-34 bases). This means there will be an extra base appended to the cell barcodes that used a 9-base hairpin but I don't think this should cause any problems as long as you're aware of it.

As an aside, since you'll likely want to pair your sample demultiplexing results with gene expression analysis of your sci-RNA-seq library, you'll probably need to confirm how the cell barcodes are generated in that analysis pipeline. For example, if the RT and hairpin sequences are concatenated in the opposite order or the 9-base hairpins have that extra base trimmed, you'll need to make adjustments so that once you have your deMULTIplex2 results, your sample assignments can be transferred over to your other analysis.

Let me know if that helps!

from demultiplex2.

YijiaChi avatar YijiaChi commented on July 19, 2024

Thank you very much for your detailed explanation and prompt response!! Your patient guidance has been incredibly helpful, and I truly appreciate the time and effort you've taken to assist me!!!

Attached is the script customized for sci-Plex, building upon your function:

readTags <- function (fastq.A = NA, fastq.B = NA, 
                      pos.cbc,  pos.sbc, pos.umi,
                      filter.cells = NULL, ...) 
{
  t0 <- Sys.time()
  
  if (length(fastq.A) < 1 | length(fastq.B) < 1) {
    return(message("Error - one or more FASTQ files not found"))
  }
  else if (length(fastq.A) > 1 | length(fastq.B) > 1) {
    return(message("Error - too many files with match names found"))
  }
  
  cat("### Building Read Table ###", fill = T)
  
  cat("Loading first set of reads...", fill = T)
  r <- readFastq(fastq.A)
  gc(verbose = F)
  
  read_table <- data.frame(Cell = subseq(sread(r),pos.cbc[1], pos.cbc[2]), 
                           UMI = subseq(sread(r), pos.umi[1], pos.umi[2]))
  
  read_table$Cell <- gsub("CAGAGC.{8}", "_", read_table$Cell) # this should remove the UMI and constant region CAGAGC and leave the concatenated hairpin and 10bp RT barcode
  read_table$UMI <- gsub(".*?CAGAGC(.{8}).*", "\\1", read_table$UMI) # this should keep just the UMI
  
  hp_check <- substr(x = read_table$Cell,start = 10,stop=10)
  
  # 34bp :9{6+8}10+T ;10{6+8}10
  # to be consistant with star_solo_output cell_id :[9 or 10 bp] hp_rtbc
  # if hairpin=9bp ,remove the last nucleutide of cell bc
  
  tmp <- read_table$Cell[hp_check== "_"]
  read_table$Cell[hp_check== "_"]<- substr(x = tmp,
                                           start = 1,
                                           stop = nchar(tmp)-1)
  tmp<-NULL
  
  cat("Finished processing first set of reads; unloading from memory", 
      fill = T)
  r <- NULL
  gc(verbose = F)
  cat("Loading second set of reads...", fill = T)
  r <- readFastq(fastq.B)
  gc(verbose = F)
  
  read_table$Sample <- as.character(subseq(sread(r), pos.sbc[1],pos.sbc[2]))
  
  cat("Finished processing second set of reads; unloading from memory", 
      fill = T)
  r <- NULL
  gc(verbose = F)
  cat("Finished parsing ", nrow(read_table), " read pairs", 
      sep = "", fill = T)
  if (is.null(filter.cells)) {
    cat("Keeping all reads because filter.cells = NULL", 
        fill = T)
  }
  else if (!is.null(filter.cells)) {
    cat("Filtering for ", length(filter.cells), " provided cell barcodes...", 
        sep = "", fill = T)
    ind <- which(read_table$Cell %in% filter.cells)
    read_table <- read_table[ind, ]
  }
  read_table <- read_table[, c("Cell", "Sample", 
                               "UMI")]
  cat("Finished building read table", fill = T)
  cat("\n")
  cat("Finished in", round(difftime(Sys.time(), t0, units = "mins")[[1]], 
                           1), "minutes", sep = " ")
  cat("\n")
  return(read_table)
}


}
readTable <- readTags(
  #filter.cells = NULL,
  fastq.A = R1,
  fastq.B = R2,
  pos.cbc = c(1,34),
  pos.sbc = c(1,10), # input the appropriate indices of your hash barcode in R2 here
  pos.umi = c(1,34)
)

dim(all_read_table)
keep <- (nchar(all_read_table$Cell) %in%  c(23,24))
sum(keep)
all_read_table <- all_read_table[sum(keep) ,]
#dir.create(exp)
write.table(all_read_table,file = paste0(exp,"/",exp,"_01_read_seq_table.txt"),
            quote = F,sep = "\t",row.names = F)


## Step2 : Aign the readtable to the reference bc =====

tag_mtx <- alignTags(all_read_table, ref_sg_seq)

tag_mtx[1:4,1:4]  # cell_bc * sg

summary(rowSums(tag_mtx))
summary(colSums(tag_mtx))

# filter low quality BC
cell_ids <- Matrix::rowSums(tag_mtx) >= 5  # umi_detected_in_a_cell: filter cells
tag_used <- Matrix::colSums(tag_mtx) >= 20   # umi_detected_in_a_sg  {expected cell/num of sg: 100}

sum(cell_ids)  # {~ expected cell number}
sum(tag_used)

tag_mtx <- tag_mtx[cell_ids,tag_used]

tag_df<- tag_mtx%>%as.matrix()%>%as.data.frame()%>%
  mutate(CBC=rownames(.))%>%select(CBC,everything())

write.table(tag_df,file = paste0(exp,"/",exp,"_02_sgRNA_CBC_mtx.txt"),
            quote = F,sep = "\t",row.names = F)

tag_df

# Step3 : Assign the sample identity to cell  [ho -> cbc] ====

res <- demultiplexTags(tag_mtx,
                       plot.path = paste0(exp),
                       plot.name = exp,
                       plot.diagnostics = T)
table(res$final_assign)

assign <- as.data.frame(res$assign_table)%>%
  mutate(CBC=rownames(.))

write.table(assign,file = paste0(exp,"/",exp,"_03_final_assign.txt"),
            quote = F,sep = "\t",row.names = F)

tagHist(tag_mtx = tag_mtx, 
        minUMI = 10, 
        plotnUMI = F)
ggsave(paste0(exp,"/","tagHist.pdf"))

pdf(paste0(exp,"/","tagCallHeatmap.pdf"))
tagCallHeatmap(tag_mtx = tag_mtx, 
               calls = res$final_assign,
               log=F)

dev.off()

# Error occur im the situation that 
        # one of the calltypes has only 1 cell
        # the demultiplexed hash oligo is less than the refenrence hash oligo


# calltypes <- unique(calls[duplicated(calls)])
# tmp <- tmp[, order]

 corrected  function TagHist() ( with bad my exp)

tagCallHeatmap

from demultiplex2.

qinzhu avatar qinzhu commented on July 19, 2024

Thanks Yijia for sharing your code with us!
We have never benchmarked deMULTIplex2 on sci-Plex data and we do not know how well the model fits the data. So please check the diagnostic plots output by deMULTIplex2. Let us know if anything weird shows up (does the distribution plot look similar to that in the paper? does the model fitting fail for some of your samples?).

from demultiplex2.

Related Issues (4)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.