jgarces02 / flowct

FlowCT package

Batchfile 0.35% Shell 0.19% R 99.46%

flowct's Issues

Add metadata for doing clustering

Add metadata (clinical, FISH, treatment, % of cells...) as input for clustering; this is expected to produce better clusters.

package requirement

processx is a dependency and should be added to the package requirements. Thanks!

Batch correction new alternatives

Scanorama (py)
BBKNN (py)

issues in downloading package

On my Windows 10 PC I had to run the following line just before downloading:
options(download.file.method = "libcurl")
Not sure whether this is always required.
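
A minimal sketch of applying that workaround only where it is likely to matter. The helper name choose_download_method is hypothetical; .Platform$OS.type, capabilities("libcurl") and the download.file.method option are standard base R.

```r
# Hypothetical helper: decide which download method to request.
# Returns "libcurl" on Windows when this R build supports it,
# otherwise NULL (meaning: leave R's default untouched).
choose_download_method <- function(os_type = .Platform$OS.type,
                                   has_libcurl = capabilities("libcurl")) {
  if (identical(os_type, "windows") && isTRUE(unname(has_libcurl))) {
    "libcurl"
  } else {
    NULL
  }
}

method <- choose_download_method()
if (!is.null(method)) options(download.file.method = method)
```

On other systems the option is left alone, so the snippet can sit at the top of an install script without side effects.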

Speedup "multidensity"

multidensity <- function(fcs.SCE, assay.i, show.markers = "all", color.by = NULL, subsampling = NULL, interactive = F, ridgeline.lim = 15, colors = NULL){
  if(identical(show.markers, "all")) show.markers <- rownames(fcs.SCE) # identical() also works when a vector of markers is given
  if(!is.null(subsampling)) suppressMessages(fcs.SCE <- sub.samples(fcs.SCE, subsampling = subsampling))

  data <- t(assay(fcs.SCE, i = assay.i))
  data2 <- cbind(data, colData(fcs.SCE))

  # prepare tables: for plotting and with median values for each marker
  median_df <- data.frame(antigen = show.markers, median = apply(data[,show.markers], 2, median))
  ggdf <- as.data.frame(melt(as.data.table(data2), measure.vars = show.markers, value.name = "expression", variable.name = "antigen"))

  if(is.null(colors)) colors <- div.colors(length(unique(ggdf[,color.by]))) # one color per distinct level

  if(length(unique(fcs.SCE$filename)) > ridgeline.lim){
    g <- ggplot(data = ggdf[grepl(paste0(show.markers, collapse = "|"), ggdf$antigen),],
                aes_string(x = "expression", color = color.by, group = "filename")) +
      # geom_density(size = 0.5) +
      stat_density(geom = "line", position = "identity", size = 0.5) +
      facet_wrap(~ antigen, scales = "free") +
      geom_vline(data = median_df, aes(xintercept = median), linetype = 2, color = "gray55") +
      scale_color_manual(name = color.by, values = colors) +
      theme_minimal() + theme(axis.text.x = element_text(angle = 90, hjust = 1),
                              strip.text = element_text(size = 7), axis.text = element_text(size = 5))

    if(interactive) ggplotly(g) else print(g)
  }else{
    suppressMessages(print(ggplot(ggdf, aes_string(x = "expression", y = "filename")) +
                             geom_density_ridges(alpha = 0.7) +
                             facet_wrap(~ antigen, scales = "free") +
                             geom_vline(data = median_df, aes(xintercept = median), linetype = 2, color = "gray55") +
                             theme_minimal() + theme(axis.text.x = element_text(angle = 90, hjust = 1),
                                                     strip.text = element_text(size = 7), axis.text = element_text(size = 7))))
  }
}

colors_palette doesn't appear

colors_palette does not seem to be available from the internal data (sysdata.rda).

issue with fsom.clustering function: data not scaled and results not reproducible: code proposal

Hi, there are some issues in this function.
Data should be scaled to ensure correct clustering. I added set.seed to make the results reproducible and removed the first if block because it caused errors. With these changes everything seems to work.

fsom.clustering <- function(fcs.SE, assay.i = "normalized", scale.data= TRUE, markers.to.use = rownames(fcs.SE), markers.to.plot = NULL, k.metaclustering = 40, metaclustering.name = NULL){
  require(FlowSOM)
  set.seed(1234)
  data <- as.flowSet.SE(fcs.SE, assay.i)
  
  ## FSOM clustering
  cat("Calculating SOM clustering...\n")
  fsom <- suppressMessages(ReadInput(data, transform = FALSE, scale = scale.data)) #read data
  fsom <- suppressMessages(BuildSOM(fsom, colsToUse = markers.to.use)) #build SOM
  
  cat("Building MST...\n")
  fsom <- suppressMessages(BuildMST(fsom, tSNE = TRUE, silent = T)) #build MST for visualization of clustering
  
  #plot the MST to evaluate the marker fluorescence (or general tree) intensity for each SOM
  if(!is.null(markers.to.plot)){
    if(identical(markers.to.plot, "tree")){ # identical() avoids a length > 1 condition when several markers are given
      PlotStars(fsom)
    }else{
      for(marker in markers.to.plot) PlotMarker(fsom, marker)
    }
  }
  
  ## Metaclustering
  if(!is.null(k.metaclustering)){
    cat("Metaclustering...\n")
    if(!is.null(metaclustering.name)){
      mc <- suppressMessages(ConsensusClusterPlus::ConsensusClusterPlus(t(fsom$map$codes), maxK = k.metaclustering, reps = 100,
                                                                        pItem = 0.9, pFeature = 1, title = metaclustering.name, plot = "pdf",
                                                                        clusterAlg = "hc", innerLinkage = "average", finalLinkage = "average",
                                                                        distance = "euclidean", seed = 333, verbose = F))
    }else{
      mc <- suppressMessages(ConsensusClusterPlus::ConsensusClusterPlus(t(fsom$map$codes), maxK = k.metaclustering, reps = 100,
                                                                        pItem = 0.9, pFeature = 1, title = "consensus_plots", plot = "pdf",
                                                                        clusterAlg = "hc", innerLinkage = "average", finalLinkage = "average",
                                                                        distance = "euclidean", seed = 333, verbose = F))
      unlink("consensus_plots", recursive = TRUE)
    }
    
    #get cluster ids for each cell
    code_clustering1 <- mc[[k.metaclustering]]$consensusClass %>% as.factor()
    cell_clustering1 <- code_clustering1[fsom$map$mapping[,1]]
    
    #add clustering to original MST and color by cluster colors
    PlotStars(fsom, backgroundValues = code_clustering1, backgroundColor = alpha(div.colors(length(code_clustering1)), alpha = 0.7))
    
    return(list(fsom = fsom, metaclusters = cell_clustering1, plotStars_value = code_clustering1))
  }else{
    return(fsom)
  }
}

DE analysis between cell clusters

markers.names malfunction

It is impossible to check whether colnames are correctly assigned, because new_names = NULL does not show the new names.
There is also an error when assigning new names:

Error in `colnames<-`(`*tmp*`, value = new_names) : 
  attempt to set 'colnames' on an object with less than two dimensions
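
This error is the one base R raises when colnames<- is applied to a plain vector. A minimal sketch of one common way to get there (a single-column matrix subset dropped to a vector); whether this is what happens inside markers.names is an assumption, and the toy matrix below is made up for illustration.

```r
# Selecting a single column silently drops the matrix to a vector...
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("FITC-A", "PE-A")))
one_col <- m[, "FITC-A"]                 # plain integer vector, no dim attribute
res <- tryCatch({ colnames(one_col) <- "CD4"; "ok" },
                error = function(e) conditionMessage(e))

# ...while drop = FALSE keeps two dimensions, so renaming works.
kept <- m[, "FITC-A", drop = FALSE]
colnames(kept) <- "CD4"
```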

Improve...

expr_no_transfL <- expr_no_transf[metadata_sc$FlowSOM == "CD8p",]
MATCH...

New (possible) normalization methods

From flowStats ---> gaussNorm (already implemented) and warpSet: both based on per-channel landmarks.
From iFlow ---> gpaSet: multidimensional normalization method using the generalized Procrustes analysis.

error in subclustering: it should be done on the original SCE

This part should be replaced by the specific function after preparing the original SCE:

#to be replaced
fcsL <- fcs1000[,fcs1000$SOM_named == "lymphocytes"]
metadata(fcsL)$subclustering <- "lymphocytes"

###new code to be included in the script
fcs$SOM_named <- clusters.rename(fcs$SOM, cluster = replacedata$original_cluster, name = replacedata$new_cluster)
fcsL<-subset(fcs, ,SOM_named=="lymphocytes")

Pre-processing

Allow excluding some channels (e.g., Time or CDx-H) from pre-processing and deleting them, taking into account that flowAI uses the time channel...

Conflicting architectures in hpclogin.unav.es

> fcs <- normalization.flw(fcs.SCE = fcs, marker.to.norm = c("CCR6", "CCR4"),
+                          norm.method = "harmony", var.to.use = "patient_id")

error: Mat::init(): requested size is too large

 *** caught segfault ***
address (nil), cause 'memory not mapped'

Traceback:
 1: .External(list(name = "CppMethod__invoke_void", address = <pointer: 0x117fd830>,     dll = list(name = "Rcpp", path = "/home/cdasilvam/FlowCT.v2_environment1/renv/library/R-4.0/x86_64-pc-linux-gnu/Rcpp/libs/Rcpp.so",         dynamicLookup = TRUE, handle = <pointer: 0x1b6c8e90>,         info = <pointer: 0x19cfe20>), numParameters = -1L), <pointer: 0x29d889b0>,     <pointer: 0x361e00c0>, .pointer, ...)
 2: harmonyObj$setup(data_mat, phi, phi_moe, Pr_b, sigma, theta,     max.iter.cluster, epsilon.cluster, epsilon.harmony, nclust,     tau, block.size, lambda_mat, verbose)
 3: HarmonyMatrix(data_mat = assay(fcs.SCE, assay.i), meta_data = colData(fcs.SCE),     vars_use = var.to.use, do_pca = F, verbose = F)
 4: normalization.flw(fcs.SCE = fcs, marker.to.norm = c("CCR6", "CCR4"),     norm.method = "harmony", var.to.use = "patient_id")

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

create.metaFCS

When creating it, use surface_markers instead of selecting specific metadata columns.

Filenames crashes table with deleted events in QC

If filenames share the same beginning, like MO_1 and MO_10, the qc.and.removeDoublets function isn't able to show the table with deleted events...
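
A minimal sketch of how names with a shared prefix can collide. It assumes, without having checked the function's source, that files are looked up by pattern matching; the file names are illustrative.

```r
files <- c("MO_1.fcs", "MO_10.fcs", "MO_2.fcs")

# Pattern matching treats "MO_1" as a prefix and also catches "MO_10".
prefix_hits <- files[grepl("MO_1", files, fixed = TRUE)]

# Comparing the extension-stripped name exactly selects a single file.
exact_hits <- files[tools::file_path_sans_ext(files) == "MO_1"]
```

With exact comparison each file maps to one row, which is what a per-file table of deleted events needs.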

misc TODO

IMPORTANT: External script to randomize analysis groups according to some specified variables.
barplot.cell.pops -> add a default option for color.by (to avoid errors).
Remove the cellID dependence for subprocesses: subsampling...
Copy tree methods into our code to avoid dependencies.
surv.tree is not correctly exported.
Keep the proportion option in maxstat.
Add UMAP options to dim.reduction (and change tumap to umap?)
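
For the first item, a minimal sketch of randomizing groups within strata defined by a metadata variable. The helper randomize_groups, its arguments, and the toy metadata are all hypothetical; nothing here is part of FlowCT.

```r
# Hypothetical helper: assign samples to groups at random within each stratum.
randomize_groups <- function(meta, strata, groups = c("A", "B"), seed = 333) {
  set.seed(seed)
  meta$group <- NA_character_
  for (idx in split(seq_len(nrow(meta)), meta[, strata], drop = TRUE)) {
    # shuffle the stratum, then deal the group labels out in turn
    shuffled <- idx[sample.int(length(idx))]
    meta$group[shuffled] <- rep(groups, length.out = length(idx))
  }
  meta
}

meta <- data.frame(sample_id = paste0("S", 1:8),
                   condition = rep(c("MM", "MGUS"), each = 4))
out <- randomize_groups(meta, strata = "condition")
```

Dealing labels in turn keeps the groups balanced inside every stratum, which plain sample() over all rows would not guarantee.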

Add progress bar to:

fsom.metaclustering

cluster.heatmap

Allow specifying a different output file name when X11 is not available.

Update data.table::melt

In data.table::melt(table(mnames)) :
  The melt generic in data.table has been passed a table and will attempt to redirect to the relevant reshape2 method; please note that reshape2 is deprecated, and this redirection is now deprecated as well. To continue using melt methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the namespace like reshape2::melt(table(mnames)). In the next version, this warning will become an error.
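
One way to avoid the redirection, sketched under the assumption that mnames is a character vector of marker names: base R's as.data.frame() method for tables already returns the long format. The example values are made up.

```r
mnames <- c("CD4", "CD8", "CD4", "CD19", "CD4")

# as.data.frame() on a table yields one row per level plus a count column,
# without routing through the deprecated reshape2 method.
counts <- as.data.frame(table(mnames), responseName = "value",
                        stringsAsFactors = FALSE)
```

The alternative named in the warning itself, reshape2::melt(table(mnames)), also works but keeps the reshape2 dependency.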

Show marker name in unify.FCSheaders

Collapse channel::marker to check for discrepancies or identical marker names (Irene's request).
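
A minimal sketch of that check on a made-up header table. The data frame layout (file, channel, marker) is an assumption for illustration and is not what unify.FCSheaders returns.

```r
# Hypothetical per-file header table (channel name plus marker description).
headers <- data.frame(
  file    = c("a.fcs", "a.fcs", "b.fcs", "b.fcs"),
  channel = c("FITC-A", "PE-A", "FITC-A", "PE-A"),
  marker  = c("CD4", "CD8", "CD4", "CD19")
)

# Collapse each pair into one label and count in how many files it appears.
headers$pair <- paste(headers$channel, headers$marker, sep = "::")
pair_counts <- table(headers$pair)

# Channels carrying more than one marker name across files are discrepancies.
markers_per_channel <- tapply(headers$marker, headers$channel,
                              function(x) length(unique(x)))
discrepant <- names(markers_per_channel)[markers_per_channel > 1]
```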

flowAI remove events differently?

Same files... ????? Check metadata(fcs_se)$removed

Overlapped histogram for many patients

Calculate the median and highlight those patients with a SD higher than X...
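
A minimal sketch of one reading of this request: compute each patient's median and flag patients whose median lies more than X standard deviations from the mean of all medians. The helper name and the toy data are hypothetical.

```r
# Hypothetical helper: flag samples whose per-sample median is more than
# `x` standard deviations away from the mean of all sample medians.
flag_outlier_samples <- function(expression, sample_id, x = 2) {
  med <- tapply(expression, sample_id, median)
  z <- (med - mean(med)) / sd(med)
  names(med)[abs(z) > x]
}

expr <- c(rep(1, 10), rep(1.1, 10), rep(0.9, 10), rep(1.05, 10),
          rep(0.95, 10), rep(5, 10))
ids  <- rep(paste0("P", 1:6), each = 10)
flagged <- flag_outlier_samples(expr, ids, x = 1.5)
```

The flagged names could then drive the color or line width of the corresponding density curves.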

Make colors_palette bigger

Divergent colors function... and link to colors_palette.

barplot.cell.pops, both percentage and counts

barplot.cell.pops_m <- function (cell_clusters, metadata, colname_sampleID, return_table)
{
  tab <- table(cell_clusters, metadata[,colname_sampleID])
  prop_tableL <- prop.table(tab, margin = 2) * 100
 
  ggdfL <- data.table::melt(prop_tableL, value.name = "proportion")
  colnames(ggdfL)[2] <- colname_sampleID
  mmL <- match(ggdfL[, colname_sampleID], metadata[, colname_sampleID])
  ggdfL <- data.frame(metadata[mmL, ], ggdfL)
  g <- ggplot(ggdfL, aes_string(x = colname_sampleID, y = "proportion", # use the supplied ID column, not a hard-coded "sample_id"
                                fill = "cell_clusters")) + geom_bar(stat = "identity") +
    facet_wrap(~condition, scales = "free_x") + theme_minimal() +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
    scale_fill_manual(values = colors_palette)
  if ("try-error" %in% class(suppressWarnings(try(x11(), silent = T)))) {
    cat("X11 is not active, barplot is saved in -> barPlot_cell_prop.",
        deparse(substitute(cell_clusters)), ".jpg\n", sep = "")
    suppressMessages(ggsave(paste0("barPlot_cell_prop.",
                                   deparse(substitute(cell_clusters)), ".jpg"), device = "jpeg",
                            plot = g))
  }
  else {
    print(g)
  }
  if(return_table == "counts"){
    return(tab)
  }else{ #add "percentage" option and block any other possibilities
    return(prop_tableL)
  }
 
}
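
The comment in the function asks for a "percentage" option that blocks any other value. A minimal sketch of that with match.arg, isolated from the plotting part; the helper name cluster_table and the toy vectors are hypothetical.

```r
# Hypothetical helper isolating the table-returning part of the function.
cluster_table <- function(cell_clusters, sample_id,
                          return_table = c("percentage", "counts")) {
  return_table <- match.arg(return_table)  # errors on any other value
  tab <- table(cell_clusters, sample_id)
  if (return_table == "counts") tab else prop.table(tab, margin = 2) * 100
}

clusters <- c("T", "T", "B", "T", "B", "B")
samples  <- c("s1", "s1", "s1", "s2", "s2", "s2")
pct <- cluster_table(clusters, samples)             # default: percentages
cnt <- cluster_table(clusters, samples, "counts")
```

Listing the choices in the signature also gives the argument a default, so calls that omit return_table no longer fail.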

qc.and.removeDoublets(), errors and suggestions

If the reduction has already been done, or a set of events has been selected beforehand (e.g., LiB), the function must be told that there is no reduction_suffix and doublet removal must be skipped (if doublets were already removed manually, interesting events could otherwise be lost).

>qc.and.removeDoublets(reduction_suffix="")
Processing: Screening_013543_TubeT.fcs
Processing: Screening_014090_TubeT.fcs
Processing: Screening_014184_TubeT.fcs
Processing: Screening_014408_TubeT.fcs
Processing: Screening_014497_TubeT.fcs
Processing: Screening_014786_TubeT.fcs
Processing: Screening_015308_TubeT.fcs
Processing: Screening_015879_TubeT.fcs
Processing: Screening_016286_TubeT.fcs
Processing: Screening_016469_TubeT.fcs
Processing: Screening_016692_TubeT.fcs
Processing: Screening_017425_TubeT.fcs
Processing: Screening_017499_TubeT.fcs
Processing: Screening_017529_TubeT.fcs
Processing: Screening_017557_TubeT.fcs
Processing: Screening_017759_TubeT.fcs
Processing: Screening_017789_TubeT.fcs
Processing: Screening_018108_TubeT.fcs
Processing: Screening_018769_TubeT.fcs
Processing: Screening_018796_TubeT.fcs
Processing: Screening_019278_TubeT.fcs
Processing: Screening_019649_TubeT.fcs
Processing: Screening_020616_TubeT.fcs
Processing: Screening_020653_TubeT.fcs
Processing: Screening_021098_TubeT.fcs
Error in cat("WARNING! >", i, "has lost some much cells (more that 30%) in the QC and doublets removal steps, consider to review it!",  :
  argument is missing, with no default
In addition: Warning message:
In dir.create(output_folder) : 'results_preprocessing' already exists

suggestion for color palette

I modified the div.colors function to take advantage of the colorspace package. Please look at the function and consider replacing the old one with it (why does the second part depend on a number greater than 74?). Attached you'll find a UMAP with this automatically generated qualitative palette.

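div.colors <- function(n, set.seed = 333){
  require(RColorBrewer)
  require(colorspace)

  # 74 matches the total number of colors in RColorBrewer's qualitative palettes
  if(n < 74){
    # HCL-based qualitative palette from colorspace
    col <- qualitative_hcl(n, palette = "Dark3")
  }else{
    # fall back to a random draw from all RColorBrewer palettes;
    # the seed is set before any sampling so the result is reproducible
    set.seed(set.seed)
    pals <- brewer.pal.info
    col_vector <- unlist(mapply(brewer.pal, pals$maxcolors, rownames(pals)))
    col <- sample(col_vector, n, replace = TRUE)
  }
  return(col)
}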

In a flow cytometry package, the possibility to see 2D dotplot is essential: suggestion code

Here you'll find my code with result obtained!

df<-as.data.frame(t(assay(fcs1000,"logcounts")))
df_expr<-cbind(df, SOM=fcs1000$SOM)
#or if only selected clusters
df_expr<-df_expr[df_expr$SOM %in% c("1","2","3","4"),] # %in% tests membership; == would recycle the vector element-wise

#   prepare the dotplot
pmain<-ggplot(df_expr, aes(x=CD4, y=CD8, color=SOM)) +
  geom_point(size=0.5)+
  scale_color_manual(values = color_clusters) +
  theme_bw()
###if density plot needed
#ggplot(df_expr, aes(x=CD4, y=CD8)) +
  # geom_point(size=0.5)+theme_bw() +geom_hex(bins = 200) +
  # scale_fill_continuous(type = "viridis") 

library(cowplot) 

# histogram along x axis
xdens <- axis_canvas(pmain, axis = "x")+
  scale_fill_manual(values = color_clusters) +
  geom_density(data = df_expr, aes(x = CD4, fill = SOM),
               alpha = 0.7, size = 0.2)
# histogram along y axis
ydens <- axis_canvas(pmain, axis = "y", coord_flip = TRUE)+
  scale_fill_manual(values = color_clusters) +
  geom_density(data = df_expr, aes(x = CD8, fill = SOM),
               alpha = 0.7, size = 0.2)+
  coord_flip()
p1 <- insert_xaxis_grob(pmain, xdens, grid::unit(.2, "null"), position = "top")
p2<- insert_yaxis_grob(p1, ydens, grid::unit(.2, "null"), position = "right")
ggdraw(p2)

data are not scaled: issue for dimensional reduction algorithms ==> proposed code

I added a new assay named "scaled" containing the scaled data. It should be used as the default assay.i throughout the code, to be sure the correct set of data is used.
Some default parameters may need to be modified accordingly.

scale.data <- function(fcs.SE, assay.i = "normalized",scaled.matrix.name = "scaled"){
  data <- t(assay(fcs.SE, i = assay.i))
  rng <- matrixStats::colQuantiles(data, probs = c(0.01, 0.99))
  expr01 <- t((t(data) - rng[, 1]) / (rng[, 2] - rng[, 1]))
  expr01[expr01 < 0] <- 0
  expr01[expr01 > 1] <- 1
  SummarizedExperiment::assay(fcs.SE, i = scaled.matrix.name) <-t(expr01)
  return(fcs.SE)
  }
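
To see what this rescaling does without a SummarizedExperiment, here is a base-R sketch of the same idea on a toy markers-by-cells matrix: each marker is mapped so that its 1st and 99th percentiles become 0 and 1, and values outside are clamped. The helper name scale01 and the simulated data are hypothetical.

```r
# Toy matrix: rows = markers, columns = cells (simulated values).
set.seed(1)
expr <- rbind(CD4 = rnorm(200, mean = 2), CD8 = rexp(200))

scale01 <- function(mat, probs = c(0.01, 0.99)) {
  # per-marker (row) lower and upper percentiles
  rng <- t(apply(mat, 1, quantile, probs = probs))
  out <- (mat - rng[, 1]) / (rng[, 2] - rng[, 1])
  pmin(pmax(out, 0), 1)  # clamp values outside the percentile range
}

scaled <- scale01(expr)
```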

Add progress bar to dimensional reduction

      pb <- txtProgressBar(min = 0, max = nrow(gene.sum), 
        style = 3)
      for (i in 1:nrow(gene.sum)) {
        prot.dat = all.prot.dat[Hugo_Symbol %in% gene.sum[i, 
          Hugo_Symbol]]
        syn.res = rbind(syn.res, cluster_prot(prot.dat = prot.dat, 
          gene = gene.sum[i, Hugo_Symbol], th = gene.sum[i, 
            th], protLen = gene.sum[i, aa.length]))
        setTxtProgressBar(pb, i)
      }

(based on maftools::oncodrive::parse_prot)

Normalization step

densityplot(as.formula(paste0("~", marker)), datr, main = "normalized", xlim = lims.FCS(fcs), filter=curv1Filter(marker), legend = F)
Error: $ operator is invalid for atomic vectors

Does densityplot not allow plotting individual markers?

corplot.condition and diffdots.cell.clustering do not allow selecting the assay

When I try to set assay.i = "transformed" I get this error:
Error in assay(fcs.SCE, i = assay.i) :
'assay(, i="character", ...)' invalid subscript 'i'
'normalized' not in names(assays())
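
The message shows that "normalized" was still requested even though another assay was asked for. A minimal sketch of validating the requested name up front; the helper check_assay is hypothetical, and in real code the available names would come from SummarizedExperiment::assayNames().

```r
# Hypothetical helper: validate a requested assay name before using it.
check_assay <- function(available, assay.i = "normalized") {
  if (!assay.i %in% available) {
    stop("'", assay.i, "' not in available assays: ",
         paste(available, collapse = ", "), call. = FALSE)
  }
  assay.i
}

available <- c("raw", "transformed")  # e.g. SummarizedExperiment::assayNames(fcs.SCE)
chosen <- check_assay(available, "transformed")
```

Calling such a check at the top of each plotting function, and passing assay.i through to every inner call, would turn the subscript error into a message listing the valid choices.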

Going further...

Maybe we could apply this method in FlowCT? Take a look!

v2, misc

Use a SummarizedExperiment to handle expression data (both raw and transformed/normalized) and metadata information together.

Load packages automatically (is it possible?)

unable to install devel version from github

This is the error. There is no problem if I download and install from the tar.gz.
We should look at all dependencies...
Error: (converted from warning) package 'Hmisc' was built under R version 4.0.2
Execution halted
ERROR: lazy loading failed for package 'FlowCT.devel'

EDIT: this error is not reproducible. I changed computers and it no longer appears, so it is surely not related to FlowCT!

New approach for PCA/kmeans clustering of samples (for big datasets)

It would be useful to introduce the ggfortify package (https://cran.r-project.org/web/packages/ggfortify/vignettes/plot_pca.html) to "cluster" all the different patients/samples (even at the subclustering step). This would give us a new variable that could be useful for downstream analysis.
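
A minimal base-R sketch of the idea, using prcomp and kmeans from the stats package on a simulated per-sample summary matrix. The data, marker names, and the choice of two clusters are assumptions for illustration only.

```r
# Toy per-sample summary (rows = samples, columns = median marker expression).
set.seed(333)
sample_medians <- rbind(
  matrix(rnorm(5 * 4, mean = 0), ncol = 4),
  matrix(rnorm(5 * 4, mean = 5), ncol = 4)
)
dimnames(sample_medians) <- list(paste0("S", 1:10),
                                 c("CD4", "CD8", "CD19", "CD56"))

# PCA on the samples, then k-means on the first two components.
pca <- prcomp(sample_medians, center = TRUE, scale. = TRUE)
km  <- kmeans(pca$x[, 1:2], centers = 2, nstart = 10)

# New per-sample variable that could be added to the metadata.
sample_cluster <- factor(km$cluster)
```

The pca object is what ggfortify's autoplot would take for plotting, and sample_cluster could be joined to the sample metadata by row name.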
