Code Monkey home page Code Monkey logo

e-mtab-6141's Introduction

A simple tutorial for a complex ComplexHeatmap

Kevin Blighe 2020-05-28

1, introduction

The data used for the heatmap is taken from a large bulk RNA-seq study on rheumatoid arthritis synovial biopsies, a project on which I co-led as Senior Bioinformatician. The work was published as Molecular Portraits of Early Rheumatoid Arthritis Identify Clinical and Treatment Response Phenotypes (Lewis et al. (2019)).

ComplexHeatmap (Gu, Eils, and Schlesner (2016)) is an R Programming Language (R Core Team (2020)) package that is currently listed in the Bioconductor package repository.

2, install and load required packages

  require(RColorBrewer)
  require(ComplexHeatmap)
  require(circlize)
  require(digest)
  require(cluster)

If all load successfully, proceed to Part 3. Otherwise, go through the following code chunks in order to ensure that each package is installed and loaded properly.

BiocManager (Morgan (2019))

  if (!requireNamespace('BiocManager', quietly = TRUE))
    install.packages('BiocManager')

RColorBrewer (Neuwirth (2014))

  if (!requireNamespace('RColorBrewer', quietly = TRUE))
    BiocManager::install('RColorBrewer')

  require(RColorBrewer)

ComplexHeatmap (Gu, Eils, and Schlesner (2016))

  if (!requireNamespace('ComplexHeatmap', quietly = TRUE))
    BiocManager::install('ComplexHeatmap')

  require(ComplexHeatmap)

circlize (Gu et al. (2014))

  if (!requireNamespace('circlize', quietly = TRUE))
    BiocManager::install('circlize')

  require(circlize)

digest (Antoine Lucas et al. (2020))

  if (!requireNamespace('digest', quietly = TRUE))
    BiocManager::install('digest')

  require(digest)

cluster (Maechler et al. (2019))

  if (!requireNamespace('cluster', quietly = TRUE))
    BiocManager::install('cluster')

  require(cluster)

3, obtain the data

Sample information and raw data FASTQ files are stored in the public domain under accessions E-MTAB-6141 and PRJEB23131, respectively.

I have separately stored an expression matrix and sample metadata in my GitHub repository (github.com/kevinblighe/). The data will be downloaded to temporary files that will later be deleted by your operating system. The files are presented as uncompressed, plain text in order to ensure compatibility across all platforms outside of a standard build system.

  tmpfile <- tempfile()
  download.file('https://github.com/kevinblighe/E-MTAB-6141/raw/master/rdata/mat.tsv',
    tmpfile, method = 'auto')
  mat <- read.table(tmpfile, sep = '\t', row.names = 1,
    header = TRUE, stringsAsFactors = FALSE)

  tmpfile <- tempfile()
  download.file('https://github.com/kevinblighe/E-MTAB-6141/raw/master/rdata/metadata.tsv',
    tmpfile, method = 'auto')
  metadata <- read.table(tmpfile, sep = '\t', row.names = 1,
    header = TRUE, stringsAsFactors = FALSE)

  tmpfile <- tempfile()
  download.file('https://github.com/kevinblighe/E-MTAB-6141/raw/master/rdata/sig_genes.list',
    tmpfile, method = 'auto')
  sig_genes <- read.table(tmpfile, sep = '\t',
    header = FALSE, stringsAsFactors = FALSE)[,1]

Check the md5 checksums to ensure data integrity / security. The checksums should be:

  • ‘mat’ object: 1fbbe9568738577a2f3e3dc42e6c75cf
  • ‘metadata’ object: 542a40ccf8b14c51ffa45361c5d3aed9
  • ‘sig_genes’ object: fdc3e52c9cf0adff3747c7683b69d371
  digest::digest(mat, algo = 'md5')
## [1] "1fbbe9568738577a2f3e3dc42e6c75cf"
  digest::digest(metadata, algo = 'md5')
## [1] "542a40ccf8b14c51ffa45361c5d3aed9"
  digest::digest(sig_genes, algo = 'md5')
## [1] "fdc3e52c9cf0adff3747c7683b69d371"

Take a look at the contents of the data.

  # first 5 rows; first 5 columns
  mat[1:5,1:5]
##         SAM9103802 SAM9103803 SAM9103804 SAM9103805  SAM9103806
## A1BG     0.1288745  0.1637147 -0.1106011 -0.1113405  0.09776126
## A1CF     1.4491133  1.6378292  1.4676648  1.5119170  1.40215292
## A2M     15.0932787 14.8324464 14.7205315 14.6949978 14.70800150
## A2ML1    3.4826292  3.7443431  4.4786253  3.1842529  4.80886450
## A3GALT2  1.2259417  1.0704200  1.2452696  1.1006621  1.11011460
  # take a peek at the metadata
  head(metadata)
##            Pathotype CD3 CD20 CD68L CD68SL CD138
## SAM9103802  Lymphoid   2    3     1      3     3
## SAM9103803  Lymphoid   3    4     4      3     3
## SAM9103804   Myeloid   3    0     0      4     0
## SAM9103805  Lymphoid   2    2     3      1     1
## SAM9103806   Fibroid   0    0     0      1     0
## SAM9103807   Fibroid   0    0    NA      1     0
  # take a peek at the genes identified as statistically significant
  head(sig_genes)
## [1] "A2ML1"      "AADACL2"    "ABCA10"     "ABCA12"     "AC010646.3"
## [6] "ACACB"
  # dimensions of expression data and metadata, and length of sig_genes
  dim(mat)
## [1] 19279    87
  dim(metadata)
## [1] 87  6
  length(sig_genes)
## [1] 2772
  # verify integrity of metadata and expression matrix:
  # --check that both objects are aligned by name
  all(rownames(metadata) == colnames(mat))
## [1] TRUE
  # Subset the expression matrix for the statistically significant genes
  mat <- mat[sig_genes,]

4, generate the heatmap

[a] scale the data to Z-scores (by row)

This is quite standard when performing clustering and generating a heatmap.

  heat <- t(scale(t(mat)))

[b] set colour scheme and choose breaks

  myCol <- colorRampPalette(c('dodgerblue', 'black', 'yellow'))(100)
  myBreaks <- seq(-3, 3, length.out = 100)

[c] create annotation: histo-pathotype and histology scores

First, we will just generate some colour mappings for the metadata histology scores.

  # CD3
    cd3 <- metadata$CD3
    cd3 <- cd3[!is.na(cd3)] # remove missing values - we don't want to include these in the mapping
    pick.col <- brewer.pal(9, 'Greens')
    col.cd3 <- colorRampPalette(pick.col)(length(unique(cd3)))

  # CD20
    cd20 <- metadata$CD20
    cd20 <- cd20[!is.na(cd20)]
    pick.col <- brewer.pal(9, 'Blues')
    col.cd20 <- colorRampPalette(pick.col)(length(unique(cd20)))

  # CD68L
    cd68L <- metadata$CD68L
    cd68L <- cd68L[!is.na(cd68L)]
    pick.col <- brewer.pal(9, 'Reds')
    col.cd68L <- colorRampPalette(pick.col)(length(unique(cd68L)))

  # CD68SL
    cd68SL <- metadata$CD68SL
    cd68SL <- cd68L[!is.na(cd68L)]
    pick.col <- brewer.pal(9, 'Oranges')
    col.cd68SL <- colorRampPalette(pick.col)(length(unique(cd68SL)))

  # CD138
    cd138 <- metadata$CD138
    cd138 <- cd138[!is.na(cd138)]
    pick.col <- brewer.pal(9, 'Purples')
    col.cd138 <- colorRampPalette(pick.col)(length(unique(cd68SL)))

The use of brewer.pal and colorRampPalette above are to just automatically produce hexidecimal colour codes that will be used later for mapping between each histology score and each colour.

  unique(col.cd3)
## [1] "#F7FCF5" "#C7E9C0" "#74C476" "#238B45" "#00441B"
  unique(col.cd20)
## [1] "#F7FBFF" "#C6DBEF" "#6BAED6" "#2171B5" "#08306B"
  unique(col.cd68L)
## [1] "#FFF5F0" "#FCBBA1" "#FB6A4A" "#CB181D" "#67000D"
  unique(col.cd68SL)
## [1] "#FFF5EB" "#FDD0A2" "#FD8D3C" "#D94801" "#7F2704"
  unique(col.cd138)
## [1] "#FCFBFD" "#DADAEB" "#9E9AC8" "#6A51A3" "#3F007D"

Now let’s build the actual annotation object, i.e., the legend:

  # Create an initial data-frame of the annotation that we want to use
  # In this example, the 'ann' object turns out to be the exact same as 'metadata'
  ann <- data.frame(
    Pathotype = metadata$Pathotype,
    CD3 = metadata$CD3,
    CD20 = metadata$CD20,
    CD68L = metadata$CD68L,
    CD68SL = metadata$CD68SL,
    CD138 = metadata$CD138,
    stringsAsFactors = FALSE)

  # create the colour mapping
  colours <- list(
    Pathotype = c('Lymphoid' = 'blue', 'Myeloid' = 'red', 'Fibroid' = 'green3', 'Ungraded' = 'grey'),
    CD3 = c('0' = '#F7FCF5', '1' = '#C7E9C0', '2' = '#74C476', '3' = '#238B45', '4' = '#00441B'),
    CD20 = c('0' = '#F7FBFF', '1' = '#C6DBEF', '2' = '#6BAED6', '3' = '#2171B5', '4' = '#08306B'),
    CD68L = c('0' = '#FFF5F0', '1' = '#FCBBA1', '2' = '#FB6A4A', '3' = '#CB181D', '4' = '#67000D'),
    CD68SL = c('0' = '#FFF5EB', '1' = '#FDD0A2', '2' = '#FD8D3C', '3' = '#D94801', '4' = '#7F2704'),
    CD138 = c('0' = '#FCFBFD', '1' = '#DADAEB', '2' = '#9E9AC8', '3' = '#6A51A3', '4' = '#3F007D'))

  # now create the ComplexHeatmap annotation object
  # as most of these parameters are self-explanatory, comments will only appear where needed
  colAnn <- HeatmapAnnotation(
    df = ann,
    which = 'col', # 'col' (samples) or 'row' (gene) annotation?
    na_col = 'white', # default colour for any NA values in the annotation data-frame, 'ann'
    col = colours,
    annotation_height = 0.6,
    annotation_width = unit(1, 'cm'),
    gap = unit(1, 'mm'),
    annotation_legend_param = list(
      Pathotype = list(
        nrow = 4, # number of rows across which the legend will be arranged
        title = 'Pathotype',
        title_position = 'topcenter',
        legend_direction = 'vertical',
        title_gp = gpar(fontsize = 12, fontface = 'bold'),
        labels_gp = gpar(fontsize = 12, fontface = 'bold')),
      CD3 = list(
        nrow = 5,
        title = 'CD3',
        title_position = 'topcenter',
        legend_direction = 'vertical',
        title_gp = gpar(fontsize = 12, fontface = 'bold'),
        labels_gp = gpar(fontsize = 12, fontface = 'bold')),
      CD20 = list(
        nrow = 5,
        title = 'CD20',
        title_position = 'topcenter',
        legend_direction = 'vertical',
        title_gp = gpar(fontsize = 12, fontface = 'bold'),
        labels_gp = gpar(fontsize = 12, fontface = 'bold')),
      CD68L = list(
        nrow = 5,
        title = 'CD68L',
        title_position = 'topcenter',
        legend_direction = 'vertical',
        title_gp = gpar(fontsize = 12, fontface = 'bold'),
        labels_gp = gpar(fontsize = 12, fontface = 'bold')),
      CD68SL = list(
        nrow = 5,
        title = 'CD68SL',
        title_position = 'topcenter',
        legend_direction = 'vertical',
        title_gp = gpar(fontsize = 12, fontface = 'bold'),
        labels_gp = gpar(fontsize = 12, fontface = 'bold')),
      CD138 = list(
        nrow = 5,
        title = 'CD138',
        title_position = 'topcenter',
        legend_direction = 'vertical',
        title_gp = gpar(fontsize = 12, fontface = 'bold'),
        labels_gp = gpar(fontsize = 12, fontface = 'bold'))))

Believe me, there are many more parameters that can be configured than those which I show here.

[d] create annotation: box-and-whisker plots

  boxplotCol <- HeatmapAnnotation(
    boxplot = anno_boxplot(
      heat,
      border = FALSE,
      gp = gpar(fill = '#CCCCCC'),
      pch = '.',
      size = unit(2, 'mm'),
      axis = TRUE,
      axis_param = list(
        gp = gpar(fontsize = 12),
        side = 'left')),
      annotation_width = unit(c(2.0), 'cm'),
      which = 'col')

  boxplotRow <- HeatmapAnnotation(
    boxplot = row_anno_boxplot(
      heat,
      border = FALSE,
      gp = gpar(fill = '#CCCCCC'),
      pch = '.',
      size = unit(2, 'mm'),
      axis = TRUE,
      axis_param = list(
        gp = gpar(fontsize = 12),
        side = 'top')),
      annotation_width = unit(c(2.0), 'cm'),
      which = 'row')

[e] create annotation: gene labels

Many heatmaps are produced from a large number of variables / genes, which result in it being difficult to label each gene in the plot space. Here, we can ‘step through’ the variables / genes and choose to only label a select few.

The number of rows (genes) in our object is:

2772

In this code snippet, we ‘step through’ the rownames and only retain every 40th successive label.

  genelabels <- rowAnnotation(
    Genes = anno_mark(
      at = seq(1, nrow(heat), 40),
      labels = rownames(heat)[seq(1, nrow(heat), 40)],
      labels_gp = gpar(fontsize = 10, fontface = 'bold'),
      padding = 0.75),
      width = unit(2.0, 'cm') +

      max_text_width(
        rownames(heat)[seq(1, nrow(heat), 40)],
        gp = gpar(fontsize = 10,  fontface = 'bold')))

[f] perform partitioning around medoids (PAM) to identify clusters in the data

Performing k-means or PAM on our data can help us to identify internal ‘structure’ in the data that may relate to biologically meaningful pathways, as an example.

  pamClusters <- cluster::pam(heat, k = 4) # pre-select k = 4 centers
  pamClusters$clustering <- paste0('Cluster ', pamClusters$clustering)

  # fix order of the clusters to have 1 to 4, top to bottom
  pamClusters$clustering <- factor(pamClusters$clustering,
    levels = c('Cluster 1', 'Cluster 2', 'Cluster 3', 'Cluster 4'))

[g] create the actual heatmap object

  hmap <- Heatmap(heat,

    # split the genes / rows according to the PAM clusters
      split = pamClusters$clustering,
      cluster_row_slices = FALSE,

    name = 'Gene\nZ-\nscore',

    col = colorRamp2(myBreaks, myCol),

    # parameters for the colour-bar that represents gradient of expression
      heatmap_legend_param = list(
        color_bar = 'continuous',
        legend_direction = 'vertical',
        legend_width = unit(8, 'cm'),
        legend_height = unit(5.0, 'cm'),
        title_position = 'topcenter',
        title_gp=gpar(fontsize = 12, fontface = 'bold'),
        labels_gp=gpar(fontsize = 12, fontface = 'bold')),

    # row (gene) parameters
      cluster_rows = TRUE,
      show_row_dend = TRUE,
      #row_title = 'Statistically significant genes',
      row_title_side = 'left',
      row_title_gp = gpar(fontsize = 12,  fontface = 'bold'),
      row_title_rot = 90,
      show_row_names = FALSE,
      row_names_gp = gpar(fontsize = 10, fontface = 'bold'),
      row_names_side = 'left',
      row_dend_width = unit(25,'mm'),

    # column (sample) parameters
      cluster_columns = TRUE,
      show_column_dend = TRUE,
      column_title = '',
      column_title_side = 'bottom',
      column_title_gp = gpar(fontsize = 12, fontface = 'bold'),
      column_title_rot = 0,
      show_column_names = FALSE,
      column_names_gp = gpar(fontsize = 10, fontface = 'bold'),
      column_names_max_height = unit(10, 'cm'),
      column_dend_height = unit(25,'mm'),

    # cluster methods for rows and columns
      clustering_distance_columns = function(x) as.dist(1 - cor(t(x))),
      clustering_method_columns = 'ward.D2',
      clustering_distance_rows = function(x) as.dist(1 - cor(t(x))),
      clustering_method_rows = 'ward.D2',

    # specify top and bottom annotations
      top_annotation = colAnn,
      bottom_annotation = boxplotCol)

[j] draw the heatmap

  draw(hmap + genelabels,
    heatmap_legend_side = 'left',
    annotation_legend_side = 'right',
    row_sub_title_side = 'left')

Molecular Portraits of Early Rheumatoid Arthritis Identify Clinical and Treatment Response Phenotypes

5, extra: change colour scheme, breaks, and do extra clustering on columns

Modifying colour and breaks can help to emphasise expression patterns in the heatmap.

Columns (samples) can be clustered ‘on the fly’ via column_km.

  myCol <- colorRampPalette(c('royalblue', 'white', 'red3'))(100)
  myBreaks <- seq(-1.5, 1.5, length.out = 100)

  hmap1 <- Heatmap(heat,

    name = 'Gene Z-score',

    col = colorRamp2(myBreaks, myCol),

    heatmap_legend_param = list(
      color_bar = 'continuous',
      legend_direction = 'horizontal',
      legend_width = unit(8, 'cm'),
      legend_height = unit(5.0, 'cm'),
      title_position = 'topcenter',
      title_gp=gpar(fontsize = 30, fontface = 'bold'),
      labels_gp=gpar(fontsize = 24, fontface = 'bold')),

    cluster_rows = TRUE,
    show_row_dend = TRUE,
    row_title = 'Statistically significant genes',
    row_title_side = 'left',
    row_title_gp = gpar(fontsize = 30,  fontface = 'bold'),
    row_title_rot = 90,
    show_row_names = FALSE,
    row_names_gp = gpar(fontsize = 11, fontface = 'bold'),
    row_names_side = 'left',
    row_dend_width = unit(25,'mm'),

    cluster_columns = TRUE,
    show_column_dend = TRUE,
    column_title = 'Samples',
    column_title_side = 'bottom',
    column_title_gp = gpar(fontsize = 30, fontface = 'bold'),
    column_title_rot = 0,
    show_column_names = FALSE,
    column_names_gp = gpar(fontsize = 8, fontface = 'bold'),
    column_names_max_height = unit(10, 'cm'),
    column_dend_height = unit(25,'mm'),

    clustering_distance_columns = function(x) as.dist(1 - cor(t(x))),
    clustering_method_columns = 'ward.D2',
    clustering_distance_rows = function(x) as.dist(1 - cor(t(x))),
    clustering_method_rows = 'ward.D2')


  myCol <- colorRampPalette(c('forestgreen', 'black', 'purple'))(100)
  myBreaks <- seq(-2, 2, length.out = 100)

  hmap2 <- Heatmap(heat,

    split = pamClusters$clustering,
    cluster_row_slices = FALSE,

    column_km = 6,

    name = 'Gene Z-score',
    col = colorRamp2(myBreaks, myCol),

    heatmap_legend_param = list(
      color_bar = 'continuous',
      legend_direction = 'horizontal',
      legend_width = unit(8, 'cm'),
      legend_height = unit(5.0, 'cm'),
      title_position = 'topcenter',
      title_gp=gpar(fontsize = 30, fontface = 'bold'),
      labels_gp=gpar(fontsize = 24, fontface = 'bold')),

    cluster_rows = TRUE,
    show_row_dend = FALSE,
    #row_title = 'Statistically significant genes',
    row_title_side = 'right',
    row_title_gp = gpar(fontsize = 30,  fontface = 'bold'),
    row_title_rot = 90,
    show_row_names = FALSE,
    row_names_gp = gpar(fontsize = 12, fontface = 'bold'),
    row_names_side = 'left',
    row_dend_width = unit(25,'mm'),

    cluster_columns = TRUE,
    show_column_dend = TRUE,
    column_title = 'Samples',
    column_title_side = 'bottom',
    column_title_gp = gpar(fontsize = 30, fontface = 'bold'),
    column_title_rot = 0,
    show_column_names = FALSE,
    column_names_gp = gpar(fontsize = 8, fontface = 'bold'),
    column_names_max_height = unit(10, 'cm'),
    column_dend_height = unit(25,'mm'),

    clustering_distance_columns = function(x) as.dist(1 - cor(t(x))),
    clustering_method_columns = 'ward.D2',
    clustering_distance_rows = function(x) as.dist(1 - cor(t(x))),
    clustering_method_rows = 'ward.D2')

  pushViewport(viewport(layout = grid.layout(nr = 1, nc = 2)))
    pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 1))
      draw(hmap1,
        heatmap_legend_side = 'top',
        row_sub_title_side = 'left',
        newpage = FALSE)
      popViewport()

    pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 2))
      draw(hmap2,
        heatmap_legend_side = 'top',
        row_sub_title_side = 'right',
        newpage = FALSE)
      popViewport()
  popViewport()

Molecular Portraits of Early Rheumatoid Arthritis Identify Clinical and Treatment Response Phenotypes

6, session info

sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.6 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/atlas-base/atlas/libblas.so.3.0
## LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
## 
## locale:
##  [1] LC_CTYPE=pt_BR.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=pt_BR.UTF-8    
##  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=pt_BR.UTF-8   
##  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] cluster_2.1.0        digest_0.6.25        circlize_0.4.8      
## [4] ComplexHeatmap_2.2.0 RColorBrewer_1.1-2   knitr_1.28          
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.4.6        magrittr_1.5        colorspace_1.4-1   
##  [4] clue_0.3-57         rjson_0.2.20        rlang_0.4.5        
##  [7] highr_0.8           stringr_1.4.0       tools_3.6.3        
## [10] parallel_3.6.3      xfun_0.13           png_0.1-7          
## [13] htmltools_0.4.0     yaml_2.2.1          BiocManager_1.30.10
## [16] GlobalOptions_0.1.1 shape_1.4.4         evaluate_0.14      
## [19] rmarkdown_2.1       stringi_1.4.6       compiler_3.6.3     
## [22] GetoptLong_0.1.8

7, references

R Core Team (2020)

Antoine Lucas et al. (2020)

Morgan (2019)

Maechler et al. (2019)

Lewis et al. (2019)

Gu, Eils, and Schlesner (2016)

Gu et al. (2014)

Neuwirth (2014)

Antoine Lucas, Dirk Eddelbuettel with contributions by, Jarek Tuszynski, Henrik Bengtsson, Simon Urbanek, Mario Frasca, Bryan Lewis, Murray Stokely, et al. 2020. Digest: Create Compact Hash Digests of R Objects. https://CRAN.R-project.org/package=digest.

Gu, Zuguang, Roland Eils, and Matthias Schlesner. 2016. “Complex Heatmaps Reveal Patterns and Correlations in Multidimensional Genomic Data.” Bioinformatics.

Gu, Zuguang, Lei Gu, Roland Eils, Matthias Schlesner, and Benedikt Brors. 2014. “Circlize Implements and Enhances Circular Visualization in R.” Bioinformatics 30 (19): 2811–2.

Lewis, Myles J, Michael R Barnes, Kevin Blighe, Katriona Goldmann, Sharmila Rana, and et al. 2019. “Molecular Portraits of Early Rheumatoid Arthritis Identify Clinical and Treatment Response Phenotypes.” https://pubmed.ncbi.nlm.nih.gov/31461658/.

Maechler, Martin, Peter Rousseeuw, Anja Struyf, Mia Hubert, and Kurt Hornik. 2019. Cluster: Cluster Analysis Basics and Extensions.

Morgan, Martin. 2019. BiocManager: Access the Bioconductor Project Package Repository. https://CRAN.R-project.org/package=BiocManager.

Neuwirth, Erich. 2014. RColorBrewer: ColorBrewer Palettes. https://CRAN.R-project.org/package=RColorBrewer.

R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

e-mtab-6141's People

Contributors

kevinblighe avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

e-mtab-6141's Issues

Column cluster distance vs Row cluster distance

Hello Kevin,

Thank you for this tutorial that has been very useful (even 4 years later).

I have a question regarding the cluster distance metric you use, specifically regarding the difference between row and column distance.

You define the following:

clustering_distance_columns = function(x) as.dist(1 - cor(t(x))), clustering_method_columns = 'ward.D2', clustering_distance_rows = function(x) as.dist(1 - cor(t(x))), clustering_method_rows = 'ward.D2',

I understand that, for rows (genes), you use 1 - Pearson correlation of the transposed matrix.

But I see that you use the same formula for the column (sample) clustering.
In the case of columns, shouldn't it be the 1 - Pearson correlation of the matrix itself? e.g:

clustering_distance_columns = function(x) as.dist(1 - cor(x))

I'm new to the field of RNAseq analysis, so forgive me is the question is naive, but I cannot visualise what it means to use the distance of the rows as metrics for the column clustering.

Thank you for your insight, and have a good day.

Processed normalized data of blood samples

Hi Kevin,

Could you please provide the processed file having normalized count or RLD files for blood RNA-seq data. I am interested to look into correlation of few genes across samples.

Thanks
Gyan

logCPM

--Hi,
is your mat.tsv table normalized such as logCPM ?
thank you --

two errors in plot of windows

hi, I am trying to repeat the code you supplied, but found 2 errors in windows R. thanks in advance

1 'Gene\nZ-\nscore' will give errors in windows, not like linux, windows does not support \n, maybe it connected with file name
2 one line code repeat in the following
# row (gene) parameters cluster_rows = TRUE, show_row_dend = TRUE, #row_title = 'Statistically significant genes', row_title_side = 'left', row_title_gp = gpar(fontsize = 12, fontface = 'bold'), row_title_rot = 90, show_row_names = FALSE, **row_names_gp = gpar(fontsize = 10, fontface = 'bold'),** row_names_side = 'left', row_dend_width = unit(25,'mm'),


and I found the gene names font size is so big, I change the row_title_gp = gpar(fontsize = 1, fontface = 'bold'), but does not make sense, how should I revise it?

and after change Gene\nZ-\nscore to Gene_Z--score, it outsided the plot area,
image

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.