scigenex's Introduction

Project Status: Active - The project has reached a stable, usable state and is being actively developed. License: MIT

SciGeneX repository

⏬ Installation

System requirements

The partitioning step is currently performed through a system call to the Markov Cluster (MCL) program, which presently limits the use of DBF-MCL to Unix-like platforms. Importantly, the mcl command must be in your PATH and reachable from within R (see the dedicated section).

Step 1 - Installation of SciGeneX

From R

The scigenex library is currently not available on CRAN or Bioconductor. To install it from GitHub, use:

devtools::install_github("dputhier/scigenex")
library(scigenex)

From the terminal

Download the tar.gz from GitHub or clone the main branch. Uncompress the archive and run the following command from within the uncompressed scigenex folder:

R CMD INSTALL .

Then load the library from within R.

library(scigenex)

Step 2 - Installation of MCL

You may skip this step, as the latest versions of SciGeneX will call scigenex::install_mcl() to install MCL in the ~/.scigenex directory if the program is not found in the PATH.

Installation of MCL using install_mcl()

The install_mcl() function has been developed to ease MCL installation. It should be called automatically from within R when calling the gene_clustering() function. If install_mcl() does not detect MCL in the PATH, it installs it in ~/.scigenex.
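
As a rough illustration of the detection step, the base R sketch below checks whether an executable is reachable from the R session. The helper name find_program and the idea of looking directly inside ~/.scigenex are assumptions made for this example (the layout of that directory is not documented here); this is not the actual code of install_mcl().

```r
# Hypothetical helper (not part of scigenex): locate an executable from R.
find_program <- function(cmd = "mcl", extra_dirs = character(0)) {
  # Sys.which() returns "" when the command is not found in the PATH
  path <- unname(Sys.which(cmd))
  if (!is.na(path) && nzchar(path)) {
    return(path)
  }
  # Otherwise look in user-supplied directories (assumed layout)
  candidates <- file.path(path.expand(extra_dirs), cmd)
  hit <- candidates[file.exists(candidates)]
  if (length(hit) > 0) hit[1] else NA_character_
}

mcl_path <- find_program("mcl", extra_dirs = "~/.scigenex")
if (is.na(mcl_path)) {
  message("mcl not found: install it or add its directory to the PATH.")
} else {
  message("mcl found at: ", mcl_path)
}
```

Running this check in the same R session that will call gene_clustering() matters, because the PATH seen by R (for instance in RStudio) can differ from the one seen in a terminal.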

Installation of MCL from source

You can also install MCL from source using the following commands.

# Download the latest version of mcl 
wget http://micans.org/mcl/src/mcl-latest.tar.gz
# Uncompress and install mcl
tar xvfz mcl-latest.tar.gz
cd mcl-xx-xxx
./configure
make
sudo make install
# You should get mcl in your path
mcl -h

Installation of MCL using conda

Finally, you may install MCL using conda. Importantly, the mcl command must be available in your PATH from within R.

conda install -c bioconda mcl

Example

The scigenex library contains several datasets, including pbmc3k_medium, which is a subset of the pbmc3k 10X dataset.

library(Seurat)
library(scigenex)
set_verbosity(1)

# Load a dataset
load_example_dataset("7871581/files/pbmc3k_medium")

# Select informative genes
res <- select_genes(pbmc3k_medium,
                     distance = "pearson",
                     row_sum=5)
                     
# Cluster informative features
 
## Construct and partition the graph
res <- gene_clustering(res, 
                       inflation = 1.5, 
                       threads = 4)
                        
# Display the heatmap of gene clusters
res <- top_genes(res)
plot_heatmap(res, cell_clusters = Seurat::Idents(pbmc3k_medium))

📖 Documentation

Documentation (in progress) is available at https://dputhier.github.io/scigenex/.

scigenex's People

Contributors

dputhier, juliebvs, sebastiennin

scigenex's Issues

Example dataset in doc

For pedagogical purposes, it would be better if the documentation showed how to download and uncompress the file.

	url = "https://cf.10xgenomics.com/samples/cell/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz"
	download.file(url=url, 
	                  destfile = "pbmc3k_filtered_gene_bc_matrices.tar.gz")
	system("tar xvfz pbmc3k_filtered_gene_bc_matrices.tar.gz")
	pbmc_data <- Read10X(data.dir = "filtered_gene_bc_matrices/hg19/")

Store cell order/cluster

I don't know whether there is any plan to store the cell order/cluster in the ClusterSet object after plot_heatmap. This would be valuable, as the user could then manipulate the cell order/cluster later.
Best
Denis

example in plot_heatmap

I would suggest example code that produces a set of clusters rather than a single one:

    library(scigenex)
    set.seed(123)
    m <- matrix(rnorm(40000), nc=20)
    m[1:100,1:10] <- m[1:100,1:10] + 4
    m[101:200,11:20] <- m[101:200,11:20] + 3
    m[201:300,5:15] <- m[201:300,5:15] + -2
    res <- DBFMCL(data=m,
                  distance_method="pearson",
                  av_dot_prod_min = 0,
                  inflation = 2,
                  k=25,
                  fdr = 10)

    plot_heatmap(object = res)

Improvement for get_genes function

It would be nice to add a "top" parameter to the get_genes function to extract the top genes from a gene signature.

Ex : get_genes(obj, cluster = 1, top = TRUE)

silent argument in DBFMCL

The help file indicates:

#' @param silent if set to TRUE (default), the progression of distance matrix
#' calculation is not displayed.

My feeling is that it no longer controls this. Maybe it controls the MCL messages?

storing av_dot_prod

Hi,
I think it would be interesting to store the value of av_dot_prod in the DBFMCL object so that one could re-run DBFMCL with an increased value of av_dot_prod_min to delete some contaminating clusters.
Best

Default inflation value

I have previously observed that pushing inflation too high results in losing informative genes supported by few cells. Unless there are strong arguments against it, I would definitely suggest a default inflation value of 2.
Best

If no results of functional enrichment analysis for a specific cluster then visualization of results will stop at this cluster

Add a warning message to inform the user that there is no result for cluster XX, and continue with the visualization of the next clusters.

> res_gene <- enrich_analysis(object = res_gene, specie = "mmusculus")
[1] "Enrichment analysis for cluster 1"
[1] "Enrichment analysis for cluster 2"
[1] "Enrichment analysis for cluster 3"
[1] "Enrichment analysis for cluster 4"
[1] "Enrichment analysis for cluster 5"
[1] "Enrichment analysis for cluster 6"
[1] "Enrichment analysis for cluster 7"
[1] "Enrichment analysis for cluster 8"
[1] "Enrichment analysis for cluster 9"
[1] "Enrichment analysis for cluster 10"
[1] "Enrichment analysis for cluster 11"
[1] "Enrichment analysis for cluster 12"
[1] "Enrichment analysis for cluster 13"
[1] "Enrichment analysis for cluster 14"
[1] "Enrichment analysis for cluster 15"
[1] "Enrichment analysis for cluster 16"
[1] "Enrichment analysis for cluster 17"
[1] "Enrichment analysis for cluster 18"
No results to show
Please make sure that the organism is correct or set significant = FALSE
[1] "Enrichment analysis for cluster 19"
[1] "Enrichment analysis for cluster 20"
[1] "Enrichment analysis for cluster 21"
No results to show
Please make sure that the organism is correct or set significant = FALSE
[1] "Enrichment analysis for cluster 22"
> res_gene <- enrich_viz(object = res_gene)
|--  Plot enrichment analysis results for cluster 1 
|--  Plot enrichment analysis results for cluster 2 
|--  Plot enrichment analysis results for cluster 3 
|--  Plot enrichment analysis results for cluster 4 
|--  Plot enrichment analysis results for cluster 5 
|--  Plot enrichment analysis results for cluster 6 
|--  Plot enrichment analysis results for cluster 7 
|--  Plot enrichment analysis results for cluster 8 
|--  Plot enrichment analysis results for cluster 9 
|--  Plot enrichment analysis results for cluster 10 
|--  Plot enrichment analysis results for cluster 11 
|--  Plot enrichment analysis results for cluster 12 
|--  Plot enrichment analysis results for cluster 13 
|--  Plot enrichment analysis results for cluster 14 
|--  Plot enrichment analysis results for cluster 15 
|--  Plot enrichment analysis results for cluster 16 
|--  Plot enrichment analysis results for cluster 17 
|--  Plot enrichment analysis results for cluster 18 
Error in gostplot(object@cluster_annotations[[cur_cluster]], interactive = TRUE) : 
  The following columns are missing from the result: source_order, term_size, term_name, term_id, source, significant
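
The behaviour requested above could be sketched as follows in base R. Everything here is hypothetical (the helper plot_all_clusters, the toy cluster_annotations list and the plot_fun argument are stand-ins, not the scigenex or gprofiler2 API); the point is only the warn-and-skip pattern.

```r
# Toy stand-in for per-cluster enrichment results:
# NULL marks a cluster for which nothing was returned.
cluster_annotations <- list(
  "17" = data.frame(term_name = "term A", p_value = 0.01),
  "18" = NULL,
  "19" = data.frame(term_name = "term B", p_value = 0.03)
)

# Hypothetical helper: plot every cluster that has results, warn for the others.
plot_all_clusters <- function(annotations, plot_fun) {
  plotted <- character(0)
  for (cl in names(annotations)) {
    res <- annotations[[cl]]
    if (is.null(res) || NROW(res) == 0) {
      warning("No enrichment result for cluster ", cl, ": skipping.",
              call. = FALSE)
      next  # move on to the next cluster instead of stopping
    }
    plot_fun(res)
    plotted <- c(plotted, cl)
  }
  invisible(plotted)
}

# A do-nothing plotting function keeps the example self-contained.
plotted <- plot_all_clusters(cluster_annotations,
                             plot_fun = function(res) invisible(NULL))
print(plotted)
```

With this pattern, cluster 18 triggers a warning and clusters 17 and 19 are still processed.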

Unstable results

There are some discrepancies in tests...

	test_that("Cheking DBFMCL is providing the right number of genes", {
	  set.seed(123)
	  m <- matrix(rnorm(80000), nc=20)
	  m[1:100,1:10] <- m[1:100,1:10] + 4
	  m[101:200,11:20] <- m[101:200,11:20] + 3
	  m[201:300,5:15] <- m[201:300,5:15] + -2
	  res <- DBFMCL(data=m,
	                distance_method="pearson",
	                av_dot_prod_min = 0,
	                inflation = 1.2,
	                k=25,
	                fdr = 10)
	  #plot_clust(res, ceil = 10, floor = -10)
	  expect_equal(length(res@size), 3)
	  expect_equal(res@size, c(109, 107, 81))
	})


	  ══ Failed tests ════════════════════════════════════════════════════════════════
	  ── Failure (test.dbfmcl.R:15:3): Cheking DBFMCL is providing the right number of genes ──
	  res@size not equal to c(109, 107, 81).
	  2/3 mismatches (average diff: 6)
	  [1] 115 - 109 == 6
	  [2] 113 - 107 == 6

plot_heatmap and blank lines

In the plot_heatmap() function, the row names corresponding to the blank lines that separate the clusters are displayed. These row names should be discarded, or the corresponding lines removed if blank row names are not supported by the underlying function.

README example

The example in the README section is not very attractive...
I would propose

library(scigenex)
set.seed(123)
m <- matrix(rnorm(40000), nc=20)
m[1:100,1:10] <- m[1:100,1:10] + 4
m[101:200,11:20] <- m[101:200,11:20] + 3
m[201:300,5:15] <- m[201:300,5:15] + -2
res <- DBFMCL(data=m,
              distance_method="pearson",
              av_dot_prod_min = 0,
              inflation = 2,
              k=25,
              fdr = 10)
plot_clust(res, ceil = 10, floor = -10)
write_clust(res, "ALL.sign.txt")

plot_heatmap: Error in matrix(ncol = ncol(m)) : data is too long

This is most probably due to the fact that hierarchical clustering is no longer part of the function.

	> dbf_seurat <- top_genes(dbf, top = 10)
	|--  Results are stored in top_genes slot of ClusterSet object. 
	> plot_heatmap(dbf_seurat, use_top_genes = T)
	|--  Centering matrix. 
	|--  Ordering cells based on hierarchical clustering. 
	|--  Ceiling matrix. 
	|--  Flooring matrix. 
	Error in matrix(ncol = ncol(m)) : data is too long

It fails here when trying to access @cell_clusters$hclust_res$order which is NULL.

Best

    if (cell_ordering_method == "hclust") {
      print_msg("Ordering cells based on hierarchical clustering.", 
        msg_type = "DEBUG")
      m <- m[, object@cell_clusters$hclust_res$order]
      if (length(object@cell_clusters$labels) != 0) {
        object@cell_clusters$labels <- object@cell_clusters$labels[colnames(m)]
        object@cell_clusters$cores <- object@cell_clusters$cores[colnames(m)]
      }
    }
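
The small base R example below shows why a NULL ordering is harmful: indexing columns with NULL silently returns a matrix with zero columns, which would be consistent with the later matrix(ncol = ncol(m)) failure. The safe_order helper is a hypothetical guard, not the fix adopted in the package.

```r
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("c1", "c2", "c3")))
hclust_order <- NULL  # what the slot holds when no clustering was run

# Indexing with NULL drops every column without any error
dim(m[, hclust_order, drop = FALSE])

# Hypothetical guard: fall back to the current column order
safe_order <- function(ord, n) {
  if (is.null(ord)) seq_len(n) else ord
}
m_ordered <- m[, safe_order(hclust_order, ncol(m)), drop = FALSE]
dim(m_ordered)
```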

0 cluster found with spearman as distance

Using the pbmc dataset from Seurat with the Spearman correlation coefficient as distance, a single cluster containing 799 genes was found and then filtered out.

|-- Done
|-- creating file : /mnt/NAS7/Workspace/bavaisj/ciml-splab/BecomingLTi/PBMC_DBFMCL/03_Script/02_GeneSelectionWithDBFMCL//mnt/NAS7/SPlab/BIOINFO_PROJECT/BecomingLTi/PBMC_DBFMCL/05_Output/02_GeneSelectionWithDBFMCL/TestR/BecomingLTi_PBMC_DBFMCL_intermediate_result.mcl_out.txt
|-- Reading MCL output: /mnt/NAS7/SPlab/BIOINFO_PROJECT/BecomingLTi/PBMC_DBFMCL/05_Output/02_GeneSelectionWithDBFMCL/TestR/BecomingLTi_PBMC_DBFMCL_intermediate_result.mcl_out.txt
|-- 0 clusters conserved after MCL partitioning.
|-- 1 clusters filtered out from MCL partitioning (size and mean dot product).

Parameters used for this run :
The following parameters will be used :
Working directory: /mnt/NAS7/Workspace/bavaisj/ciml-splab/BecomingLTi/PBMC_DBFMCL/03_Script/02_GeneSelectionWithDBFMCL
Name: /mnt/NAS7/SPlab/BIOINFO_PROJECT/BecomingLTi/PBMC_DBFMCL/05_Output/02_GeneSelectionWithDBFMCL/TestR/BecomingLTi_PBMC_DBFMCL_intermediate_result
Distance method: spearman
Minimum average dot product for clusters: 0.2
Minimum cluster size: 10
Number of neighbors: 75
Number of randomizations: 3
FDR: 1 %
Inflation: 5
Visualize standard outputs from both mcl and cluster commands: FALSE
Memory used : 1024

We also changed the parameters (average dot product, neighbors, inflation, and fdr) but nothing changed.
However, we obtain many clusters using Pearson correlation as distance.

We checked the formula of the Spearman correlation coefficient in the C code and saw that it is (dist²*6)/(n_sample*((n_sample*n_sample)-1)).
Shouldn't it be 1 - (dist²*6)/(n_sample*((n_sample*n_sample)-1))?
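
For reference, the two expressions can be compared in base R on toy data. This sketch assumes that "dist²" denotes the sum of squared rank differences and that there are no ties; it does not establish whether the C code is meant to return a coefficient or a distance.

```r
set.seed(1)
x <- rnorm(10)
y <- rnorm(10)  # continuous values, so no ties are expected
n <- length(x)

# Sum of squared rank differences
d2 <- sum((rank(x) - rank(y))^2)

term <- 6 * d2 / (n * (n^2 - 1))  # expression reported from the C code
rho <- 1 - term                   # textbook Spearman coefficient (no ties)

# Agrees with the built-in implementation
all.equal(rho, cor(x, y, method = "spearman"))
```

Note that the term without "1 -" equals 1 - rho, which ranges from 0 to 2 and behaves like a correlation-based distance, so which form is appropriate depends on how the value is used downstream.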

Failure (test.dbfmcl.R:12:3): Cheking DBFMCL is providing the right number of clusters

When running make test on OSX, the number of selected genes seems to differ from UNIX. @JulieBvs, can you check that the test is working on Linux?

   make test

    Testing dbfmcl
    ✔ |  OK F W S | Context
    ⠏ |   0       | test.dbfmcl                                                     The following parameters will be used :
        Working directory:  /Users/puthier/Documents/git/project_dev/dbfmcl/tests/testthat
        Name:  exprs
        Distance method:  pearson
        Number of neighbors:  25
        Number of randomizations:  3
        FDR:  10 %
        Perform clustering:  FALSE
        Visualize standard outputs from both mcl and cluster commands:  FALSE
        Memory used :  1024

    Randomization: 7994001 (1/1.000    ratio)
    Seed = 123
    Pre-computation for distances
    Computing distances: 100.00%
    Randomization: 7994001 obtained, 7994001 asked
    Computing FDR: 100.00%
    Computing cut-off
    number of conserved genes = 309
    Building graph
    Genes   core = 309   extra = 0
    DBF done
    ✖ |   1 1     | test.dbfmcl [1.2 s]
    ────────────────────────────────────────────────────────────────────────────────
    Failure (test.dbfmcl.R:12:3): Cheking DBFMCL is providing the right number of clusters
    res@size not equal to 298.
    1/1 mismatches
    [1] 309 - 298 == 11

MCL package not available since last commit

Hi @dputhier ,
I did a pull this morning before making a new branch, and now I get this error when running reinstall.

reinstall()
ℹ Loading dbfmcl
Error: Dependency package(s) 'MCL' not available.
Run rlang::last_error() to see where the error occurred.
In addition: Warning message:
In (function (dep_name, dep_ver = "*") :
Dependency package 'MCL' not available.

Empty critical_distance slot

When running:

m <- matrix(rnorm(6000), nc=20)
m[1:100,1:10] <- m[1:100,1:10] + 4
m[101:200,11:20] <- m[101:200,11:20] + 3
m[201:300,5:15] <- m[201:300,5:15] + -2

res <- DBFMCL(data=m,
distance_method="pearson",
av_dot_prod_min = 0,
inflation = 1.2,
k=25,
fdr = 90)

res@critical_distance
[1] NA

enrich_go with Mmusculus

Using enrich_go with "Mmusculus" as the specie parameter raises an error.
hs needs to be replaced in:

query_entrezid <- AnnotationDbi::select(hs, 
                                           keys = query,
                                           columns = c("ENTREZID", "SYMBOL"),
                                           keytype = "SYMBOL")
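
One possible way to avoid a fixed hs object is to choose the annotation package from the species label, as in the sketch below. The helper annotation_pkg and the labels "Hsapiens" and "Mmusculus" are assumptions for illustration; org.Hs.eg.db and org.Mm.eg.db are the usual Bioconductor annotation packages for human and mouse.

```r
# Hypothetical helper: map a species label to an annotation package name.
annotation_pkg <- function(species) {
  switch(species,
         Hsapiens  = "org.Hs.eg.db",
         Mmusculus = "org.Mm.eg.db",
         stop("Unsupported species: ", species, call. = FALSE))
}

annotation_pkg("Mmusculus")
```

If the corresponding package is installed, the annotation object (which carries the same name as its package) could then be passed to AnnotationDbi::select() in place of hs.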

Example not working

In the ClusterSet examples, the default example does not work because the MCL R library is called by default. Adding "mcl_cmd_line = T" typically fixes the issue. In fact, mcl_cmd_line should be the default. We should even drop the dependency on the MCL R library, as that project is no longer maintained and it is a poor implementation of the original MCL algorithm.

  ?"ClusterSet-class"

 res <- DBFMCL(data=m, distance_method="pearson", av_dot_prod_min = 0, inflation = 1.2, k=25, fdr = 10, mcl_cmd_line = T)

top_genes and clusters in plot_heatmap

Got this error when using plot_heatmap

	> plot_heatmap(dbf, use_top_genes=TRUE, cluster = 1)
	|--  Centering matrix. 
	|--  Ordering cells based on hierarchical clustering. 
	|--  Ceiling matrix. 
	|--  Flooring matrix. 
	Error in m[genes_top, ] : subscript out of bounds
	> dbf@top_genes
	           gene.top.1      gene.top.2      gene.top.3      gene.top.4      gene.top.5     
	cluster.1  "H2-K1"         "Ccl21a"        "Fcgbp"         "Ifi27l2a"      "H2-D1"        
	cluster.2  "Gapdh"         "Stmn1"         "Ubb"           "Hmgn2"         "Pclaf"        
	cluster.3  "Gm48228"       "Mgat4c"        "Gucy2g"        "Abca17"        "5033417F24Rik"
	cluster.4  "1700024I08Rik" "Gm17200"       "Crocc2"        "Gm45418"       "Gm19689"      
	cluster.5  "Gm36356"       "1700029N11Rik" "Ttll6"         "Gmnc"          "Tsix"         
	cluster.6  "mt-Co1"        "mt-Atp6"       "Rps14"         "Rps28"         "Rpl37a"       
	cluster.7  "Gm553"         "Gm5784"        "5430431A17Rik" "Gabra2"        "Trav10n"      
	cluster.8  "Rpl30"         "Rpl22"         "Atp5mpl"       "Rpl23"         "Rpl34"        
	cluster.9  "Rmnd5a"        "Endou"         "Ccnd3"         "Satb1"         "Ets2"         
	cluster.10 "Rplp1"         "Rpsa"          "Actb"          "Rplp0"         "Laptm5"       
	cluster.11 "Rps21"         "Fau"           "Rps26"         "Chchd2"        "Rpl8"         
	cluster.12 "Tmod2"         "Cngb1"         "Gm43848"       "Gm29562"       "Tigd5"        
	cluster.13 "Rps17"         "Map1lc3b"      "Psma3"         "Gm10076"       "Uqcr10"       
	cluster.14 "G530011O06Rik" "1700041G16Rik" "Mkrn3"         "Cmklr1"        "Zfp456"       
	cluster.15 "mt-Co3"        "mt-Co2"        "Rps16"         "Rps20"         "Tpt1"         
	cluster.16 "Rps19"         "Rpl19"         "Rps6"          "Rpl12"         "Rps5"         
	cluster.17 "Gucy2e"        "Spink12"       "Serinc2"       "Gm16196"       "Gm13391"      
	cluster.18 "B2m"           "Pdia3"         "Tmsb4x"        "Npc2"          "Atp1b3"       
	cluster.19 "Boll"          "Fignl2"        "F8"            "Sntb1"         "Hoxa5"        
	cluster.20 "Sis"           "D630023F18Rik" "Mpo"           "Padi4"         "Arl4d"        
	cluster.21 "B930018H19Rik" "Loxl4"         "Gm38037"       "A430110L20Rik" "Pcdh1"        
	           gene.top.6 gene.top.7      gene.top.8 gene.top.9 gene.top.10    
	cluster.1  "H2-Ab1"   "H2-Eb1"        "H2-Aa"    "Mgp"      "Igfbp4"       
	cluster.2  "Snrpg"    "Slc25a5"       "Calm2"    "Rpl15"    "Cox7a2"       
	cluster.3  "Myo3b"    "Egf"           "Upk1a"    "Mamstr"   "Cldn3"        
	cluster.4  "Sypl2"    "9630002D21Rik" "Meig1"    "Gm16006"  "Dnah10"       
	cluster.5  "Lrfn2"    "Chst5"         "Adora2b"  "Adm2"     "Gjb6"         
	cluster.6  "Rps29"    "mt-Nd4"        "Rpl10a"   "Rps27"    "Rpl3"         
	cluster.7  "Dgkk"     "4632428C04Rik" "Cxxc4"    "Tmem136"  "Wee2"         
	cluster.8  "Rpl37"    "Rps7"          "Rps4x"    "Rps15a"   "Atp5j2"       
	cluster.9  "Themis"   "Myb"           "Aqp11"    "Rag2"     "Arl5c"        
	cluster.10 "Coro1a"   "Rps10"         "Rps9"     "Rplp2"    "Eef2"         
	cluster.11 "Arhgdia"  "Selplg"        "Mbnl1"    "Marcksl1" "Atp5g3"       
	cluster.12 "Wtip"     "Dennd5b"       "Caskin2"  "Zfp11"    "Vpreb1"       
	cluster.13 "Cct5"     "Uba1"          "Mapk1"    "Tpi1"     "Esd"          
	cluster.14 "Poll"     "Gpr137"        "Ogfod3"   "Ndufaf6"  "L2hgdh"       
	cluster.15 "mt-Nd1"   "Rpl27a"        "Rps15"    "mt-Nd3"   "mt-Nd2"       
	cluster.16 "Rps2"     "Rpl18a"        "Hsp90ab1" "Hspe1"    "Psmb8"        
	cluster.17 "Fam189a2" "Gm4890"        "Ticam2"   "Grk5"     "Pmaip1"       
	cluster.18 "Nfkbia"   "Sdf4"          "Tagln2"   "Rgs10"    "Lbh"          
	cluster.19 "Lrrc25"   "Plek2"         "Rom1"     "Anxa8"    "A730063M14Rik"
	cluster.20 "Taf4b"    "Lrg1"          "Fam110b"  "Sept10"   "Fitm2"        
	cluster.21 "Rassf6"   "Gpr160"        "Gm5134"   "Gm13546"  "Zdhhc23" 

Error in function DBFMCL, "system mcl"

Hi,
I got an error when I tried to run the example on the main GitHub page of DBFMCL.

I ran these lines:

> m <- matrix(rnorm(80000), nc=20)
> m[1:100,1:10] <- m[1:100,1:10] + 4
> m[101:200,11:20] <- m[101:200,11:20] + 3
> m[201:300,5:15] <- m[201:300,5:15] + -2

I first got this error:

> res <- DBFMCL(data=m,
+               distance_method="pearson",
+               clustering=TRUE,
+               k=25)
Error in DBFMCL(data = m, distance_method = "pearson", clustering = TRUE,  : 
  unused argument (clustering = TRUE)

Then, when I removed the "clustering" parameter:

> res <- DBFMCL(data=m,
+               distance_method="pearson",
+               k=25)
The following parameters will be used : 
	Working directory:  /home/rstudio 
	Name:  1HCr50pP3f 
	Distance method:  pearson 
	Minimum average dot product for clusters:  2 
	Minimum cluster size:  10 
	Number of neighbors:  25 
	Number of randomizations:  3 
	FDR:  10 % 
	Inflation: 8 
	Visualize standard outputs from both mcl and cluster commands:  FALSE 
	Memory used :  1024 

Randomization: 7994001 (1/1.000    ratio)
Seed = 123
Pre-computation for distances
Computing distances: 100.00%
Randomization: 7994001 obtained, 7994001 asked
Computing FDR: 100.00%
Computing cut-off
number of conserved genes = 310
Building graph
Genes   core = 310   extra = 0
DBF done
sh: 1: mcl: not found
Error in if (system("mcl --version | grep 'Stijn van Dongen'", intern = TRUE) >  : 
  argument is of length zero
In addition: Warning message:
In system("mcl --version | grep 'Stijn van Dongen'", intern = TRUE) :
  running command 'mcl --version | grep 'Stijn van Dongen'' had status 1

Mean Dot product

This step is quite long. I think you could simply take a random subset (e.g. 20%) of the clustered genes to compute the mean dot product (while setting a minimum number of genes).
Best
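
A minimal base R sketch of this idea is given below. It assumes, without confirmation from the package code, that the statistic is the average of pairwise dot products between the gene rows of a cluster; the function name approx_mean_dot_prod and its arguments frac and min_genes are hypothetical.

```r
# Hypothetical helper: estimate the mean pairwise dot product of a cluster
# from a random subset of its genes (rows of m).
approx_mean_dot_prod <- function(m, frac = 0.2, min_genes = 10) {
  n <- nrow(m)
  stopifnot(n >= 2)
  # Number of sampled genes: a fraction of the cluster, at least min_genes,
  # never fewer than 2 and never more than the cluster size
  k <- min(n, max(2, min_genes, ceiling(frac * n)))
  idx <- sample.int(n, k)
  dp <- tcrossprod(m[idx, , drop = FALSE])  # k x k matrix of row dot products
  mean(dp[upper.tri(dp)])                   # average over distinct gene pairs
}

set.seed(123)
cluster <- matrix(rnorm(200 * 20, mean = 1), ncol = 20)
approx_mean_dot_prod(cluster)
```

The cost drops from all pairs among n genes to all pairs among k genes, at the price of some sampling noise in the estimate.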

top_genes return a matrix with bad rownames

When using the top_genes function, the row names of object@top_genes always start at 1 (cluster.1), even when another cluster is requested. It would be good to make them consistent with the cluster parameter used as input.

set.seed(123)
m <- matrix(rnorm(40000), nc=20)
m[1:100,1:10] <- m[1:100,1:10] + 4
m[101:200,11:20] <- m[101:200,11:20] + 3
m[201:300,5:15] <- m[201:300,5:15] + -2
res <- DBFMCL(data=m,
            distance_method="pearson",
            av_dot_prod_min = 0,
            inflation = 2,
            k=25,
            fdr = 10)

res <- top_genes(res, cluster = 2, top = 20)

res@top_genes

          gene.top.1 gene.top.2 gene.top.3 gene.top.4 gene.top.5 gene.top.6 gene.top.7 gene.top.8 gene.top.9 gene.top.10 gene.top.11 gene.top.12 gene.top.13 gene.top.14 gene.top.15
cluster.1 "gene166"  "gene186"  "gene192"  "gene183"  "gene117"  "gene155"  "gene180"  "gene114"  "gene122"  "gene163"   "gene121"   "gene150"   "gene168"   "gene181"   "gene200"  
          gene.top.16 gene.top.17 gene.top.18 gene.top.19 gene.top.20
cluster.1 "gene171"   "gene104"   "gene113"   "gene165"   "gene189"  

av_dot_prod_min

When selecting clusters based on the dot product, taking the average may be highly sensitive to outliers, resulting in numerous spurious clusters. We should compute the median instead:

  if (mean(cur_dot_prod) > av_dot_prod_min & length(h) >

 ===> 

if (median(cur_dot_prod) > av_dot_prod_min & length(h) >

This is exemplified here, with numerous signatures selected although they should be discarded.

image
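
The sensitivity to outliers can be illustrated with toy numbers in base R. The values below are made up for the example and are not taken from any dataset; only the variable names follow the snippet above.

```r
# Nine small dot products and a single extreme value
cur_dot_prod <- c(rep(0.1, 9), 50)
av_dot_prod_min <- 2

mean(cur_dot_prod)    # pulled up by the single outlier
median(cur_dot_prod)  # stays at the typical value

keep_with_mean <- mean(cur_dot_prod) > av_dot_prod_min
keep_with_median <- median(cur_dot_prod) > av_dot_prod_min
c(mean = keep_with_mean, median = keep_with_median)
```

With the mean, this toy cluster passes the threshold because of one value; with the median, it is discarded.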

DBFMCL filtering is unstable.

The same dataset may provide different results over time with the same parameters.
This is highly problematic for reproducibility, for writing tests, and for creating documentation.
It would be great if @fafa13 could help us fix it!

    library(devtools)
    devtools::install_github("dputhier/scigenex")
    library(scigenex)
    m <- matrix(rnorm(80000), nc=20)
    m[1:100,1:10] <- m[1:100,1:10] + 4
    m[101:200,11:20] <- m[101:200,11:20] + 3
    m[201:300,5:15] <- m[201:300,5:15] + -2
    res <- DBFMCL(data=m,
                  distance_method="pearson",
                  av_dot_prod_min = 0,
                  inflation = 1.2,
                  k=25,
                  fdr = 10)
    nrow(res)

add col dendrogram

Hi Julie,
It seems there is a function in iheatmapr to add a dendrogram to plot_heatmap. I think it would be a valuable option.
Best
Denis

Improving the speed of filtering step

In the current implementation, the dot product is computed whatever the size of the cluster (min_cluster_size). Clusters should first be tested against min_cluster_size, and only the remaining ones against av_dot_prod_min. This should speed up processing.
Best
Denis
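
The proposed ordering of the two filters can be sketched in base R as follows. The toy clusters, the mean_dot_prod helper and the way the statistic is computed are assumptions for illustration only; min_cluster_size and av_dot_prod_min reuse the parameter names from the issue.

```r
# Toy clusters: each matrix holds the genes (rows) of one cluster
clusters <- list(
  a = matrix(1, nrow = 3,  ncol = 4),  # too small
  b = matrix(1, nrow = 12, ncol = 4),  # large, high dot product
  c = matrix(0, nrow = 15, ncol = 4)   # large, null dot product
)
min_cluster_size <- 10
av_dot_prod_min <- 2

# Assumed statistic: mean dot product over distinct pairs of genes
mean_dot_prod <- function(m) {
  dp <- tcrossprod(m)
  mean(dp[upper.tri(dp)])
}

# 1. Cheap test first: discard clusters that are too small
sizes <- vapply(clusters, nrow, integer(1))
big_enough <- clusters[sizes >= min_cluster_size]

# 2. Expensive test only on the remaining clusters
scores <- vapply(big_enough, mean_dot_prod, numeric(1))
kept <- names(big_enough)[scores > av_dot_prod_min]
kept
```

Here the dot products are never computed for cluster a, which is where the time saving would come from when many small clusters are produced.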

"Reading MCL output: "

I think a message indicating "Reading and filtering MCL output: " would be more appropriate.
Best
