gseadv issues: discussion from llrs

Provide functions for entropy and mutual information

In the paper "On Entropy and Information in Gene Interaction Networks", the authors explore perturbations of a network. It has an associated package at ucsd-ccbb/gsia that could be interesting to use, to provide coverage for, or to use for other analyses.
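A minimal sketch of what such functions could look like, assuming the inputs are discrete vectors of equal length (for example, logical vectors marking which genes of a common universe belong to two gene sets). The names `entropy` and `mutual_information` are hypothetical; they are not taken from gsia or from GSEAdv.

```r
# Hypothetical helpers, not part of GSEAdv or gsia.
# Shannon entropy of a discrete vector, in bits by default.
entropy <- function(x, base = 2) {
  p <- as.numeric(table(x)) / length(x)
  p <- p[p > 0]                      # drop unused factor levels (0 * log 0 := 0)
  -sum(p * log(p, base = base))
}

# Mutual information between two discrete vectors of the same length.
mutual_information <- function(x, y, base = 2) {
  stopifnot(length(x) == length(y))
  joint <- table(x, y) / length(x)                     # P(x, y)
  expected <- outer(rowSums(joint), colSums(joint))    # P(x) * P(y)
  nz <- joint > 0                                      # skip empty cells
  sum(joint[nz] * log(joint[nz] / expected[nz], base = base))
}
```

For two gene sets A and B over the same universe, `mutual_information(universe %in% A, universe %in% B)` is 0 when membership in one set says nothing about membership in the other.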

Compare same terms in different GeneSetCollections

A user requested a method to compare identically named GeneSets across different GeneSetCollections (GSC) by counting the number of genes in each.

It could follow the lines of compare(GSC1, GSC2), looking for the same pathway/GeneSet names and comparing the number of genes in each case. It could return the total number of shared names and the differences between them in a data.frame.

Would it be extensible to more than two GSCs? Yes, if the column names of the data.frame state which GSCs are being compared, and NA is returned when a term is not shared between two GSCs.
For example, with the columns: Terms, GSC1_GSC2, GSC1_GSC3, GSC2_GSC3
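A minimal sketch of the multi-collection version, assuming each GSC has already been converted to a named list of gene identifiers (with GSEABase, `geneIds()` on a GeneSetCollection should give such a list). `compare_gsc` is a hypothetical name; each pairwise column holds the difference in the number of genes, or NA when the term is missing from one of the two collections.

```r
# Hypothetical helper: compare gene set sizes of identically named terms
# across two or more collections given as named lists of gene IDs.
compare_gsc <- function(...) {
  gscs <- list(...)
  if (length(gscs) < 2 || is.null(names(gscs)) || any(names(gscs) == "")) {
    stop("Provide at least two named collections, e.g. compare_gsc(GSC1 = a, GSC2 = b)")
  }
  sizes <- lapply(gscs, lengths)                 # genes per term, per collection
  terms <- sort(unique(unlist(lapply(sizes, names))))
  out <- data.frame(Terms = terms, stringsAsFactors = FALSE)
  for (pair in combn(names(gscs), 2, simplify = FALSE)) {
    a <- sizes[[pair[1]]][terms]                 # NA where the term is absent
    b <- sizes[[pair[2]]][terms]
    out[[paste(pair, collapse = "_")]] <- unname(a - b)
  }
  out
}
```

The number of terms shared by two collections is then `sum(!is.na(res$GSC1_GSC2))`.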

Statistical background

This distribution seems interesting

Check if all genes are connected

See the corresponding function in https://github.com/llrs/integration-helper (although it would require some tweaking to be efficient at the scale of gene sets).
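The integration-helper function is not reproduced here. As a standalone sketch, assuming a GSC given as a named list of gene IDs and that two genes are connected when they share a gene set (directly or through a chain of gene sets), connectivity can be checked by growing the set of reached genes until no remaining gene set overlaps with it. `all_genes_connected` is a hypothetical name.

```r
# Hypothetical helper: TRUE if every gene can be reached from every other
# through a chain of gene sets that share at least one gene.
all_genes_connected <- function(gsc) {
  gsc <- gsc[lengths(gsc) > 0]            # empty gene sets connect nothing
  genes <- unique(unlist(gsc))
  if (length(genes) <= 1) return(TRUE)
  reached <- gsc[[1]]
  pending <- gsc[-1]
  repeat {
    # Gene sets sharing at least one gene with the reached component.
    hit <- vapply(pending, function(s) any(s %in% reached), logical(1))
    if (!any(hit)) break
    reached <- union(reached, unlist(pending[hit]))
    pending <- pending[!hit]
  }
  all(genes %in% reached)
}
```

Each pass is linear in the total size of the pending gene sets and there are at most as many passes as gene sets, so a union-find structure would be the next step if this becomes a bottleneck on large collections.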

Appveyor

Add installation from Bioconductor (from); perhaps install Bioconductor first and then the other packages.

Graph from GeneSetCollections

I should suggest, or explain, how to use the adjacency matrix to create networks with igraph's graph.adjacency (idea from Biostars).
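A minimal sketch, assuming a GSC given as a named list of gene IDs: build the gene-by-gene-set incidence matrix and multiply it by its transpose, so that each cell counts how many gene sets two genes share. `gene_adjacency` is a hypothetical name.

```r
# Hypothetical helper: gene-by-gene adjacency from shared gene sets.
gene_adjacency <- function(gsc) {
  genes <- sort(unique(unlist(gsc)))
  # Incidence matrix: rows are genes, columns are gene sets, 1 = member.
  inc <- vapply(gsc, function(s) as.numeric(genes %in% s), numeric(length(genes)))
  inc <- matrix(inc, nrow = length(genes), dimnames = list(genes, names(gsc)))
  adj <- tcrossprod(inc)                 # inc %*% t(inc): shared gene sets
  dimnames(adj) <- list(genes, genes)
  diag(adj) <- 0                         # no self-loops
  adj
}
```

With igraph installed, the result can be turned into a weighted, undirected network with `igraph::graph_from_adjacency_matrix(adj, mode = "undirected", weighted = TRUE)`, which is the current name of `graph.adjacency`.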

A Reactome-only study

Antithesis of add

Add the possibility to drop a relationship without completely removing the genes or GeneSets involved. This is the opposite of add, which adds a relationship and, if the Gene or GeneSet is not present, adds it too.
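A minimal sketch, assuming the GSC is stored as a logical gene-by-gene-set incidence matrix (an assumption for illustration; a GeneSetCollection stores it differently). Because only one cell is switched off, the gene and the gene set keep their row and column even if they end up with no relationships. `drop_relation` is a hypothetical name.

```r
# Hypothetical helper: remove the membership of `gene` in `gene_set`
# while keeping both the gene (row) and the gene set (column).
drop_relation <- function(inc, gene, gene_set) {
  if (!gene %in% rownames(inc)) stop("Unknown gene: ", gene)
  if (!gene_set %in% colnames(inc)) stop("Unknown gene set: ", gene_set)
  inc[gene, gene_set] <- FALSE
  inc
}
```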

Use the Kullback-Leibler distance

Idea taken from here (already saved the reference)
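A minimal sketch, assuming the two inputs are counts or probabilities over the same categories (for example, how many gene sets have each size in two collections). Kullback-Leibler is strictly a divergence: it is not symmetric, so a symmetrised version (for example, averaging both directions) may be needed if a true distance is wanted. `kl_divergence` is a hypothetical name.

```r
# Hypothetical helper: KL divergence D(p || q) in nats.
kl_divergence <- function(p, q) {
  stopifnot(length(p) == length(q), all(p >= 0), all(q >= 0))
  p <- p / sum(p)                        # normalise counts to probabilities
  q <- q / sum(q)
  keep <- p > 0                          # terms with p = 0 contribute 0
  sum(p[keep] * log(p[keep] / q[keep]))  # Inf if q = 0 where p > 0
}
```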

See if this

https://twitter.com/AchimZeileis/status/1064802740663132160

x <- diag(5) # Or any other matrix
table(x, row(x)) # For each row, how many times each value appears

This could come in handy for counting the size of pathways, the distribution of genes, or the centrality of the elements.

It would need a bit more care, such as adding column names to the resulting tables or something similar.
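One way to add that care, assuming a numeric 0/1 gene-by-gene-set incidence matrix with row and column names: index the names with `row()` and `col()` and name the `table()` arguments, so the resulting tables carry labelled dimensions. The `"1"` row then gives the number of gene sets per gene and the size of each pathway. `membership_counts` is a hypothetical name.

```r
# Hypothetical helper built on the table(x, row(x)) trick.
membership_counts <- function(inc) {
  list(
    # For each gene (row): how many 0s and 1s, i.e. gene sets per gene.
    per_gene = table(value = inc, gene = rownames(inc)[row(inc)]),
    # For each gene set (column): how many 0s and 1s, i.e. pathway size.
    per_gene_set = table(value = inc, gene_set = colnames(inc)[col(inc)])
  )
}

inc <- matrix(c(1, 0, 1,
                1, 1, 0), nrow = 2, byrow = TRUE,
              dimnames = list(c("g1", "g2"), c("A", "B", "C")))
counts <- membership_counts(inc)
counts$per_gene_set["1", ]   # size of each gene set
```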

Check out Pathway Commons

See this, to use in an article, as examples, or for tests.

Local adjacency

Interesting paper on local adjacency https://www.biorxiv.org/content/early/2018/11/04/460964.full.pdf

Improve README

Add the statistics of the original object alongside the simulations in Figure 3 of the README.
Also run the simulation with the number of genes.

Calculate the number of combinations that exist

While the from* functions create new GSC objects, there should be a function that calculates how many combinations exist that satisfy the same conditions.
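Assuming the conditions to preserve are the number of gene sets per gene (row sums) and the size of each gene set (column sums), the question becomes how many 0/1 incidence matrices have those margins. A minimal memoised recursion is sketched here: it assigns one gene at a time and caches on the sorted remaining set sizes, which is valid because relabelling gene sets does not change the number of completions. `count_gsc_combinations` is a hypothetical name, and the search is exponential in the worst case, so it is only practical for small collections.

```r
# Hypothetical helper: number of 0/1 gene-by-gene-set matrices with the
# given number of sets per gene (rows) and genes per set (columns).
count_gsc_combinations <- function(sets_per_gene, genes_per_set) {
  if (sum(sets_per_gene) != sum(genes_per_set)) return(0)
  memo <- new.env(hash = TRUE)
  fill <- function(i, remaining) {
    # All genes placed: valid only if every gene set is exactly full.
    if (i > length(sets_per_gene)) return(as.numeric(all(remaining == 0)))
    key <- paste(i, paste(sort(remaining), collapse = ","))
    if (exists(key, envir = memo, inherits = FALSE)) return(get(key, envir = memo))
    open <- which(remaining > 0)          # gene sets that still need genes
    k <- sets_per_gene[[i]]
    total <- 0
    if (k == 0) {
      total <- fill(i + 1, remaining)
    } else if (k <= length(open)) {
      # combn() on length(open) avoids combn(open, k) treating a
      # length-one `open` as seq_len(open).
      for (idx in combn(length(open), k, simplify = FALSE)) {
        nxt <- remaining
        nxt[open[idx]] <- nxt[open[idx]] - 1
        total <- total + fill(i + 1, nxt)
      }
    }
    assign(key, total, envir = memo)
    total
  }
  fill(1, genes_per_set)
}
```

For example, three genes each in one of three gene sets of size one gives 6, the number of 3 x 3 permutation matrices.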

Use dynamic programming in the functions `from*`

Many functions that simulate a GSC are quite slow (more than 300 iterations take about 1 min), if they reach a solution in a timely manner at all.

Proposed change: use dynamic programming; create the pool of data up front and remove elements as they are picked. This might improve running times.
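A minimal sketch of the "create the pool, remove as picked" idea, assuming the target is given as the number of gene sets per gene and the size of each gene set. This is pool-based sampling with restarts rather than dynamic programming in the strict sense, and `sample_gsc_from_pool` is a hypothetical function, not one of the from* functions. Every gene enters the pool once per gene set it must join; each gene set draws distinct genes weighted by how many slots they still have, and drawn slots are removed so they cannot be picked again.

```r
# Hypothetical sampler: returns a named list of gene sets in which gene g
# appears in exactly gene_degrees[g] sets and set s has set_sizes[s] genes.
sample_gsc_from_pool <- function(gene_degrees, set_sizes, max_tries = 100) {
  stopifnot(sum(gene_degrees) == sum(set_sizes))
  for (attempt in seq_len(max_tries)) {
    # One pool entry per (gene, membership) slot still to be assigned.
    pool <- rep(names(gene_degrees), gene_degrees)
    gsc <- setNames(vector("list", length(set_sizes)), names(set_sizes))
    ok <- TRUE
    for (s in seq_along(set_sizes)) {
      k <- set_sizes[[s]]
      counts <- table(pool)                          # remaining slots per gene
      if (length(counts) < k) { ok <- FALSE; break } # dead end: restart
      picked <- if (k == 0) character(0) else
        names(counts)[sample.int(length(counts), k, prob = as.numeric(counts))]
      # Remove one slot per picked gene (guard: pool[-integer(0)] would empty it).
      if (length(picked) > 0) pool <- pool[-match(picked, pool)]
      gsc[[s]] <- picked
    }
    if (ok) return(gsc)
  }
  stop("No valid collection found after ", max_tries, " attempts")
}
```

Weighting by remaining slots makes it less likely that a high-degree gene is left over for too few gene sets, which reduces the number of restarts compared with uniform picking.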

Avoid Vectorize function

In view of recommendations to avoid Vectorize, it would be better to change this. Possibly ask on Code Review?

In GSEAdv/R/nested.R (line 9 at commit 1ed18f4):

all_in_vec <- Vectorize(all_in, vectorize.args = c("x", "y"))
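The definition of `all_in` is not shown in this issue; assuming it checks whether every element of `x` is contained in `y`, an explicit `vapply()` over paired elements is one way to drop `Vectorize()`. It returns a logical vector of fixed type and makes the iteration visible. Unlike `Vectorize()` (which calls `mapply()` underneath), this sketch does not recycle arguments of different lengths.

```r
# Assumed definition; the real all_in in nested.R may differ.
all_in <- function(x, y) all(x %in% y)

# Element-wise version without Vectorize(): x and y are lists (or vectors)
# of the same length, compared pair by pair.
all_in_vec <- function(x, y) {
  stopifnot(length(x) == length(y))
  vapply(seq_along(x), function(i) all_in(x[[i]], y[[i]]), logical(1))
}
```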

Estimating the number of genes and pathways

When estimating, it would be better to provide a parameter stating whether we assume that the original object passes the check function or not.

Example

This software could be implemented using the methods in this package.

Make it more general

At the moment it is designed only for GeneSetCollections, but I expect there will be a need to cluster cell lines. It would therefore be interesting to have a general set class in Bioconductor with "GeneSet" as a subclass, and equivalently a "SetCollection" class with "GeneSetCollection" as a subclass.

This would require a feature request in https://github.com/bioconductor/GSEABase
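A minimal S4 sketch of that hierarchy, using only the methods package. The class names `AnySet`, `AnyGeneSet` and `AnySetCollection` and the generic `setSizes` are hypothetical, chosen to avoid clashing with GSEABase's own `GeneSet` and `GeneSetCollection`; an actual proposal would have to fit GSEABase's existing classes.

```r
library(methods)

# A generic set: any identifiers (genes, cell lines, ...) under a name.
setClass("AnySet", slots = c(setName = "character", ids = "character"))

# The gene-specific case is just a subclass.
setClass("AnyGeneSet", contains = "AnySet")

# A collection accepts any AnySet, including AnyGeneSet objects.
setClass("AnySetCollection", slots = c(sets = "list"),
         validity = function(object) {
           ok <- vapply(object@sets, is, logical(1), "AnySet")
           if (all(ok)) TRUE else "all elements of 'sets' must be AnySet objects"
         })

# Methods written once for the collection work for cells and genes alike.
setGeneric("setSizes", function(x) standardGeneric("setSizes"))
setMethod("setSizes", "AnySetCollection", function(x) {
  sizes <- vapply(x@sets, function(s) length(s@ids), integer(1))
  setNames(sizes, vapply(x@sets, function(s) s@setName, character(1)))
})
```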

Introduce helper to extract information from genes and pathways

For both gene.R and pathway.R it would be nice to introduce a helper, either shared or separate.

It should also handle both the case of a single gene/pathway and that of multiple ones.
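A minimal sketch of a shared helper, assuming a GSC given as a named list of gene IDs: the same function answers "which gene sets contain these genes?" and "which genes are in these pathways?", and always returns a named list so that a single gene or pathway is handled exactly like several. `lookup` and its `by` argument are hypothetical.

```r
# Hypothetical shared helper for gene.R and pathway.R.
lookup <- function(gsc, keys, by = c("gene", "pathway")) {
  by <- match.arg(by)
  if (by == "pathway") {
    missing_keys <- setdiff(keys, names(gsc))
    if (length(missing_keys) > 0) {
      stop("Unknown pathway(s): ", paste(missing_keys, collapse = ", "))
    }
    return(gsc[keys])                    # genes of each requested pathway
  }
  # Gene sets containing each requested gene (character(0) if none).
  res <- lapply(keys, function(g) {
    names(gsc)[vapply(gsc, function(s) g %in% s, logical(1))]
  })
  names(res) <- keys
  res
}
```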
