cybayes's Issues

Node slider

The node slider move has to be implemented. Select a node, its parent, and its parent's parent; change the branch length using a multiplier move; then reinsert the parent at a random position.
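The multiplier step of this move could look roughly like the sketch below. It covers only the branch-length rescaling, not the reinsertion of the parent. The function name `multiplier_proposal` and the `tuning` parameter are hypothetical and not taken from cybayes.

```python
import math
import random


def multiplier_proposal(length, tuning=0.5, rng=random):
    """Propose a new branch length by scaling the current one.

    Returns the proposed length and the log proposal (Hastings) ratio.
    `tuning` is a hypothetical step-size parameter.
    """
    u = rng.random()                    # u ~ Uniform[0, 1)
    m = math.exp(tuning * (u - 0.5))    # multiplier between e^(-tuning/2) and e^(tuning/2)
    return length * m, math.log(m)      # the log proposal ratio of this move is log(m)


rng = random.Random(1)
new_length, log_hastings = multiplier_proposal(0.1, tuning=0.5, rng=rng)
```

Because the multiplier is always positive, the proposed branch length stays positive without any reflection at zero.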

Faster matrix multiplications

We will need faster matrix multiplications, since profiling suggests that most of the time is spent in matrix multiplications performed by numpy's dot function. Is there any way to optimize this, @Anaphory?

I ran tests with Cython's BLAS wrapper using the dgemm function, but the timings are no better than with np.dot. Any feedback is welcome.

Tests

Run tests using the binary data of Rama et al. (2018) for five language families.

Add Extended SPR

Add Extended SPR as a controlled version of random SPR, following Yang's description.

Multiple aligned datasets

Multiple aligned datasets, preferably with SCA/ASJP encoding, might be useful for testing the phylogenies. Of course, we are interested in checking how the trees emerge from the runs. @Anaphory has some datasets with a small alphabet size to test.

@LinguList Shall I assign this to you?

Clock trees implementation

Clock trees implementation is something to think about.

Drummond et al. 2002 (http://www.genetics.org/content/161/3/1307.long) is the paper whose MCMC moves are closest to what I implemented in this package.

  • Birth-death trees with incomplete sampling and extant languages: Yang and Rannala is the reference for using the tree probability as the prior.
  • Uniform tree prior: Ronquist et al. (2012) is the reference.
  • Coalescent prior: Yang's book chapter on the coalescent process presents the theory, which is easy to simulate using exponential waiting times. I will add some code to simulate it.
  • Strict clock: this involves adding a single parameter that links the linguistic branch lengths to the geographical timescale.

The question is how to generate a time tree. As of now, the trees are rooted. However, a time tree would have all the tips at the same level. That will introduce extra complications that might need to be handled. Any ideas on how to sample a starting tree are welcome.
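One possible way to get a starting time tree is to simulate it from the coalescent with exponential waiting times, which places all tips at height 0 by construction. The sketch below is illustrative only: the function name, the `theta` scale parameter, and the dictionary-based tree representation are assumptions, not part of cybayes.

```python
import random


def simulate_coalescent(tips, theta=1.0, rng=random):
    """Simulate a coalescent time tree over uniquely named tips.

    Returns (root, heights, children): `heights` maps node -> time before
    present, `children` maps each internal node -> its two child nodes.
    `theta` is a hypothetical scale: a pair of lineages merges at rate 1/theta.
    """
    heights = {tip: 0.0 for tip in tips}   # all tips are sampled at time 0
    children = {}
    active = list(tips)
    time = 0.0
    next_id = 0
    while len(active) > 1:
        k = len(active)
        rate = k * (k - 1) / 2.0 / theta   # total coalescence rate for k lineages
        time += rng.expovariate(rate)      # exponential waiting time to the next merger
        a, b = rng.sample(active, 2)       # pick a random pair of lineages to merge
        node = f"internal_{next_id}"
        next_id += 1
        heights[node] = time
        children[node] = (a, b)
        active = [x for x in active if x not in (a, b)] + [node]
    return active[0], heights, children


root, heights, children = simulate_coalescent(
    ["A", "B", "C", "D", "E"], rng=random.Random(42)
)
```

Branch lengths follow as the height of a parent minus the height of its child, so every root-to-tip path has the same total length.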

Tests?

We need to write tests that show where cybayes fails. More code cleanup to increase efficiency might be required. Maybe @xrotwang can suggest some ways to test the code.

Change README

Change README to accommodate the change of input file format.

Cython Blas vs numpy dot

@robertostling I am using numpy's dot to multiply a KxK matrix with a KxS matrix. K ranges from 2 to 40, whereas S ranges from 500 to 10000. The dot product is the main bottleneck, since most of the calls are here:

LL_mat[parent] = p_t[parent,child].dot(ll_mats[child])

I am calling numpy's dot directly from within Cython. Do you think calling the BLAS matrix multiplication function directly would be faster? It would be great if that brought the cost down.
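For float arrays, np.dot normally dispatches to the BLAS gemm routine already, so a direct BLAS call may mainly save call overhead and the allocation of the result. A quick way to check how much the allocation costs is to compare np.dot with and without a preallocated `out` buffer. This is a sketch with random stand-in data; the sizes and the names `p`, `child_ll`, and `out` are illustrative assumptions.

```python
import timeit

import numpy as np

K, S = 20, 5000                      # illustrative sizes within the ranges above
rng = np.random.default_rng(0)
p = rng.random((K, K))               # stand-in for p_t[parent, child]
child_ll = rng.random((K, S))        # stand-in for ll_mats[child]
out = np.empty((K, S))               # preallocated C-contiguous float64 buffer


def with_alloc():
    return p.dot(child_ll)           # allocates a new K x S array on every call


def with_out():
    np.dot(p, child_ll, out=out)     # writes the product into the existing buffer
    return out


t_alloc = timeit.timeit(with_alloc, number=200)
t_out = timeit.timeit(with_out, number=200)
print(f"new array: {t_alloc:.4f}s  preallocated: {t_out:.4f}s")
```

If the two timings are close, the time is dominated by the multiplication itself, and a hand-written dgemm call is unlikely to help much.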

Gamma Dirichlet Prior

Add a Gamma-Dirichlet prior from Rannala et al. (2012), which should give better tree lengths than the exponential prior being used now.
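As a rough sketch, a simplified gamma-Dirichlet log density could be computed as below. It assumes a gamma prior on the total tree length and a symmetric Dirichlet(1, ..., 1) on the branch proportions, which may differ from the full parameterisation in Rannala et al. (2012). The function name and the default values of `alpha_t` and `beta_t` are hypothetical.

```python
import math


def log_gamma_dirichlet(branch_lengths, alpha_t=1.0, beta_t=1.0):
    """Log prior density of branch lengths under a simplified gamma-Dirichlet prior.

    Tree length T ~ Gamma(shape=alpha_t, rate=beta_t); the branch proportions
    follow a symmetric Dirichlet(1, ..., 1). Defaults are placeholders.
    """
    k = len(branch_lengths)
    total = sum(branch_lengths)
    # Gamma density of the tree length T
    log_gamma_part = (alpha_t * math.log(beta_t) - math.lgamma(alpha_t)
                      + (alpha_t - 1.0) * math.log(total) - beta_t * total)
    # Uniform Dirichlet density over k proportions is (k - 1)!
    log_dirichlet_part = math.lgamma(k)
    # Change of variables from (T, proportions) to branch lengths
    log_jacobian = -(k - 1) * math.log(total)
    return log_gamma_part + log_dirichlet_part + log_jacobian


log_prior = log_gamma_dirichlet([0.1, 0.2, 0.05])
```

With a single branch the expression reduces to the plain gamma density, which is a handy sanity check.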

Leaning chain

A Leaning chain option should be implemented and tested.

Guided SPR

A guided SPR based on cluster similarity. Choose a branch randomly, and select its parent and child edges as well. Select the edge whose subtree has the highest similarity to the branch to be pruned and regrafted. This is more guided than the current version, where the parent is attached and tested.

Metropolis coupled MCMC (MCMCMC)

The next step is to implement MCMCMC for sampling trees. The process essentially consists of a cold chain (n = 1) and hot chains 2 <= n <= N, where the exponent of chain n is set to 1/(1+alpha*(n-1)) and alpha is 0.1.

Briefly, there will be an exchange of states between two chains once in a while following an acceptance ratio given here:
http://bamm-project.org/mc3.html

The question is how to exchange states. Essentially, this is parallel simulated annealing. @robertostling, any thoughts about this? I don't have much experience with MPI programming. Maybe there is a simpler way to achieve this.
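One simple option that avoids MPI is to keep all chains in a single process and swap entries in a list. The sketch below assumes that each chain's whole unnormalised posterior is raised to its exponent. The function names and the list-based layout are illustrative, not taken from cybayes.

```python
import math
import random


def chain_heats(n_chains, alpha=0.1):
    """Exponent for chains n = 1..N; chain 1 (the cold chain) gets 1.0."""
    return [1.0 / (1.0 + alpha * (n - 1)) for n in range(1, n_chains + 1)]


def log_swap_ratio(heat_i, heat_j, log_post_i, log_post_j):
    """Log acceptance ratio for exchanging the states of chains i and j."""
    return (heat_i - heat_j) * (log_post_j - log_post_i)


def try_swap(heats, log_posts, states, rng=random):
    """Attempt one exchange between two randomly chosen chains, in place."""
    i, j = rng.sample(range(len(heats)), 2)
    log_r = log_swap_ratio(heats[i], heats[j], log_posts[i], log_posts[j])
    if math.log(1.0 - rng.random()) < log_r:      # accept with probability min(1, r)
        states[i], states[j] = states[j], states[i]
        log_posts[i], log_posts[j] = log_posts[j], log_posts[i]
        return True
    return False


heats = chain_heats(4)   # exponents for one cold chain and three hot chains
```

Here `log_posts` holds the unheated log posterior of each chain's current state, so a swap only moves references around and no likelihood needs to be recomputed.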

Initialize gamma site rates

Always initialize the Gamma site rates parameter alpha to 1 to prevent underflow errors. This is already done in the cogbayes branch and needs to be fixed in all the other branches.

Add uniform prior

Add uniform prior as in Ronquist et al. (2012). Does not seem difficult to implement.

Rescaling with caching

Implementing rescaling with caching will improve the speed of the simulated tempering chains.

Autotuning

Add autotuning of the proposal distribution hyperparameters.
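A minimal sketch of batch-wise autotuning follows: after each batch of proposals, the tuning parameter is nudged up when too many moves are accepted and down when too few are. The function name, the target acceptance rate of 0.3, and the step cap of 0.1 are placeholder assumptions.

```python
import math


def tune(tuning, accept_rate, batch_index, target=0.3):
    """Return an updated tuning parameter after one batch of proposals.

    `target` and the 0.1 step cap are placeholder values.
    """
    step = min(0.1, 1.0 / math.sqrt(batch_index))   # capped step that shrinks for large batch_index
    if accept_rate > target:
        return tuning * math.exp(step)    # too many acceptances: propose bolder moves
    return tuning * math.exp(-step)       # too few acceptances: propose smaller moves


tuning = 0.5
for batch, rate in enumerate([0.8, 0.7, 0.1], start=1):
    tuning = tune(tuning, rate, batch)
```

Adapting only during burn-in and freezing the tuning parameters afterwards keeps the sampling phase a standard Metropolis-Hastings chain.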

Node slider proposal ratio

The node slider proposal ratio needs to be fixed. For now, I have added a slider edge to keep things working.

Clean up repo

Many files are repetitions that need to be thrown out or moved.

Simulated Annealing in MH algorithm

@robertostling Does it make sense to anneal the likelihood during burnin? I googled around a bit but did not find much literature. My idea is to anneal the likelihood from low temperature to high temperature during burnin and then sample at high temperature for the rest of the chain.
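One way to express such a schedule is as an exponent on the likelihood that changes during burnin and then stays fixed. The sketch below assumes the exponent ramps linearly from a small starting value up to 1, so that the samples after burnin target the unmodified posterior. The function names and the starting value of 0.1 are hypothetical.

```python
def likelihood_exponent(iteration, burnin, start=0.1):
    """Exponent applied to the likelihood at a given iteration.

    `start` is a placeholder for the exponent at iteration 0.
    """
    if iteration >= burnin:
        return 1.0                                       # unmodified posterior after burnin
    return start + (1.0 - start) * iteration / burnin    # linear ramp during burnin


def log_target(log_likelihood, log_prior, iteration, burnin):
    """Annealed log posterior to use in the Metropolis-Hastings ratio."""
    return likelihood_exponent(iteration, burnin) * log_likelihood + log_prior


first = log_target(-10.0, -1.0, iteration=0, burnin=100)
```

The same exponent has to be used for the current and the proposed state within one iteration, otherwise the acceptance ratio compares two different targets.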
