The cybayes from phylostar

Cache scores in mat_mcmc_gamma

Cache scores as in the gamma-less system to speed up inference.

Node slider

The node slider move has to be implemented. Select a node, its parent, and its parent's parent; change the branch length using a multiplier move; then reinsert the parent at random.
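The multiplier step mentioned above can be sketched as follows. This is a minimal illustration, not the package's code: the function name `multiplier_move` and the `tuning` parameter are assumptions. The proposal multiplies the length by exp(tuning * (u - 0.5)) with u uniform on (0, 1), and the Hastings ratio of such a move is the multiplier itself.

```python
import math
import random


def multiplier_move(branch_length, tuning=0.5, rng=random):
    """Propose a new branch length with a multiplier move.

    Returns (proposed_length, log_hastings_ratio). For a multiplier
    proposal the Hastings ratio equals the multiplier, so its log is
    simply tuning * (u - 0.5).
    """
    u = rng.random()
    log_multiplier = tuning * (u - 0.5)
    return branch_length * math.exp(log_multiplier), log_multiplier


# Hypothetical usage: propose a new length for a branch of length 0.2.
new_length, log_hastings = multiplier_move(0.2, tuning=0.5, rng=random.Random(1))
```

The log Hastings ratio is added to the log posterior ratio when deciding whether to accept the move.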

Bug in ML_scaled function

The scaling function is not executed correctly when calculating the ML function.

Ascertainment bias correction

Add ascertainment bias correction for binary characters.
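A minimal sketch of one common form of this correction, assuming the only unobservable pattern is the all-zero site (a character absent in every language). Each site likelihood is divided by 1 - P(all-zero pattern). The function name and arguments are hypothetical; `loglik_all_zero` would come from running the usual likelihood calculation on a dummy all-zero column.

```python
import math


def ascertainment_corrected_loglik(site_logliks, loglik_all_zero):
    """Condition every site on not being the unobservable all-zero pattern.

    site_logliks    : per-site log-likelihoods of the observed data
    loglik_all_zero : log-likelihood of a site that is 0 in all taxa
    """
    p_unobserved = math.exp(loglik_all_zero)
    if not 0.0 <= p_unobserved < 1.0:
        raise ValueError("probability of the all-zero pattern must be in [0, 1)")
    # log(1 - p), computed stably for small p
    log_norm = math.log1p(-p_unobserved)
    return sum(site_logliks) - len(site_logliks) * log_norm
```

If the data set also excludes other patterns (for example all-one sites), their probabilities would have to be added to `p_unobserved` as well.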

Serial tempering hyperparameters

Leaning chain
Adjustment length
number of chains
convergence ratio

Faster matrix multiplication is needed: profiling suggests that most of the time is spent in matrix multiplications, which go through numpy's dot function. Is there any way to optimize it, @Anaphory?

I tried Cython's BLAS wrapper with the dgemm function, but the timings are no better than np.dot. Any feedback is welcome.

Tests

Tests using the binary data of Rama et al. (2018) for five language families.

Add Extended SPR

Added Extended SPR for a controlled version of random SPR. Follow Yang's description.

Multiple aligned datasets

Multiple aligned datasets, preferably with SCA/ASJP encoding, might be useful for testing the phylogenies. Of course, we are interested in checking how the trees emerge from the runs. @Anaphory has some datasets with a small alphabet size to test.

@LinguList Shall I assign this to you?

Clock trees implementation

Clock trees implementation is something to think about.

Drummond et al. 2002 (http://www.genetics.org/content/161/3/1307.long) is the best paper that has MCMC moves close to what I implemented in this package.

Birth-death trees with incomplete sampling and extant languages. Yang and Rannala is the reference when using the tree probability as the prior.
The uniform tree prior is described in Ronquist et al. (2012).
Yang's book chapter on the coalescent process presents the theory behind the coalescent prior, which is easy to simulate using exponential waiting times. I will add some code to simulate it.
A strict clock involves adding a single parameter that links the linguistic branch lengths to geographical timescale.

The question is how to generate a time tree. As of now the trees are rooted. However, a time tree would have all the tips at the same level. That will introduce extra complications that might need to be handled. Any ideas on how to sample a starting tree are welcome.
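One possible way to get a starting time tree is to simulate it from the coalescent, which yields an ultrametric tree by construction. The sketch below is only an illustration under the standard constant-size coalescent in coalescent time units; the function name and the Newick output format are assumptions, not part of the package.

```python
import random


def simulate_coalescent(tips, rng=random):
    """Simulate an ultrametric tree; return (newick_string, root_height).

    With k active lineages the waiting time to the next merger is
    exponential with rate k * (k - 1) / 2, and the pair that merges is
    chosen uniformly at random.
    """
    # Each active lineage is stored as (newick_label, node_height).
    lineages = [(name, 0.0) for name in tips]
    height = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        height += rng.expovariate(k * (k - 1) / 2.0)
        i, j = sorted(rng.sample(range(k), 2))
        right = lineages.pop(j)  # pop the larger index first
        left = lineages.pop(i)
        label = "({}:{:.6f},{}:{:.6f})".format(
            left[0], height - left[1], right[0], height - right[1]
        )
        lineages.append((label, height))
    return lineages[0][0] + ";", height


newick, root_height = simulate_coalescent(["a", "b", "c", "d"], random.Random(42))
```

Every tip ends up at height 0, so all root-to-tip distances are equal. Branch lengths could afterwards be rescaled by a clock rate.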

Tests?

Need to write tests to find where cybayes fails. More cleaning of the code to increase efficiency might be required. Maybe @xrotwang can suggest some ways to test the code.

Change README

Change README to accommodate the change of input file format.

Cython Blas vs numpy dot

@robertostling I am using numpy's dot multiplication to multiply a KxK matrix with a KxS matrix. K is in the range of 2 to 40 whereas S is in the range of 500 to 10000. The dot product is the main bottleneck since most of the calls are here:

CyBayes/ML_gamma.pyx

Line 25 in a1dc917

LL_mat[parent] = p_t[parent,child].dot(ll_mats[child])

I am calling numpy's dot directly within Cython. Do you think calling the BLAS matrix multiplication function directly would be faster? It would be great if that brought the cost down.
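One option that stays within NumPy is to stack the per-branch products and issue a single batched `np.matmul` call instead of one `dot` per branch. The sketch below only demonstrates that the two give the same result. The array names and sizes are hypothetical, and whether batching is actually faster here would have to be measured, since only branches whose child matrices are already computed can be grouped together.

```python
import numpy as np


def propagate_loop(p_mats, child_ll):
    # One dot call per branch, like the per-branch line quoted above.
    return [p.dot(ll) for p, ll in zip(p_mats, child_ll)]


def propagate_batched(p_mats, child_ll):
    # p_mats: (B, K, K), child_ll: (B, K, S) -> (B, K, S) in one call.
    return np.matmul(p_mats, child_ll)


rng = np.random.default_rng(0)
B, K, S = 8, 20, 1000  # branches, states, sites (illustrative sizes)
p_mats = rng.random((B, K, K))
child_ll = rng.random((B, K, S))
batched = propagate_batched(p_mats, child_ll)
```

Timing both functions with `timeit` on realistic K and S would show whether the batched form helps in practice.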

Gamma Dirichlet Prior

Add the Gamma-Dirichlet prior from Rannala et al. (2012), which should give better tree lengths than the exponential prior used now.
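A minimal sketch of such a prior, assuming the simplest variant: the tree length T follows Gamma(alpha, rate = beta) and the branch proportions follow a uniform Dirichlet(1, ..., 1). The function name and the default values are placeholders. Note that `beta` is a rate here, not a scale.

```python
import math


def log_gamma_dirichlet_prior(branch_lengths, alpha=1.0, beta=1.0):
    """Log prior density of a set of branch lengths.

    T = sum of branch lengths ~ Gamma(alpha, rate=beta);
    proportions t_i / T ~ Dirichlet(1, ..., 1).
    """
    k = len(branch_lengths)
    tree_length = sum(branch_lengths)
    log_gamma_part = (
        alpha * math.log(beta)
        - math.lgamma(alpha)
        + (alpha - 1.0) * math.log(tree_length)
        - beta * tree_length
    )
    # Uniform Dirichlet density (k - 1)! times the Jacobian T^-(k - 1)
    # of the change of variables from (T, proportions) to branch lengths.
    log_dirichlet_part = math.lgamma(k) - (k - 1) * math.log(tree_length)
    return log_gamma_part + log_dirichlet_part
```

As a sanity check, setting alpha equal to the number of branches recovers independent exponential branch lengths with rate beta.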

Leaning chain

A Leaning chain option should be implemented and tested.

Guided SPR

A guided SPR based on cluster similarity. Choose a branch randomly. Select the parent and child edge also. Select the edge whose subtree has the highest similarity to the branch to be pruned and regrafted. This is more guided than the current version, where the parent is attached and tested.

Metropolis coupled MCMC (MCMCMC)

The next step is to implement MCMCMC for sampling trees. The process essentially consists of a cold chain (n = 1) and hot chains 2 <= n <= N, where the exponent of chain n is set to 1/(1+alpha*(n-1)) and alpha is 0.1.

Briefly, there will be an exchange of states between two chains once in a while following an acceptance ratio given here:
http://bamm-project.org/mc3.html

The question is how to exchange states. Essentially, this is parallel simulated annealing. @robertostling, any thoughts about this? I don't have much experience with MPI programming. Maybe there is a simpler way to achieve this.
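A minimal single-process sketch of the exchange step, assuming all chains live in one Python process so that a swap is just an exchange of list entries and no MPI is needed. The function names are hypothetical. The swap ratio is the standard one for two chains with exponents beta_i and beta_j, which in log form reduces to (beta_i - beta_j) * (log_post_j - log_post_i).

```python
import math
import random


def heating_exponents(n_chains, alpha=0.1):
    # Chain 1 is cold (exponent 1); chain n gets 1 / (1 + alpha * (n - 1)).
    return [1.0 / (1.0 + alpha * n) for n in range(n_chains)]


def log_swap_ratio(log_post_i, log_post_j, beta_i, beta_j):
    # Each state is re-evaluated under the other chain's exponent.
    return (beta_i - beta_j) * (log_post_j - log_post_i)


def try_swap(states, log_posts, betas, rng=random):
    """Pick two chains at random and swap their states if accepted."""
    i, j = rng.sample(range(len(states)), 2)
    log_r = log_swap_ratio(log_posts[i], log_posts[j], betas[i], betas[j])
    if rng.random() < math.exp(min(0.0, log_r)):
        states[i], states[j] = states[j], states[i]
        log_posts[i], log_posts[j] = log_posts[j], log_posts[i]
        return True
    return False
```

Here `log_posts` holds the unheated log posterior of each chain's current state. Samples would be recorded only from the chain with exponent 1.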

Initialize gamma site rates

Initialize Gamma site rates alpha to 1 always to prevent underflow errors. Already done in cogbayes branch. Fix for all the branches.

Rooted NNI optimize

The optimized rooted NNI calculation seems to have a bug here.

Fix it.

CyBayes/mat_mcmc_gamma.py

Line 147 in 2d4d235

elif move.__name__ == "rooted_NNI":

Inverse Subtree Scaler

Implement a subtree scaler that scales branch lengths up to the root.

Add tree multiplier

Add a tree multiplier move.

PSRF for serial tempering convergence

Compute PSRF to evaluate the serial tempering convergence. Another way is to compute the tree distance and then check convergence.
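A minimal sketch of the PSRF (Gelman-Rubin) calculation for one scalar summary, for example the log-likelihood or tree length sampled at the cold temperature in several independent runs. The function name is an assumption, and burn-in is assumed to be removed already.

```python
import math
from statistics import mean, variance


def psrf(chains):
    """Potential scale reduction factor for equal-length chains of scalars."""
    n = len(chains[0])
    if len(chains) < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    # W: average within-chain variance
    within = mean(variance(c) for c in chains)
    # B / n: variance of the chain means
    between_over_n = variance([mean(c) for c in chains])
    pooled = (n - 1) / n * within + between_over_n
    return math.sqrt(pooled / within)
```

Values close to 1 suggest that the runs agree; values well above 1 indicate that they have not converged to the same distribution.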

Random seed

Add a random seed option.

Add uniform prior

Add uniform prior as in Ronquist et al. (2012). Does not seem difficult to implement.

Min and Max branch lengths

Add minimum and maximum branch length bounds to the sampling code of the node slider and branch scaler moves.

Rescaling with caching

Implementing rescaling with caching will improve the speed of the simulated tempering chains.

Handle missing data

Use a masked array to calculate the likelihood.

Autotuning

Add autotuning of the proposal distribution hyperparameters.
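One common scheme, sketched here under assumptions, adjusts each tuning parameter on the log scale after every batch of proposals, pushing the acceptance rate toward a target with a step size that shrinks over time. The function name, the target of 0.3, and the step cap are illustrative choices, not values taken from the package.

```python
import math


def autotune(tuning, n_accepted, n_proposed, batch_index, target=0.3):
    """Return an updated tuning parameter after one batch of proposals.

    batch_index starts at 1. The adaptation step shrinks as
    1 / sqrt(batch_index), so later batches change the parameter less.
    """
    if n_proposed == 0:
        return tuning
    rate = n_accepted / n_proposed
    step = min(0.5, 1.0 / math.sqrt(batch_index))
    # Too many acceptances -> bolder proposals; too few -> more timid ones.
    return tuning * math.exp(step * (rate - target))
```

Adaptation is usually restricted to burn-in so that the sampling phase uses fixed proposal distributions.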

Remove separate files for gamma sites

Node slider proposal ratio

Node slider proposal ratio needs to be fixed. As of now I added a slider edge to keep things working.

Check Gamma dirichlet prior

Review the settings and close, especially the rate vs. scale settings.

Rescaling with Caching for NNI

Rescaling with caching for NNI has to be done.

Clean up repo

Many files are repetitions that need to be thrown out or moved.

Add fixed population size coalescent tree prior

Add a fixed population size coalescent tree prior with independent gamma branch rates.

Low acceptance rate for mvDualSlider

mvDualSlider has a low acceptance rate. This has to be fixed.

Print prior value

Print the prior value when printing to a file.

Site rates Alpha parameter

Site rates alpha parameter sampling has to be checked to see if it works well.

Add temperature states visits

Print the counter of the temperature states visited for future analysis.

Change random starting tree to parsimony tree

Change starting tree to greedy stepwise addition parsimony tree.

ML frequencies fixed option

Add an option to keep the frequencies fixed at their ML estimates.

Gamma site rates

Implement gamma site rates.

Add Birth-Death Prior

Add an MRCA-conditioned Birth-Death Prior.

Add GTR implementation

Have to add a GTR implementation.

User Random Tree

Add code for working with a user-specified tree.

Simulated Annealing in MH algorithm

@robertostling Does it make sense to anneal the likelihood during burn-in? I googled around a bit but did not find much literature. My idea is to anneal the likelihood from low temperature to high temperature during burn-in and then sample at high temperature for the rest of the chain.
