
raxml-ng's Introduction

RAxML Next Generation


Introduction

RAxML-NG is a phylogenetic tree inference tool that uses the maximum-likelihood (ML) optimality criterion. Its search heuristic is based on iteratively performing a series of Subtree Pruning and Regrafting (SPR) moves, which allows it to quickly navigate to the best-known ML tree. RAxML-NG is a successor of RAxML (Stamatakis 2014) and leverages the highly optimized likelihood computation implemented in libpll (Flouri et al. 2014).

RAxML-NG offers improvements in speed, flexibility and user-friendliness over the previous RAxML versions. It also implements some of the features previously available in ExaML (Kozlov et al. 2015), including checkpointing and efficient load balancing for partitioned alignments (Kobert et al. 2014).

RAxML-NG is currently under active development, and the mid-term goal is to have most functionality of RAxML 8.x covered. You can see some of the planned features here.

Documentation: github wiki

Installation instructions

  • For most desktop Unix/Linux and macOS systems, the easiest way to install RAxML-NG is by using the pre-compiled binary:
    Download Linux binary (x86)
    Download OSX/macOS binary (x86 and ARM)

  • For clusters/supercomputers (i.e., if you want to use MPI), please use the following installation package which contains pre-built libpll. You will need GCC 6.4+ and CMake 3.0.2+ in order to compile RAxML-NG for your system.
    Download RAxML-NG-MPI for Linux

  • On Windows, you can use the Linux binary via Windows Subsystem for Linux, but performance might be lower than with native Linux execution.

  • If none of the above options worked for you, please clone this repository and build RAxML-NG from scratch.

1. Install the dependencies. On Ubuntu (and other Debian-based systems), you can simply run:

sudo apt-get install flex bison libgmp3-dev

For other systems, please make sure you have the following packages/libraries installed (the same ones the apt-get command above installs): flex, bison, and the GMP library.

If you do not want to use git submodules (e.g., for packaging), you also need to install:

2. Build RAxML-NG.

PTHREADS version:

git clone --recursive https://github.com/amkozlov/raxml-ng
cd raxml-ng
mkdir build && cd build
cmake ..
make

MPI version:

git clone --recursive https://github.com/amkozlov/raxml-ng
cd raxml-ng
mkdir build && cd build
cmake -DUSE_MPI=ON ..
make

Portable PTHREADS version (static linkage, compatible with old non-AVX CPUs):

git clone --recursive https://github.com/amkozlov/raxml-ng
cd raxml-ng
mkdir build && cd build
cmake -DSTATIC_BUILD=ON -DENABLE_RAXML_SIMD=OFF -DENABLE_PLLMOD_SIMD=OFF ..
make
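
The MPI installation package above asks for GCC 6.4+ and CMake 3.0.2+. Before running cmake, it may help to confirm what is on your PATH. The following is a minimal sketch, not part of RAxML-NG: the helper names version_ge and check_tool are made up for this example, and it assumes a sort that supports -V (GNU coreutils and recent BSD/macOS versions do).

```shell
#!/bin/sh
# version_ge A B: succeed if version A >= version B.
# Assumption: `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_tool TOOL MIN: report whether TOOL is on PATH and at least version MIN.
check_tool() {
    tool=$1
    min=$2
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: not found"
        return 1
    fi
    # Take the first dotted number printed by `<tool> --version`.
    found=$("$tool" --version 2>/dev/null | grep -Eo '[0-9]+(\.[0-9]+)+' | head -n 1)
    if version_ge "$found" "$min"; then
        echo "$tool: $found (ok, need >= $min)"
    else
        echo "$tool: ${found:-unknown} (too old, need >= $min)"
        return 1
    fi
}

# Minimum versions as stated in the installation instructions above.
check_tool gcc 6.4 || true
check_tool cmake 3.0.2 || true
```

On clusters, the compiler you get after loading a module may differ from the system default, so it is worth running the check in the same environment you build in.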

Documentation and Support

Documentation can be found in the github wiki. For a quick start, please check out the hands-on tutorial.

Also please check the online help with raxml-ng -h.

If still in doubt, please feel free to post to the RAxML google group.

Usage examples

  1. Perform a single quick-and-dirty tree inference on a DNA alignment (one parsimony starting tree, general time-reversible model, ML estimate of substitution rates and nucleotide frequencies, discrete GAMMA model of rate heterogeneity with 4 categories):

    ./raxml-ng --search1 --msa testDNA.fa --model GTR+G

  2. Perform an all-in-one analysis (ML tree search + non-parametric bootstrap) (10 randomized parsimony starting trees, fixed empirical substitution matrix (LG), empirical amino acid frequencies from the alignment, 8 discrete GAMMA categories, 200 bootstrap replicates):

    ./raxml-ng --all --msa testAA.fa --model LG+G8+F --tree pars{10} --bs-trees 200

  3. Optimize branch lengths and free model parameters on a fixed topology (using multiple partitions with proportional branch lengths):

    ./raxml-ng --evaluate --msa testAA.fa --model partitions.txt --tree test.tree --brlen scaled

  4. Map support values from an existing set of replicate trees:

    ./raxml-ng --support --tree bestML.tree --bs-trees bootstraps.tree

License and citation

The code is currently licensed under the GNU Affero General Public License version 3.

When using RAxML-NG, please cite this paper:

Alexey M. Kozlov, Diego Darriba, Tomáš Flouri, Benoit Morel, and Alexandros Stamatakis (2019) RAxML-NG: A fast, scalable, and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics, 35 (21), 4453-4455 doi:10.1093/bioinformatics/btz305

The team

  • Alexey Kozlov
  • Alexandros Stamatakis
  • Diego Darriba
  • Tomáš Flouri
  • Benoit Morel
  • Ben Bettisworth
  • Sarah Lutteropp
  • Julia Haag
  • Anastasis Togkousidis

References

  • Stamatakis A. (2014) RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 30(9): 1312-1313. doi:10.1093/bioinformatics/btu033

  • Flouri T., Izquierdo-Carrasco F., Darriba D., Aberer AJ, Nguyen LT, Minh BQ, von Haeseler A., Stamatakis A. (2014) The Phylogenetic Likelihood Library. Systematic Biology, 64(2): 356-362. doi:10.1093/sysbio/syu084

  • Kozlov A.M., Aberer A.J., Stamatakis A. (2015) ExaML version 3: a tool for phylogenomic analyses on supercomputers. Bioinformatics, 31(15): 2577-2579. doi:10.1093/bioinformatics/btv184

  • Kobert K., Flouri T., Aberer A., Stamatakis A. (2014) The divisible load balance problem and its application to phylogenetic inference. In: Brown D., Morgenstern B. (eds.) Algorithms in Bioinformatics, Vol. 8701 of Lecture Notes in Computer Science. Springer, Berlin, pp. 204–216

raxml-ng's People

Contributors

amkozlov, benoitmorel, bredelings, computations, pierrebarbera

raxml-ng's Issues

Fast support measures

There is now an entire zoo of fast support measures, e.g., the aLRT test and its derivatives, ultrafast bootstrap (probably not applicable due to the nature of the search algorithm), and quartet-based measures. We need to decide at some point which ones we want to integrate.

Uninformative error if partition file ends with a blank line

Hi there

I found a small bug when using partition files that contain a blank line at the end. raxml-ng terminates with an uninformative error message. While fixing the partition file is fine, it would be helpful if the error message hinted at a malformed partition file. A blank line in the partition file may not be obvious to the user. Apologies in advance if I missed/omitted something.

RAxML-NG v. 0.5.0 BETA released on 10.09.2017 by The Exelixis Lab.
Authors: Alexey Kozlov, Alexandros Stamatakis, Diego Darriba, Tomas Flouri, Benoit Morel.
Latest version: https://github.com/amkozlov/raxml-ng
Questions/problems/suggestions? Please visit: https://groups.google.com/forum/#!forum/raxml

WARNING: This is a BETA release, please use at your own risk!

RAxML-NG was called as follows:

raxml-ng-mpi --threads 4 --evaluate --msa ./RT.B.DE.nuc.aligned.fixed.fsa --tree RTIQ.bllenfix.nuc.treefile --model rt.b.de.raxmlng.part --brlen scaled --prefix rt.b.de.nuc.raxml

Analysis options:
run mode: Evaluate tree likelihood
start tree(s): user
random seed: 1513523851
tip-inner: ON
pattern compression: ON
per-rate scalers: OFF
site repeats: OFF
branch lengths: ML estimate (proportional)
SIMD kernels: AVX2
parallelization: PTHREADS (4 threads)

[00:00:00] Reading alignment from file: ./RT.B.DE.nuc.aligned.fixed.fsa
[00:00:00] Loaded alignment with 23403 taxa and 1377 sites

ERROR: h6�
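
Until the error message is improved, one possible workaround is to strip blank lines from the partition file before passing it to raxml-ng. The following is a minimal sketch: strip_blank_lines is a helper name made up for this example, and the partition lines in the demo are illustrative, not taken from the file in this report. Note that it removes every empty or whitespace-only line, not only the trailing one.

```shell
#!/bin/sh
# strip_blank_lines FILE: print FILE without empty or whitespace-only lines.
# awk's NF is 0 on such lines, so the pattern `NF` keeps only non-blank lines.
strip_blank_lines() {
    awk 'NF' "$1"
}

# Demo on a throwaway file (illustrative model strings and partition names).
demo=$(mktemp)
printf 'GTR+G, part1 = 1-500\nGTR+G, part2 = 501-1377\n\n' > "$demo"

echo "lines before: $(wc -l < "$demo" | tr -d ' ')"
strip_blank_lines "$demo" > "$demo.clean"
echo "lines after:  $(wc -l < "$demo.clean" | tr -d ' ')"

rm -f "$demo" "$demo.clean"
```

In practice you would write the cleaned output to a new file and pass that file to --model instead of the original.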

Different results from different version

Hi
When I use this dataset as the input alignment (link),
I get different log-likelihoods from different versions. The options I used:
PTHREADS: --msa alignment.phy --model GTR+G+F
MPI: --msa alignment.phy --model GTR+G+F --threads 1
With the PTHREADS version, the log-likelihood is about -14406, while with the MPI version it is about -14018. I have tested several times and got similar results each time. I am just wondering why this is happening.
Thanks in advance.

HPC raxml-ng-mpi make compilation error

Hi,

I'm currently trying to build the mpi version of raxml.

I got the following error:

/usr/bin/ld: /home/dhb13/source/raxml-ng-mpi/localdeps/lib/libpllmodalgorithm_static.a(pllmod_algorithm.c.o): unrecognized relocation (0x2a) in section `.text'
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
make[2]: *** [../bin/raxml-ng-mpi] Error 1
make[1]: *** [src/CMakeFiles/raxml_module.dir/all] Error 2
make: *** [all] Error 2

I'm using gcc 6.2 and openmpi/3.1.0.

Do you have any suggestions for how to fix this error?

Support AVX512F

Since raxml-ng relies on libpll, maybe this feature request is best lodged there rather than here. A quick look shows a similar feature request open since 2013 but presumably for the KNC add-in card and the conclusion was "we should not do this on the current code from the master branch." https://app.assembla.com/spaces/phylogenetic-likelihood-library/tickets/72-port-likelihood-function-xeon-phi/details#

So I think here might be the best place?

Rationale for this request is both personal and general -- personally, I have observed ExaML-KNL is up to 5x faster than NG on my KNL machine; generally, we will all soon have AVX512 anyway! :)

Stop of RAxML-NG with unknown error

Hello,
I tried to run RAxML-NG using a partitioned dataset with models defined by ModelTest-NG. However, RAxML-NG stopped prematurely (see the attached file). I tried to compile the tool myself and get almost the same error; only the path was different.

Thank you!
screenlog.txt

compilation fails

Hello,
I'm getting the following errors when compiling the software:
ParallelContext.hpp:32:61: error: ‘function’ in namespace ‘std’ does not name a template type
static void init_pthreads(const Options& opts, const std::function<void()>& thread_main);
ParallelContext.hpp:32:69: error: expected ‘,’ or ‘...’ before ‘<’ token
static void init_pthreads(const Options& opts, const std::function<void()>& thread_main);
ParallelContext.hpp:45:38: error: ‘std::function’ has not been declared
static void mpi_gather_custom(std::function<int(void*,int)> prepare_send_cb,
ParallelContext.hpp:45:46: error: expected ‘,’ or ‘...’ before ‘<’ token
static void mpi_gather_custom(std::function<int(void*,int)> prepare_send_cb,
ParallelContext.hpp:83:57: error: ‘function’ in namespace ‘std’ does not name a template type
static void start_thread(size_t thread_id, const std::function<void()>& thread_main);
ParallelContext.hpp:83:65: error: expected ‘,’ or ‘...’ before ‘<’ token
static void start_thread(size_t thread_id, const std::function<void()>& thread_main);
Checkpoint.cpp:165:58: error: no matching function for call to ‘ParallelContext::mpi_gather_custom(CheckpointManager::gather_model_params()::<lambda(void*, size_t)>&, CheckpointManager::gather_model_params()::<lambda(void*, size_t)>&)’
ParallelContext::mpi_gather_custom(worker_cb, master_cb);

Can you maybe give me a hint about what I'm doing wrong?
Thank you
Alex

Obtain a sub-optimum tree with LH cutoff?

Hello,

Thanks for developing and maintaining this fascinating method.

I want to search for an ML tree in my population data. I know it will be very difficult to converge to a bifurcating tree, but for now I only want to learn about the large branches and would rather not spend a large chunk of time "optimizing" the tips. I noticed that the likelihood value increased quickly at first and then kept increasing only slightly (while taking a couple of hours per round). In this case, can I stop the run and get the current tree? Or is there any way to set a likelihood cutoff to stop the run? I noticed the "--spr-cutoff" parameter, but I do not quite understand what it means or how to set a proper value.

Below is the command I used.

raxml-ng --msa Chr1.imputed.fa --model GTR+G --threads 50 --seed 12315

Any suggestions are welcome. Thanks ahead!

Shujun

doesn't compile against any version of libpll

I can't seem to compile raxml-ng against any version of libpll. I have tried 0.3.2, 0.3.0 and 0.2.0. Which version do you compile against?

libpll provides only one header file, but common.h includes a bunch of them.
A number of constants, such as PLLMOD_OPT_MAX_ALPHA, aren't defined anywhere in libpll.

HPC Make error: ld returned 1 exit status

Wondering if you can help with this error after running make on HPC raxml-ng_v0

$ make
Scanning dependencies of target raxml_module
[  3%] Building CXX object src/CMakeFiles/raxml_module.dir/Checkpoint.cpp.o
[  7%] Building CXX object src/CMakeFiles/raxml_module.dir/CommandLineParser.cpp.o
[ 10%] Building CXX object src/CMakeFiles/raxml_module.dir/LoadBalancer.cpp.o
[ 14%] Building CXX object src/CMakeFiles/raxml_module.dir/MSA.cpp.o
[ 17%] Building CXX object src/CMakeFiles/raxml_module.dir/Model.cpp.o
[ 21%] Building CXX object src/CMakeFiles/raxml_module.dir/Optimizer.cpp.o
[ 25%] Building CXX object src/CMakeFiles/raxml_module.dir/Options.cpp.o
[ 28%] Building CXX object src/CMakeFiles/raxml_module.dir/ParallelContext.cpp.o
[ 32%] Building CXX object src/CMakeFiles/raxml_module.dir/PartitionAssignment.cpp.o
[ 35%] Building CXX object src/CMakeFiles/raxml_module.dir/PartitionInfo.cpp.o
[ 39%] Building CXX object src/CMakeFiles/raxml_module.dir/PartitionedMSA.cpp.o
[ 42%] Building CXX object src/CMakeFiles/raxml_module.dir/SystemTimer.cpp.o
[ 46%] Building CXX object src/CMakeFiles/raxml_module.dir/Tree.cpp.o
[ 50%] Building CXX object src/CMakeFiles/raxml_module.dir/TreeInfo.cpp.o
[ 53%] Building CXX object src/CMakeFiles/raxml_module.dir/log.cpp.o
[ 57%] Building CXX object src/CMakeFiles/raxml_module.dir/main.cpp.o
[ 60%] Building CXX object src/CMakeFiles/raxml_module.dir/sysutil.cpp.o
[ 64%] Building CXX object src/CMakeFiles/raxml_module.dir/io/NewickStream.cpp.o
[ 67%] Building CXX object src/CMakeFiles/raxml_module.dir/io/RBAStream.cpp.o
[ 71%] Building CXX object src/CMakeFiles/raxml_module.dir/io/binary_io.cpp.o
[ 75%] Building CXX object src/CMakeFiles/raxml_module.dir/io/msa_streams.cpp.o
[ 78%] Building CXX object src/CMakeFiles/raxml_module.dir/io/part_info.cpp.o
[ 82%] Building CXX object src/CMakeFiles/raxml_module.dir/bootstrap/BootstopCheck.cpp.o
[ 85%] Building CXX object src/CMakeFiles/raxml_module.dir/bootstrap/BootstrapGenerator.cpp.o
[ 89%] Building CXX object src/CMakeFiles/raxml_module.dir/bootstrap/BootstrapTree.cpp.o
[ 92%] Building CXX object src/CMakeFiles/raxml_module.dir/autotune/ResourceEstimator.cpp.o
[ 96%] Building CXX object src/CMakeFiles/raxml_module.dir/terraces/TerraceWrapper.cpp.o
[100%] Linking CXX executable ../../bin/raxml-ng-mpi
/usr/bin/ld: BFD version 2.20.51.0.2-5.47.el6_9.1 20100205 internal error, aborting at reloc.c line 443 in bfd_get_reloc_size

/usr/bin/ld: Please report this bug.

collect2: error: ld returned 1 exit status
make[2]: *** [../bin/raxml-ng-mpi] Error 1
make[2]: *** Deleting file `../bin/raxml-ng-mpi'
make[1]: *** [src/CMakeFiles/raxml_module.dir/all] Error 2
make: *** [all] Error 2

EPA-like option for placing/evaluating different outgroups

This is for a specific use case, i.e., placing a couple of potential outgroups (genome-size sequences) into an ingroup phylogeny. We don't need it super-urgently, nor do we need it to be super-efficient, but something within the next 6 months would be nice.

Handle near-zero branch lengths?

This was a recurrent issue in RAxML when a dataset had so little diversity that the branch-length optimization hit the minimum branch length set in the code. We also need to be careful and perhaps resolve inner nodes with zero branch lengths into multifurcations. There were 1-2 papers discussing this.

error: wrong likelihood derivatives

Hello! I am trying to create a tree using raxml-ng on an alignment (99322 taxa and 1285 sites) using

./raxml-ng --msa alignment_pfiltered.fasta --model GTR+G4 --search --threads 45 --tree rand{20}

Unfortunately, I run into the following error:

terminate called after throwing an instance of 'std::runtime_error'
  what():  ERROR in branch lenght optimization: wrong likelihood derivatives

Any guidance on what I'm missing or how I might troubleshoot?

Add an outgroup option?

To be discussed, I think. For pedagogical reasons, since this is only a drawing option, maybe we should not add this. It adds quite a bit of code without any added value.

Improve PHYLIP parser performance

The PHYLIP parser in libpll is very inefficient on large alignments, e.g., for 45 taxa x 6M sites:

  • libpll parser: ~1200 sec
  • genesis parser: ~3 sec

We should either optimize the libpll parser or use a different one for raxml-ng.

Scaled BL: BL opt converged to a worse likelihood score

When running RAxML-NG on supermuc with --brlen scaled option, I get the following error:

ERROR: ERROR in branch length optimization (LIBPLL-2240): BL opt converged to a worse likelihood score by -779.593183368444443 units

I ran the same search with 11 different seeds and it failed 3 times.
It also happens when I do not use LG4X (but I still have LG4M).
I cannot reproduce it with the flag --blopt nr_safe.

See <..>/ALL_1.1.aa/jobs/raxml-ng/lg4m_lg4x_scaled_median/runs/unsafe_runs/raxml_seed2001_rand_scaled_nodes32 on supermuc.

python scripts

Write python scripts to combine raxml-ng searches with the modules for RF distances, consensus trees etc that Diego has implemented.

Ascertainment Bias Correction

Thank you so much for this amazing re-write, very much looking forward to using it. Is there any chance to implement the ascertainment bias correction options from RAxML?

Empirical protein models should normalize fixed and user-supplied amino-acid frequencies

Hi,

The fixed amino-acid frequencies supplied by models like WAG and LG do not actually sum to 1.0. This doesn't really affect the rate matrix, since it is going to be rescaled to a fixed rate anyway. However, the frequencies are also used to compute the likelihood at the root. The difference is pretty small, about 1.0e-6, but still wrong, I think. Perhaps if the sum of user-supplied frequencies differs from 1 by more than 1%, you should error out instead of rescaling?

It would be nice to fix this so that raxml-ng passes the testiphy tests lg/2 and wag/1. Tests with uniform frequencies or user-supplied frequencies that sum to 1 already pass.

Since I guess this is also a libpll issue, let me know if I should file another bug there.
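
To make the request concrete, here is a minimal sketch of the rescaling step: divide each frequency by the sum so that the set adds up to exactly 1. This is not how raxml-ng or libpll handle frequencies internally; normalize_freqs is a helper name made up for this example, and the input values are illustrative.

```shell
#!/bin/sh
# normalize_freqs: read whitespace-separated frequencies on stdin (one set per
# line) and print them rescaled so that each line sums to 1.
normalize_freqs() {
    awk '{
        sum = 0
        for (i = 1; i <= NF; i++) sum += $i
        # Refuse empty lines and sets that cannot be rescaled.
        if (sum <= 0) exit 1
        for (i = 1; i <= NF; i++)
            printf "%s%.6f", (i > 1 ? " " : ""), $i / sum
        printf "\n"
    }'
}

# Illustrative input whose sum is slightly off from 1.
echo "0.1 0.2 0.3 0.400001" | normalize_freqs
```

The same loop could compare the sum against 1 first and refuse to rescale when the deviation is larger than some threshold, which is the behavior suggested above for user-supplied frequencies.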

Assertion `cur_loglh - new_loglh < -new_loglh * 1e-14' failed.

I'm trying to run raxml-ng on comet for a dataset on which I see odd behavior when running the standard RAxML.

With both my trimmed and non-trimmed alignment, at some stage in the calculation, the job hits the following error (full output from one of the two jobs is also attached):
raxml-ng-mpi: /home/saladi1/installers/raxml-ng/src/Optimizer.cpp:33: double Optimizer::optimize_model(TreeInfo&, double): Assertion `cur_loglh - new_loglh < -new_loglh * 1e-14' failed.
slurm_raxml-ng.10546635.comet-10-49.out.txt

Do you have any suggestions of how I might get around this issue? Apologies if this is a trivial error/mistake on my part.
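
For readers puzzling over the message: the asserted condition is cur_loglh - new_loglh < -new_loglh * 1e-14. Since log-likelihoods are negative, the right-hand side is a small positive tolerance proportional to the magnitude of the new score, so the check fails only when the score after optimization is worse than before by more than that tolerance. The sketch below simply evaluates the same inequality with awk; loglh_ok is a helper name made up for this example and the numbers are illustrative, not taken from the attached log.

```shell
#!/bin/sh
# loglh_ok CUR NEW: succeed if CUR - NEW < -NEW * 1e-14, i.e. the inequality
# quoted in the assertion message (CUR = score before, NEW = score after).
loglh_ok() {
    awk -v cur_lh="$1" -v new_lh="$2" \
        'BEGIN { exit !((cur_lh - new_lh) < (-new_lh * 1e-14)) }'
}

# Score improved from -14406 to -14018: the condition holds.
if loglh_ok -14406.0 -14018.0; then echo "improved: ok"; fi

# Score dropped from -14018 to -14406: the condition is violated.
if ! loglh_ok -14018.0 -14406.0; then echo "worse: assertion would fail"; fi
```

This only illustrates what the check tests; it says nothing about why the optimizer produced a worse score on this dataset.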
