
treetime's Introduction


TreeTime: maximum likelihood dating and ancestral sequence inference

Overview

TreeTime provides routines for ancestral sequence reconstruction and inference of molecular-clock phylogenies, i.e., a tree where all branches are scaled such that the positions of terminal nodes correspond to their sampling times and internal nodes are placed at the most likely time of divergence.

To optimize the likelihood of time-scaled phylogenies, TreeTime uses an iterative approach that first infers ancestral sequences given the branch lengths of the tree (assuming the branch lengths of the input tree are in units of the average number of nucleotide or protein substitutions per site). After inferring ancestral sequences, TreeTime optimizes the positions of unconstrained nodes on the time axis, and then repeats this cycle. The only topology optimization is the (optional) resolution of polytomies in a way that is (approximately) most consistent with the sampling time constraints on the tree. The package is designed to be used as a stand-alone tool on the command line or as a library in larger phylogenetic analysis workflows. The documentation of TreeTime is hosted on readthedocs.org.

In addition to scripting TreeTime or using it via the command-line, there is also a small web server at treetime.ch.

Molecular clock phylogeny of 200 NA sequences of influenza A H3N2

Have a look at our repository with example data and the tutorials.

Features

  • ancestral sequence reconstruction (marginal and joint maximum likelihood)
  • molecular clock tree inference (marginal and joint maximum likelihood)
  • inference of GTR models
  • rerooting to maximize temporal signal and optimize the root-to-tip distance vs time relationship
  • simple phylodynamic analysis such as coalescent model fits
  • sequence evolution along trees using flexible site specific models.


Installation and prerequisites

TreeTime is compatible with Python 3.7 upwards and is tested on 3.7 to 3.10. It depends on several Python libraries:

  • numpy, scipy, pandas: for all kinds of mathematical operations such as matrix operations, numerical integration, interpolation, minimization, etc.

  • BioPython: for parsing multiple sequence alignments and all phylogenetic functionality

  • matplotlib: optional dependency for plotting

You may install TreeTime and its dependencies by running

  pip install .

within this repository. You can also install TreeTime from PyPI via

  pip install phylo-treetime

You might need root privileges for a system-wide installation. Alternatively, you can simply use TreeTime locally without installation. In this case, just download and unpack it, and then add the TreeTime folder to your $PYTHONPATH.

Command-line usage

TreeTime can be used as part of Python programs that create and interact with TreeTime objects. How TreeTime can be used to address typical questions like ancestral sequence reconstruction, rerooting, timetree inference, etc. is illustrated by a collection of example scripts described below.

In addition, TreeTime can be used from the command line with arguments specifying input data and parameters. Trees can be read as newick, nexus and phylip files; fasta and phylip are supported alignment formats; metadata and dates can be provided as csv or tsv files, see below for details.

Timetrees

To infer a timetree, i.e. a phylogenetic tree in which branch lengths reflect time rather than divergence, TreeTime implements the command:

  treetime --aln <input.fasta> --tree <input.nwk> --dates <dates.csv>

This command will infer a time tree, ancestral sequences, a GTR model, and optionally confidence intervals and coalescent models. A detailed explanation of this command with its various options and examples is available in the documentation on readthedocs.org.
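When the command is driven from a larger Python workflow, it can be convenient to assemble the argument list in code. The following is a minimal sketch that uses only the three flags shown above; the helper names build_timetree_command and run_timetree are hypothetical and not part of TreeTime.

```python
import shutil
import subprocess


def build_timetree_command(aln, tree, dates):
    """Assemble the argument list for the timetree command shown above."""
    return ["treetime", "--aln", aln, "--tree", tree, "--dates", dates]


def run_timetree(aln, tree, dates):
    """Run treetime if it is on PATH; return its exit code, or None if absent."""
    if shutil.which("treetime") is None:
        return None
    return subprocess.run(build_timetree_command(aln, tree, dates)).returncode


# Hypothetical file names, matching the placeholders in the command above.
cmd = build_timetree_command("input.fasta", "input.nwk", "dates.csv")
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.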

Rerooting and substitution rate estimation

To explore the temporal signal in the data and estimate the substitution rate (instead of full-blown timetree estimation), TreeTime implements a subcommand clock that is called as follows:

  treetime clock --tree <input.nwk> --aln <input.fasta> --dates <dates.csv> --reroot least-squares

The full list of options is available by typing treetime clock -h. Instead of an input alignment, --sequence-length <L> can be provided. Documentation of additional options and examples is available in the documentation on readthedocs.org.
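To illustrate what a root-to-tip regression measures, here is a minimal sketch using ordinary least squares on made-up data. It is not TreeTime's estimator; the function name and the example numbers are assumptions chosen for illustration. The slope is the rate estimate, R^2 summarizes the temporal signal, and the date at which the fitted line crosses zero distance is the implied root date.

```python
def root_to_tip_regression(dates, distances):
    """Ordinary least-squares fit of root-to-tip distance against sampling date.

    Returns (rate, r_squared, root_date). The rate is the slope, and the root
    date is where the fitted line crosses zero distance.
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    s_tt = sum((t - mean_t) ** 2 for t in dates)
    s_dd = sum((d - mean_d) ** 2 for d in distances)
    s_td = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    rate = s_td / s_tt
    intercept = mean_d - rate * mean_t
    r_squared = s_td ** 2 / (s_tt * s_dd)
    root_date = -intercept / rate
    return rate, r_squared, root_date


# Made-up tips: sampling dates (decimal years) and root-to-tip distances.
dates = [2000.0, 2005.0, 2010.0, 2015.0]
distances = [0.010, 0.015, 0.020, 0.025]
rate, r_squared, root_date = root_to_tip_regression(dates, distances)
print(rate, r_squared, root_date)
```

A negative slope in such a fit would make the implied root date meaningless, which is why the choice of root matters for this analysis.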

Ancestral sequence reconstruction

The subcommand

  treetime ancestral --aln input.fasta --tree input.nwk

will reconstruct ancestral sequences at internal nodes of the input tree. The full list of options is available by typing treetime ancestral -h. A detailed explanation of treetime ancestral with examples is available in the documentation on readthedocs.org.

Homoplasy analysis

Detecting and quantifying homoplasies or recurrent mutations is useful to check for recombination, putative adaptive sites, or contamination. TreeTime provides a simple command to summarize homoplasies in data

  treetime homoplasy --aln <input.fasta> --tree <input.nwk>

The full list of options is available by typing treetime homoplasy -h. Please see the documentation on readthedocs.org for examples and more documentation.

Mugration analysis

Migration between discrete geographic regions, host switching, or other transitions between discrete states are often parameterized by time-reversible models analogous to models describing evolution of genome sequences. Such models are hence often called "mugration" models. TreeTime's GTR model machinery can be used to infer mugration models:

  treetime mugration --tree <input.nwk> --states <states.csv> --attribute <field>

where <field> is the relevant column in the csv file specifying the metadata states.csv, e.g. <field>=country. The full list of options is available by typing treetime mugration -h. Please see the documentation on readthedocs.org for examples and more documentation.

Metadata and date format

Several of TreeTime's commands require the user to specify a file with dates and/or other metadata. TreeTime assumes these files to be either comma-separated (csv) or tab-separated (tsv) files. The first line of these files is interpreted as a header line specifying the content of the columns. Each file needs to have at least one column that is named name, accession, or strain. This column needs to contain the names of each sequence and match the names of taxa in the tree if one is provided. If more than one of name, accession, or strain is found, TreeTime will use the first.
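The rule for picking the name column can be sketched as follows. This is an illustration, not TreeTime's own code: the function names are hypothetical, and it assumes that "the first" means first in column order and that matching ignores case.

```python
import csv
import io

NAME_COLUMNS = ("name", "accession", "strain")


def find_name_column(header):
    """Return the first header entry that is a recognised name column.

    Assumptions: "first" means first in column order, and matching ignores
    case and surrounding whitespace.
    """
    for column in header:
        if column.strip().lower() in NAME_COLUMNS:
            return column
    raise ValueError("no column named name, accession, or strain")


def read_metadata(text, delimiter=","):
    """Map each sequence name to its row of metadata."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    key = find_name_column(reader.fieldnames)
    return {row[key]: row for row in reader}


# Made-up metadata in csv form; use delimiter="\t" for tsv files.
example = "strain,country,date\nA/Test/1,brazil,2017.56\nA/Test/2,peru,2017-08-25\n"
metadata = read_metadata(example)
print(sorted(metadata))
```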

If the analysis requires dates, at least one column name needs to contain date (e.g. sampling date is fine). Again, if multiple hits are found, TreeTime will use the first. TreeTime will attempt to parse dates in the following way and order:

order  type/format    example            description
1      float          2017.56            decimal date
2      [float:float]  [2013.45:2015.56]  decimal date range
3      %Y-%m-%d       2017-08-25         calendar date in ISO format
4      %Y-XX-XX       2017-XX-XX         calendar date missing month and/or day
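The parsing order in the table can be sketched with the standard library. This is an illustrative stand-in rather than TreeTime's parser: the function names are hypothetical, and turning a date with missing month or day into a (low, high) range of decimal years is an assumption about how such dates might be represented.

```python
import datetime
import re


def find_date_column(header):
    """Return the first column whose name contains 'date' (case ignored by assumption)."""
    for column in header:
        if "date" in column.lower():
            return column
    raise ValueError("no column name contains 'date'")


def decimal_year(date):
    """Convert a datetime.date to a decimal year."""
    start = datetime.date(date.year, 1, 1)
    end = datetime.date(date.year + 1, 1, 1)
    return date.year + (date - start).days / (end - start).days


def parse_date(text):
    """Return a float, or a (low, high) tuple for ranges and incomplete dates."""
    text = text.strip()
    try:  # 1. decimal date
        return float(text)
    except ValueError:
        pass
    match = re.fullmatch(r"\[([^:\]]+):([^:\]]+)\]", text)
    if match:  # 2. decimal date range
        return float(match.group(1)), float(match.group(2))
    try:  # 3. calendar date in ISO format
        return decimal_year(datetime.datetime.strptime(text, "%Y-%m-%d").date())
    except ValueError:
        pass
    match = re.fullmatch(r"(\d{4})-(\d{2}|XX)-XX", text)
    if match:  # 4. calendar date missing month and/or day
        year = int(match.group(1))
        if match.group(2) == "XX":
            return float(year), float(year + 1)
        month = int(match.group(2))
        low = decimal_year(datetime.date(year, month, 1))
        # First day of the following month, rolling over into the next year.
        high = decimal_year(datetime.date(year + (month == 12), month % 12 + 1, 1))
        return low, high
    raise ValueError("unrecognised date: " + text)


for raw in ["2017.56", "[2013.45:2015.56]", "2017-08-25", "2017-XX-XX"]:
    print(raw, "->", parse_date(raw))
```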

Example scripts

The following scripts illustrate how treetime can be used to solve common problems with short Python scripts. They are meant to be used in an interactive IPython environment and run as run examples/ancestral_inference.py.

  • ancestral_inference.py illustrates how ancestral sequences are inferred and likely mutations are assigned to branches in the tree.
  • relaxed_clock.py walks the user through the usage of relaxed molecular clock models.
  • rerooting_and_timetrees.py illustrates the rerooting and root-to-tip regression scatter plots.
  • ebola.py uses about 300 sequences from the 2014-2015 Ebola virus outbreak to infer a timetree. This example takes a few minutes to run.

HTML documentation of the different classes and functions is available here.

Related tools

There are several other tools which estimate molecular clock phylogenies.

  • BEAST relies on MCMC-type sampling of trees. It is hence rather slow for large data sets. But BEAST allows the flexible inclusion of prior distributions, complex evolutionary models, and estimation of parameters.
  • Least-Square-Dating (LSD) emphasizes speed (it scales as O(N), like TreeTime), but provides limited scope for customization.
  • treedater by Eric Volz and Simon Frost is an R package that implements time tree estimation and supports relaxed clocks.

Projects using TreeTime

  • TreeTime is an integral part of the nextstrain.org project to track and analyze viral sequence data in real time.
  • panX uses TreeTime for ancestral reconstructions and inference of gene gain-loss patterns.

Building the documentation

The API documentation for the TreeTime package is generated with Sphinx. The source code for the documentation is located in the doc folder.

  • sphinx-build, to generate static HTML pages from source. Install it with

  pip install Sphinx

After required packages are installed, navigate to doc directory, and build the docs by typing:

make html

Instead of html, another target such as latex or epub can be specified to build the docs in the desired format.

Requirements

To build the documentation, the sphinx-build tool must be installed. The doc pages use the basicstrap HTML theme to have the same design as the TreeTime web server. Therefore, the basicstrap theme must also be available on the system.

Developer info

treetime's People

Contributors

anna-parker, corneliusroemer, emmahodcroft, gtonkinhill, huddlej, ivan-aksamentov, jameshadfield, kislyuk, ktmeaton, pausag, rneher, stevenweaver, tacaswell, tavareshugo, theosanderson, trvrb, tsibley, victorlin


treetime's Issues

Unable to import tree using treeAnc

Hey,
We're trying to import a tree using treeAnc. We've encountered several issues:

  1. The module was unable to parse a tree formatted as nwk, generated by python's ete3.NCBITaxa.get_topology . This was solved by downloading a phylip-formatted tree from NCBI. We do not deal with phylogenetic trees regularly, so maybe there's an issue with ete3 module, so we're just raising a flag. The nwk-formatted tree is viewable using mega-x.
  2. treeAnc module was unable to parse the alignment file, a FASTA file downloaded from orthoDB and aligned using Clustal Omega.
  3. In any case, no error was raised, so we needed to inspect the code and try to figure out what went wrong...

We'd be happy for advice regarding the alignment file. All files are attached. treeTime_files.zip

Thanks,
Omer and Alisa,
Pilpel lab

Speed up treetime

The problem

TreeTime is (up to polytomy resolution) a method that scales linearly in tree size, but some steps are slow.
This is partly due to the fact that it's in Python, partly due to suboptimal implementation.
In timetree estimation, the slowest step is the calculation of convolutions or maxima of functions of the form f(t-tau)g(tau).
While not intrinsically hard, the challenge is to make this robust and numerically stable for branch lengths of order 1e-7 to 10.

The probability that a branch has a particular length or that a node sits in a particular position is represented as a linear interpolation object of the logarithm of the function. The pivot points of the interpolation are chosen densely around the peak of the distribution and sparsely everywhere else.

In principle, the complexity of this problem should be log-linear in the required accuracy, but the current implementation is quadratic in the accuracy.

Possible solutions

Use FFT to calculate convolutions for marginal inference

This requires a larger regular grid of points on which the functions are stored. An experimental version of this strategy is implemented in the branch fft (https://github.com/neherlab/treetime/tree/fft). This implements the convolution via FFT:

https://github.com/neherlab/treetime/blob/fft/treetime/node_interpolator.py#L193

This greatly accelerates the inference, but only works for purely marginal inference which can result in inconsistent node placing (as every node is maximized while tracing over all others). Edge cases, robustness, and stability are not established and this is still buggy.

More efficient numerical optimization of integrands in joint inference

We previously just searched for the peak of the function on the sparse grid. Numerical optimization should solve this in log time and we can afford a denser grid in that case.

Pitfalls

Generally, this problem is tricky since one has to observe hard constraints (sampling date ordering) and exponentially small numbers.

Underflow issue with large polytomies and state space

When working on a tree with a lot of polytomies and large number of states in real space the marginal subtree likelihood profile (node.marginal_subtree_LH) underflows mid-loop. Perhaps computations should be in log space to begin with or the tree resolved into strict bifurcations with 0.0 branch lengths prior to running _ml_anc_marginal?

node.marginal_subtree_LH *= ch.marginal_Lx
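The underflow and the log-space alternative suggested above can be illustrated with plain floats. This is a minimal sketch with made-up numbers, not a patch for _ml_anc_marginal; the variable names and the log_sum_exp helper are assumptions for illustration.

```python
import math

# Made-up per-child likelihoods for one state at a large polytomy.
child_likelihoods = [1e-3] * 500

# Multiplying in real space: the running product drops below the smallest
# representable double and becomes exactly 0.0.
product = 1.0
for lh in child_likelihoods:
    product *= lh

# Adding logarithms instead keeps the same quantity well within range.
log_product = sum(math.log(lh) for lh in child_likelihoods)


def log_sum_exp(log_values):
    """Stable log(sum(exp(v))) for combining log-likelihoods across states."""
    peak = max(log_values)
    return peak + math.log(sum(math.exp(v - peak) for v in log_values))


print(product)
print(log_product)
print(log_sum_exp([log_product, log_product]))
```

Whenever likelihoods have to be summed over states, a log-sum-exp step like the one above avoids leaving log space.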

Adding a Newick File

Hello,

Simple question, but I just wanted to know if optionally adding a tree file with TreeTime helps speed up analysis? If not, what does the optional tree file help the program do?

Best, and thank you in advance

No Treeanc module found

Hi, trying to run Treetime, i get the following error:

Traceback (most recent call last):
  File "/home/bio/treetime/ancestral_reconstruction.py", line 4, in <module>
    from treetime import TreeAnc, GTR
  File "/home/bio/treetime/treetime/__init__.py", line 1, in <module>
    from treeanc import TreeAnc
ModuleNotFoundError: No module named 'treeanc'

Could you please help me?,
Thanks in advance,
Luis Alfonso.

question on rescaling

In the homoplasy tutorial it states that a rescaling is needed when working from a variable-position-only alignment -- this is what I do.

However, as I am new to this type of analysis, I don't know what to use to rescale. The tutorial states 271/4411532=0.0000614 -- Is that the number of variable positions in the alignment (e.g. SNVs) divided by the length of genome of the reference?

Thank you.
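For reference, the arithmetic quoted from the tutorial can be checked directly. Reading the numerator as the number of variable positions in the alignment and the denominator as the reference genome length is the interpretation asked about in the question, and it is treated here as an assumption rather than a confirmed answer.

```python
# Assumed meaning of the two numbers quoted from the tutorial.
variable_sites = 271       # variable positions in the alignment (assumption)
genome_length = 4411532    # length of the reference genome (assumption)

rescale = variable_sites / genome_length
print(f"{rescale:.7f}")
```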

Add --version to all tools

It would be helpful for packaging to have this to stdout and return error code 0.

% treetime --version
treetime 0.6.3

Easy to do with argparse:

parser.add_argument("--version", action="version", version="%(prog)s " + __version__)
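A self-contained sketch of this suggestion is shown below. The parser here is a stand-in, not TreeTime's actual argument parser, and the version string is a placeholder copied from the example output above. With action="version", argparse prints the version and exits with status 0.

```python
import argparse

__version__ = "0.6.3"  # placeholder, copied from the example output above


def make_parser():
    """Build a stand-in parser with a --version flag."""
    parser = argparse.ArgumentParser(prog="treetime")
    parser.add_argument("--version", action="version",
                        version="%(prog)s " + __version__)
    return parser


def version_exit_code(argv):
    """Parse argv and return the exit status if argparse exits, else None."""
    try:
        make_parser().parse_args(argv)
    except SystemExit as exc:
        return exc.code
    return None


print(version_exit_code([]))
```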

Dealing w/ dates of different granularity

Hi,

I really like treetime, but was wondering how to deal w/ dates of different granularity. For example, one sample might date to 2017-01-01 ("day resolution") while another only has 2016 ("year resolution") in its metadata.

Scaling all dates to year resolution seems wasteful, but I don't know how to encode the year-only dates so as to not bias the analysis.

Thank you.

Please tag releases

Hi,
I'm writing to you on behalf of the Debian Med team, a group inside Debian that packages free software in life sciences and medicine for official Debian. We would be interested in packaging treetime. It would help if you could tag your releases so that we can automatically detect what you consider a user-facing release and any subsequent updates.
Thanks for considering, Andreas.

Get probability values for ASR

Thank you for making available treetime. Wondering how to get probability values for ancestral states and if less optimal reconstructions can be outputted somehow.

Thanks again,
Sergios

Recursion error with large tree during output.

Hi Richard,

First, thanks for such a great program/package, especially the ASR feature...

However, I was running 'ancestral' on a very large tree (~7800 isolates) with two different roots (one tree being more 'unbalanced' than the other, meaning greater node depth on one sub-branch of the root). This led to the following error whilst outputting the tree:

--- alignment including ancestral nodes saved as  
	 20200422_snppar_root_bat/treetime_out/ancestral_sequences.fasta

2020-05-11_18:31:55.468960: CRITICAL : stderr: Traceback (most recent call last):
  File "/Users/david/miniconda3/bin/treetime", line 12, in <module>
    return_code = params.func(params)
  File "/Users/david/miniconda3/lib/python3.6/site-packages/treetime/wrappers.py", line 658, in ancestral_reconstruction
    report_ambiguous=params.report_ambiguous)
  File "/Users/david/miniconda3/lib/python3.6/site-packages/treetime/wrappers.py", line 215, in export_sequences_and_tree
    Phylo.write(tt.tree, outtree_name, 'nexus', format_branch_length=fmt_bl)
  File "/Users/david/miniconda3/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 83, in write
    n = getattr(supported_formats[format], "write")(trees, fp, **kwargs)
  File "/Users/david/miniconda3/lib/python3.6/site-packages/Bio/Phylo/NexusIO.py", line 69, in write
    **kwargs))]
  File "/Users/david/miniconda3/lib/python3.6/site-packages/Bio/Phylo/NexusIO.py", line 66, in <listcomp>
    nexus_trees = [TREE_TEMPLATE % {"index": idx + 1, "tree": nwk}
  File "/Users/david/miniconda3/lib/python3.6/site-packages/Bio/Phylo/NewickIO.py", line 289, in to_strings
    rawtree = newickize(tree.root) + ";"
  File "/Users/david/miniconda3/lib/python3.6/site-packages/Bio/Phylo/NewickIO.py", line 281, in newickize
    return "(%s)%s" % (",".join(subtrees),
  File "/Users/david/miniconda3/lib/python3.6/site-packages/Bio/Phylo/NewickIO.py", line 280, in <genexpr>
    subtrees = (newickize(sub) for sub in clade)
  File "/Users/david/miniconda3/lib/python3.6/site-packages/Bio/Phylo/NewickIO.py", line 281, in newickize
    return "(%s)%s" % (",".join(subtrees),
  File "/Users/david/miniconda3/lib/python3.6/site-packages/Bio/Phylo/NewickIO.py", line 280, in <genexpr>

... repetitive bit edited out

 File "/Users/david/miniconda3/lib/python3.6/site-packages/Bio/Phylo/NewickIO.py", line 281, in newickize
    return "(%s)%s" % (",".join(subtrees),
  File "/Users/david/miniconda3/lib/python3.6/site-packages/Bio/Phylo/NewickIO.py", line 280, in <genexpr>
    subtrees = (newickize(sub) for sub in clade)
  File "/Users/david/miniconda3/lib/python3.6/site-packages/Bio/Phylo/NewickIO.py", line 272, in newickize
    unquoted_label = re.match(token_dict["unquoted node label"], label)
  File "/Users/david/miniconda3/lib/python3.6/re.py", line 172, in match
    return _compile(pattern, flags).match(string)
RecursionError: maximum recursion depth exceeded
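The traceback shows a recursive Newick writer running out of stack on a very deep tree. One common workaround is to raise the interpreter limit with sys.setrecursionlimit before writing. Another is to serialize without recursion, sketched below on plain tuples. The (name, branch_length, children) representation and the to_newick function are hypothetical and unrelated to Bio.Phylo's classes.

```python
def to_newick(root):
    """Serialise a (name, branch_length, children) tree without recursion."""
    rendered = {}              # id(node) -> Newick text of that subtree
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        name, length, children = node
        if children and not children_done:
            # Revisit this node once all of its children have been rendered.
            stack.append((node, True))
            stack.extend((child, False) for child in children)
            continue
        label = name if length is None else f"{name}:{length}"
        if children:
            inner = ",".join(rendered.pop(id(child)) for child in children)
            label = f"({inner}){label}"
        rendered[id(node)] = label
    return rendered[id(root)] + ";"


# A very unbalanced ("caterpillar") tree, far deeper than the default
# recursion limit of 1000.
deep = ("tip0", 1.0, [])
for i in range(1, 3000):
    deep = (f"n{i}", 1.0, [deep, (f"tip{i}", 1.0, [])])

small = ("root", None, [("A", 0.1, []), ("B", 0.2, [])])
print(to_newick(small))
print(len(to_newick(deep)))
```

The explicit stack grows on the heap, so tree depth is limited by memory rather than by the recursion limit.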

Attribute error

I installed treetime but when I tried running it I encountered an error which can be found below:

0.00    -TreeAnc: set-up

0.06    -TreeAnc: loading alignment failed...

0.06    -TreeAnc.infer_ancestral_sequences: method: ml

0.06    -TreeAnc inferring the GTR model from the tree...
Traceback (most recent call last):
File "C:/Users/Idowu/AppData/Local/Programs/Python/Python36/Scripts/ancestral_reconstruction.py", line 4, in <module>
  __import__('pkg_resources').run_script('treetime==0.4.1', 'ancestral_reconstruction.py')
File "C:\Users\Idowu\AppData\Local\Programs\Python\Python36\lib\site-packages\pkg_resources\__init__.py", line 658, in run_script
  self.require(requires)[0].run_script(script_name, ns)
File "C:\Users\Idowu\AppData\Local\Programs\Python\Python36\lib\site-packages\pkg_resources\__init__.py", line 1445, in run_script
  exec(script_code, namespace, namespace)
File "c:\users\idowu\appdata\local\programs\python\python36\lib\site-packages\treetime-0.4.1-py3.6.egg\EGG-INFO\scripts\ancestral_reconstruction.py", line 92, in <module>
File "c:\users\idowu\appdata\local\programs\python\python36\lib\site-packages\treetime-0.4.1-py3.6.egg\treetime\treeanc.py", line 719, in infer_ancestral_sequences
File "c:\users\idowu\appdata\local\programs\python\python36\lib\site-packages\treetime-0.4.1-py3.6.egg\treetime\treeanc.py", line 759, in reconstruct_anc
File "c:\users\idowu\appdata\local\programs\python\python36\lib\site-packages\treetime-0.4.1-py3.6.egg\treetime\treeanc.py", line 680, in infer_gtr
File "c:\users\idowu\appdata\local\programs\python\python36\lib\site-packages\treetime-0.4.1-py3.6.egg\treetime\treeanc.py", line 1248, in _ml_anc_joint
AttributeError: 'TreeAnc' object has no attribute 'multiplicity'

Can you kindly help out? Thanks

error with treetime when running a nextstrain build

@rneher when trying to run a nextstrain build, I'm getting an error message from treetime: raise UnknownMethodError("TreeTime.run: rate variation for confidence estimation is not available. Either specify it explicitly, or estimate from root-to-tip regression."). Do you have a fix for this, or suggestions on how to solve it?

Improve polytomy resolution

The problem

TreeTime currently has a very rudimentary way of resolving polytomies (multi-furcations in the tree).
When this was initially put in place it was never meant to resolve large polytomies but mostly meant to split 3- or 4-fold nodes.
Hence the entire process is very ad-hoc (and slow).

In essence, for all pairs of nodes in a polytomy we compute the LH gain when "pulling" this pair out of the polytomy:

https://github.com/neherlab/treetime/blob/master/treetime/treetime.py#L556

This currently only uses temporal ordering and is horribly inefficient (n^2 with polytomy size -- didn't bother other when n=5.)

Desired improvements

We'd like to be able to use other types of information (geography, known linkages, etc.) to inform polytomy resolution.
Ideally that would also scale as n^1.5 (fast NJ).

Possible courses of actions

  • refactor this part and isolate the resolution (current _poly function). => extract polytomy and reinsert a (partially) resolved tree
  • find a solution to flexibly use different kind of resolutions. Challenge is to provide the necessary information and distance measure in a robust and flexible way
  • One could try to use an established tree-builder that allows flexible distance functions.

treetime: error: too few arguments

Hi,

I am receiving this error when running treetime.

metadata=st239.metadata.tsv
aln=alignment.filtered_polymorphic_sites.fasta
tree=alignment.final_tree.nwk
module load python/3.6.5
treetime --tree ${tree} --aln ${aln} --dates ${metadata}

usage: TreeTime: Maximum Likelihood Phylodynamics
treetime: error: too few arguments

treetime --version
treetime 0.7.3

input data abbreviated below:
head $metadata
name date
1 2012.43
2 2012.43
3 2012.55

head $aln

1
GCGAGGGTTGAGCGGCTTGGGACCGAACAACGCTGAATCCCTTGTGGGGCCTCCCGCATCCGCGGATAGAGGATCACGGCGCCAACACTGGGAAGCTCTGGCTTGATGATTCCGACACCTGAAATGGAAG
..
10
GCGAGGGTTGAGCGGCTTGGGACCGAACAGCGCTGAATCCCTTGTGGGGCCTCCCGCATCCGAGGATAGAGGATCACGGCGCCAACACTGGAAAGCTCCGGCTTGATGATTCCGACACCTGAAATGGAAA...

head $tree
(((70:13.297602,((64:22.415506,8:27.633684):0.409884,(2:5.034209,36:3.025597):29.162792):1.024431):0.000772,((((97:1.008812,
..

Thank you!

Polytomies in output tree?

Hello,

I am trying to run TreeTime on a bacterial dataset. I successfully ran it with a relaxed clock model (--relax 1.0 0.5), but I end up with 2 polytomies, each consisting of 3 isolates, in my final tree (timetree.pdf file).

Why would polytomies appear in the final tree? Could this be a sign of something not being right in the analysis/with the input data? The data is moderately clock-like (r^2 = ~0.57), but I figured this would be okay for a relaxed clock approach...the polytomies in question have a minimum distance of 2 SNPs between the closest 2 isolates, and 4-6 SNPs between the most distant two, so no two isolates are identical and under a standard ML phylogenetic approach, no polytomies are formed...

I have also tried running it without the "--relax" option (strict clock?) and the polytomies remain.

Any help in understanding why this might be occurring would be greatly appreciated.

Thank you,
Conrad Izydorczyk

wrappers.py

Hi,

I am trying to use the mugration function and I am getting this error:

  File "/Users/nartuhi/miniconda3/bin/treetime", line 12, in <module>
    return_code = params.func(params)
  File "/Users/nartuhi/miniconda3/lib/python3.7/site-packages/treetime/wrappers.py", line 761, in mugration
    pc=params.pc, sampling_bias_correction=params.sampling_bias_correction, verbose=params.verbose)
  File "/Users/nartuhi/miniconda3/lib/python3.7/site-packages/treetime/wrappers.py", line 664, in reconstruct_discrete_traits
    unique_states = sorted(set(traits.values()))
TypeError: '<' not supported between instances of 'str' and 'float'

I have checked my files, and I do not have such a character in the attribute column or in the nwk file. I tried the example files and they work, though.
I do not understand what I should do to fix the error.

Can you help me please?

Thank you very much

Carla
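One way this TypeError can arise (an assumption, since the input files are not shown) is an empty cell in the attribute column that is read as a floating-point NaN while all other values are strings. The sketch below reproduces the failing sort on made-up data and shows one possible cleanup. The clean_traits helper and the "?" marker for missing values are hypothetical.

```python
# Made-up traits: one value is a float NaN, as an empty cell might produce.
traits = {"seq1": "brazil", "seq2": float("nan"), "seq3": "peru"}

try:
    sorted(set(traits.values()))
    mixed_sort_failed = False
except TypeError:
    # Strings and floats cannot be ordered relative to each other.
    mixed_sort_failed = True


def clean_traits(traits, missing="?"):
    """Replace every non-string value with a marker for missing data."""
    return {name: value if isinstance(value, str) else missing
            for name, value in traits.items()}


cleaned = clean_traits(traits)
print(mixed_sort_failed)
print(sorted(set(cleaned.values())))
```

Checking the metadata file for rows with an empty attribute value is a quick way to test whether this explanation applies.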

Feature request; turn off outlier identification

Hi there.
I am looking at a number of stats (particularly R^2) for closely related datasets (alignments in/excluding regions of interest), but different outliers are identified and subsequently ignored for each dataset. Could I please request an option to turn off the outlier identification so that I can directly compare the stats for such datasets?
Thanks.
Tim

Some issues with tests

Hi,
I have packaged treetime for Debian and uploaded it to the new packages queue. When trying to run the tests I needed to apply some patches. One test does not run anyway

$ python tree_time_test.py 
test_aln_to_leaves (__main__.TestTreeAnc) ... ERROR
test_optimize_ (__main__.TestTreeAnc) ... ok
test_read_newick (__main__.TestTreeAnc) ... ERROR
testset_additional_tree_params (__main__.TestTreeAnc) ... ERROR

======================================================================
ERROR: test_aln_to_leaves (__main__.TestTreeAnc)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tree_time_test.py", line 25, in test_aln_to_leaves
    anc_t = treeanc.TreeAnc.from_file(resources_dir+'PR.B.100.nwk', 'newick')
AttributeError: type object 'TreeAnc' has no attribute 'from_file'

======================================================================
ERROR: test_read_newick (__main__.TestTreeAnc)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tree_time_test.py", line 14, in test_read_newick
    anc_t = treeanc.TreeAnc.from_file(resources_dir+'PR.B.100.nwk', 'newick')
AttributeError: type object 'TreeAnc' has no attribute 'from_file'

======================================================================
ERROR: testset_additional_tree_params (__main__.TestTreeAnc)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tree_time_test.py", line 19, in testset_additional_tree_params
    anc_t = treeanc.TreeAnc.from_file(resources_dir+'PR.B.100.nwk', 'newick')
AttributeError: type object 'TreeAnc' has no attribute 'from_file'

----------------------------------------------------------------------
Ran 4 tests in 0.000s

FAILED (errors=3)

Kind regards, Andreas.

treetime clock fails (partially)

treetime clock --tree core.nwk --aln core.aln --dates dates.csv --reroot least-squares

0.00	-TreeAnc: set-up
The following leaves don't follow a loose clock and will be ignored in rate estimation:
	A
	B
	C
	D

 Root-Tip-Regression:
 --rate:	1.074e-02
 --r^2:  	0.49

The R^2 value indicates the fraction of variation in
root-to-tip distance explained by the sampling times.
Higher values corresponds more clock-like behavior (max 1.0).

The rate is the slope of the best fit of the date to
the root-to-tip distance and provides an estimate of
the substitution rate. The rate needs to be positive!
Negative rates suggest an inappropriate root.


The estimated rate and tree correspond to a root date:

--- root-date:	 1998.44


--- re-rooted tree written to
	2018-08-24_clock/rerooted.newick

Traceback (most recent call last):
  File ".../bin/treetime", line 264, in <module>
    return_code = params.func(params)
  File ".../lib/python3.6/site-packages/treetime/wrappers.py", line 813, in estimate_clock_model
    ofile.write("%s, %f, %f\n"%(n.name, n.raw_date_constraint, n.dist2root))
TypeError: must be real number, not list
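The failing line formats n.raw_date_constraint with %f, which only accepts a single number. A plausible trigger (an assumption, not confirmed in this report) is a tip whose date was given as a range, so that the constraint is a list of two numbers. A formatting helper that handles both cases could look like the sketch below; the function name and the output format for ranges are hypothetical.

```python
def format_date_constraint(constraint):
    """Format a date constraint that is either a number or a [low, high] pair."""
    if isinstance(constraint, (list, tuple)):
        low, high = constraint
        return "[%f:%f]" % (low, high)
    return "%f" % constraint


# Reproduce the error from the traceback with a made-up range.
try:
    "%f" % [2013.45, 2015.56]
    range_format_failed = False
except TypeError:
    range_format_failed = True

print(range_format_failed)
print(format_date_constraint(2017.56))
print(format_date_constraint([2013.45, 2015.56]))
```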

treetime --confidence output

Hi,

Maybe I did overlook something: But is there a "textual" output of the confidence intervals calculated by treetime ... --confidence?

If not, I think this would be a very useful feature.

Thank you.

Lowercase letters are unknown characters

Because the profile_maps and alphabets in seq_utils.py are uppercase, lowercase (but otherwise valid) bases spark warnings and are treated as missing information.
[Screenshot of the warnings not preserved.]

I believe this is VCF-specific, as treeanc.py line 333 converts to uppercase - but not if it's VCF (as it deals with a MultipleSequenceAlignment object).

In VCF files themselves I believe only uppercase characters are used, but the reference sequence is a FASTA, and these can be upper- or lowercase. (This is what's causing the problem at present.) I think this could be fixed by just ensuring the reference is converted to uppercase when read in (vcf_utils.py line 260). I'll try this.

_attach_sequences_to_nodes Called Twice In TreeAnc.py __init__

In TreeAnc.py:

Inside __init__, 'self.aln' is assigned on line 89, which calls the setter on line 241. Inside the setter, if a tree is attached, this calls _attach_sequences_to_nodes on line 254.
Back in __init__, after 'self.aln' is assigned, if aln is not None, _attach_sequences_to_nodes is called again on line 91.
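The pattern described here can be reproduced with a small stand-in class. This is a simplified illustration, not the actual TreeAnc code; the class name Example and the attach_calls counter are hypothetical and exist only to make the double call visible.

```python
class Example:
    """Stand-in showing a property setter and __init__ both doing the same work."""

    def __init__(self, tree=None, aln=None):
        self.attach_calls = 0
        self.tree = tree
        self._aln = None
        self.aln = aln                           # invokes the setter below
        if aln is not None:
            self._attach_sequences_to_nodes()    # second, redundant call

    @property
    def aln(self):
        return self._aln

    @aln.setter
    def aln(self, value):
        self._aln = value
        if value is not None and self.tree is not None:
            self._attach_sequences_to_nodes()

    def _attach_sequences_to_nodes(self):
        # Count invocations instead of doing real work.
        self.attach_calls += 1


obj = Example(tree="tree", aln="alignment")
print(obj.attach_calls)
```

Dropping the explicit call in __init__ would leave the setter as the single place where sequences are attached.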

How to measure reliability when results are unstable?

Hi Richard,

When I use treetime to infer the evolutionary relationships of virus sequences, I find that the results of multiple runs are quite different.
Here are the real running results for 8 times [the R squares are shown here: 0.29, 0.28, 0.28, 0.27, 0.28, 0.28, 0.26, 0.28], and the topology of timetree.nexus is different each time.

  1. Is this result due to too similar sequences between viruses? How do I measure the reliability of the results? Is it possible to filter the R square of the results of multiple runs and choose the highest one as the final result?

  2. Does the selective deletion of some sequences help improve the reliability of the results? Some samples may have problems such as repeated sampling leading to sequence redundancy, some may lack date information, some sequences may be of poor quality, and so on.

Here is the command I use to run TreeTime:
treetime --aln virus.fasta --tree virus.nwk --dates metadata.csv --gtr infer --reroot best --covariation --confidence --reconstruct-tip-states --max-iter 10 --outdir resultTreetime

Any helpful answers are welcome.

Kai

Import Iterable from collections.abc instead of collections

Testing out treetime with Python 3.8, I got the following warning:

  /opt/conda/envs/py38/lib/python3.8/site-packages/treetime/distribution.py:4: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
    from collections import Iterable

I'm just leaving a note here so we don't forget that from collections import Iterable needs to be from collections.abc import Iterable before Python 3.9 can be supported.
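
A minimal sketch of an import that works on both old and new interpreters, assuming Python 2 still needs to be supported; if it does not, the first import alone is enough. The helper `is_iterable` is only there to show the name being used.

```python
try:
    # Location of the abstract base classes since Python 3.3
    from collections.abc import Iterable
except ImportError:
    # Fallback for Python 2, where collections.abc does not exist
    from collections import Iterable

def is_iterable(obj):
    """Return True if obj supports iteration."""
    return isinstance(obj, Iterable)
```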

Plotting and/or viewing annotated nexus tree

Hi Richard,

I ran the "treetime mugration" command on my data and received the annotated nexus tree output with no problems. Due to the specific way you annotate this nexus file (for example [&country="brazil"]), is there a way to convert this file to a nexus file that can be opened in any of the various tree viewers? Thanks!

Taylor
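
One possible workaround, sketched under assumptions: if a viewer cannot handle the bracketed comments, they can be stripped with a regular expression before loading the tree. The function name and the example tree string are hypothetical, the annotations are lost in the process, and the pattern assumes no annotation value contains a closing bracket.

```python
import re

# Matches annotation comments of the form [&key="value",...]
ANNOTATION = re.compile(r"\[&[^\]]*\]")

def strip_annotations(text):
    """Remove [&...] comments from a tree string, leaving names and branch lengths."""
    return ANNOTATION.sub("", text)

# Hypothetical annotated tree, for illustration only
example = '(A[&country="brazil"]:0.1,B[&country="usa"]:0.2)NODE_1[&country="brazil"]:0.0;'
plain = strip_annotations(example)
```

The same function can be applied to the full text of a nexus file, since everything outside the bracketed comments is left untouched.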

Add inferred discrete traits to timetree

Hello Richard,

I was trying to infer discrete traits (location) and add those traits to a time tree (also inferred by TreeTime) using the command line. The annotated_tree.nexus produced by "mugration" and the timetree.nexus don't seem to have the same topology (albeit with unresolved polytomies that I'm working on resolving now), despite using the same ML tree as input. Is there a way to do this apart from using TreeTime as a library?

I was digging into wrappers.py but wanted to see if this was already done before going further. Thanks!

Best,
Karthik

rescaling treetime clock estimate

When I run treetime clock on a core alignment, i.e. only variable positions, how can I rescale the substitution rate estimate? Do I simply multiply the estimated rate by the rescaling factor, as in #63?

Thank you.
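
The arithmetic in question can be sketched as follows, assuming the rescaling factor is the fraction of genome sites retained in the core alignment. That interpretation, the function name, and all numbers are assumptions for illustration and are not taken from #63.

```python
def rescale_rate(rate_core, n_variable_sites, genome_length):
    """Convert a rate per variable site per year into a rate per genome site per year.

    Assumes every site left out of the core alignment is invariant.
    """
    return rate_core * n_variable_sites / genome_length

# Made-up numbers: 500 variable sites taken from a 10,000-site genome
rate_per_site = rescale_rate(rate_core=1e-3, n_variable_sites=500, genome_length=10000)
```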

Mark inferred dates in output, remove dates of bad branches

Dates of tips that don't follow a rough molecular clock will be detected and ignored by TreeTime (and internally flagged as n.bad_branch=True). Their plausible dates will be inferred instead and this is written to the console with a request to remove those tips and rerun. The output of dates, however, should not include these inferred dates as they are likely not useful.
This output happens here:
https://github.com/neherlab/treetime/blob/master/treetime/wrappers.py#L185

Question about --clock-rate and --relax behavior

Hi Richard,

I have a couple of questions regarding the behaviour of the --clock-rate and --relax options I was hoping you could answer.

  1. As I understand it, using the --clock-rate option specifies a single rate for the entire tree, which is equivalent to fitting a strict clock to the data but without estimating the rate - is that correct? Does not specifying the --relax option then mean a strict clock is estimated from the data?

  2. If the above is true, then I assume the --relax option uses an auto-correlated relaxed clock instead. Is it possible to use the two options together? As in, specify a clock rate (median, average, etc.) and fit a relaxed clock to the data such that the specified rate is used as a guide for fitting the relaxed clock? Or does one option supersede the other and either a fixed rate is used or a relaxed clock fitted, but not both?

Any help is greatly appreciated.

Thank you,
Conrad

match the taxon names in the tree error

Hi,

I'm wondering if you could please help with this error:

Attempting to parse dates...
	Using column 'name' as name. This needs match the taxon names in the tree!!
	Using column 'date' as date.

0.00	-TreeAnc: set-up

Traceback (most recent call last):
  File "/usr/local/python/3.6.5/bin/treetime", line 12, in <module>
    return_code = params.func(params)
  File "/usr/local/python/3.6.5/lib/python3.6/site-packages/treetime/argument_parser.py", line 213, in toplevel
    timetree(params)
  File "/usr/local/python/3.6.5/lib/python3.6/site-packages/treetime/wrappers.py", line 517, in timetree
    verbose=params.verbose, fill_overhangs=not params.keep_overhangs)
  File "/usr/local/python/3.6.5/lib/python3.6/site-packages/treetime/treetime.py", line 34, in __init__
    super(TreeTime, self).__init__(*args, **kwargs)
  File "/usr/local/python/3.6.5/lib/python3.6/site-packages/treetime/clock_tree.py", line 83, in __init__
    self._assign_dates()
  File "/usr/local/python/3.6.5/lib/python3.6/site-packages/treetime/clock_tree.py", line 130, in _assign_dates
    raise MissingDataError("ERROR: ALMOST NO VALID DATE CONSTRAINTS")
treetime.MissingDataError: ERROR: ALMOST NO VALID DATE CONSTRAINTS

My input looks something like:
metadata.tsv
name date
1 2012.43
2 2012.43
..
111 2013.84
112 2014.01

nwk tree looks something like:
(((70:13.297602,((64:22.415506,8:27.633684):0.409884,
...
(19:0.000772,72:2.015511):1.003667):1.003583,68:4.033345):0.000772):23.999073):0.000772):0.971035):0.998182):0.0):0.0;

The names appear to match 1, 2 .. 112

Thanks,
Rosie
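
A quick way to check whether the names really match is to compare the set of tip labels with the set of names parsed from the metadata. The sketch below uses only the standard library; the helper names, the simplified tree, and the metadata are hypothetical. It also illustrates one thing worth ruling out, purely as an assumption: parsing the metadata with the wrong delimiter yields no names at all.

```python
import csv
import io
import re

def tip_names(newick):
    """Return leaf labels: tokens that directly follow '(' or ',' in an unquoted Newick string."""
    return set(re.findall(r"[(,]\s*([^\s(),:;\[\]]+)", newick))

def metadata_names(text, delimiter, column="name"):
    """Return the values of the name column when the text is split on the given delimiter."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return {row[column].strip() for row in reader if row.get(column)}

# Hypothetical, simplified inputs
tree = "((70:13.3,(64:22.4,8:27.6):0.4):1.0,19:0.1);"
meta = "name date\n70 2012.43\n64 2012.43\n8 2013.84\n19 2014.01\n"

tips = tip_names(tree)
as_tab = metadata_names(meta, "\t")    # wrong delimiter: no 'name' column is found
as_space = metadata_names(meta, " ")   # delimiter that matches this example file
unmatched = tips - as_space            # tips without a metadata entry
```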

root to tip regression and evolutionary rate estimate

Hi,
I'm trying to use TreeTime to build a time-aligned tree, but I get strange evolutionary rate estimates: 2e-3 instead of the expected 4e-3. When I look at the root-to-tip regression, the regression line doesn't seem to match the plot (see attached), and when I run a regression in Excel on the rtt file I get my 4e-3.
root_to_tip_regression.pdf
rtt excel
I run it directly from the command line on a Mac; if you need my data and my command, please tell me.

Thanks,
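
For comparison, the spreadsheet-style calculation is an ordinary least-squares fit of root-to-tip distance against sampling date, sketched below with made-up numbers. This is not a reproduction of TreeTime's estimator, which may differ from a plain regression (for example if it accounts for shared ancestry among tips).

```python
def linear_fit(dates, distances):
    """Ordinary least-squares fit: distance = slope * date + intercept."""
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_t
    return slope, intercept

# Made-up data lying exactly on a line with slope 4e-3 per year
dates = [2010.0, 2011.0, 2012.0, 2013.0]
distances = [0.004, 0.008, 0.012, 0.016]

rate, intercept = linear_fit(dates, distances)
root_date = -intercept / rate  # date at which the fitted distance reaches zero
```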

Tip state inference confidence not working

@rneher ---

@evogytis originally discovered this issue while working on an Ebola dataset and I was just able to reproduce it. The issue is that if a tip has attribute ? for, say, country and ancestral state inference is done, it will be given a country state as appropriate. However, its country_confidence will always be 100%. This was seen across 60 tips in Gytis's example. Internal nodes basically never have confidences that hit 100%. A confidence of 100% obviously makes sense for a tip with an assigned country state, but does not make sense for a tip with unknown country state.

To reproduce:

  1. I gave PRVABC59 country state of ? instead of puerto_rico here: nextstrain/augur@0f744cd
  2. Run Zika modular pipeline with input_fasta = "example_data/zika.fasta" from current augur master.
  3. Examine the resulting traits.json to see that "country_confidence": {"brazil": 1.0} for PRVABC59. This tip sits on a branch all by itself and definitely should not be assigned to brazil with 100% confidence.

Error running Ancestral Reconstruction

I am getting the error below when running treetime. Could it be my conda python?

  File "/Users/jotieno/miniconda2/bin/ancestral_reconstruction.py", line 4, in <module>
    __import__('pkg_resources').run_script('treetime==0.2.4', 'ancestral_reconstruction.py')
  File "/Users/jotieno/miniconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/Users/jotieno/miniconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1445, in run_script
    exec(script_code, namespace, namespace)
  File "/Users/jotieno/miniconda2/lib/python2.7/site-packages/treetime-0.2.4-py2.7.egg/EGG-INFO/scripts/ancestral_reconstruction.py", line 4, in <module>
    __import__('pkg_resources').run_script('treetime==0.2.4', 'ancestral_reconstruction.py')
  File "build/bdist.macosx-10.6-x86_64/egg/treetime/__init__.py", line 2, in <module>
  File "build/bdist.macosx-10.6-x86_64/egg/treetime/treetime.py", line 2, in <module>
  File "build/bdist.macosx-10.6-x86_64/egg/treetime/clock_tree.py", line 1, in <module>
  File "build/bdist.macosx-10.6-x86_64/egg/treetime/utils.py", line 3, in <module>
  File "/usr/local/lib/python2.7/site-packages/scipy/__init__.py", line 118, in <module>
  File "/usr/local/lib/python2.7/site-packages/scipy/_lib/_ccallback.py", line 1, in <module>
ImportError: cannot import name _ccallback_c

Treetime mugration removes trait comments from timetree.

When doing both dating of a divergence tree and ancestral state reconstruction of a given discrete character, I presume the best workflow is to first run treetime to get the dated tree and then run treetime mugration on the time tree. If it's possible to do both steps in a single analysis, that would be great. Otherwise, I've noticed that the tree produced by treetime mugration does not retain the trait comments (specifically &mutations and &date in my case) for each tip/node in the timetree. Thanks for the great program!

not able to import parse_dates

This might be some basic mistake on my part but I'm not able to import parse_dates, please help.

See error below:
screenshot 2018-11-15 16 20 01

Thanks.

treeanc.py throws error on augur build of american-only Zika genomes

When I try to do a nextstrain/augur build with a fasta that includes Zika genomes only sampled from the Americas, the build breaks with this error:

File "../treetime_augur/treetime/treeanc.py", line 297, in make_reduced_alignment
n.cseq = self.reduced_alignment[seq_count]
IndexError: index 5 is out of bounds for axis 0 with size 5

I think this might be an error in how the tree traversal is occurring when assigning the compressed sequences, but that's just a guess.

The build and the ancestral state reconstruction work fine, however, if I include sequences from French Polynesia in addition to the Americas-only sequences. Do you have an idea of what might be going on here?

I've attached both the Americas + French Polynesia file (build that works) and the Americas only file (build that doesn't work) here.

augur-americas-andfp.txt
augur-americas-only.txt
