Comments (7)
I'm glad it was a relatively simple issue! You have given me a bit to think about with this dataset and with troubleshooting treetime.
Thanks so much for your assistance.
from treetime.
Hi Ben, happy to help! It sounds like somewhere between you and treetime there's a misunderstanding about what the sampling dates are. It could be as simple as a different column name for your dates. But rather than speculating, the best way forward is for you to share your inputs (the exact files, including the tree), the exact command you use (copy-paste), and the output of `treetime --version`. You can send the files to [email protected] if you can't share them publicly.
from treetime.
Thank you for the quick reply!
I am just checking what restrictions may apply to sharing files on my end; when/if possible, I will send the files by email.
from treetime.
It should be possible to debug with a lot of columns removed to reduce scope of sharing.
You could try reducing sample numbers to 5 or so, maybe you have some public samples in there anyways, just keep these?
Otherwise, just the header of the csv could be useful - that shouldn't contain anything sensitive.
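A minimal sketch of how such a reduced metadata file could be produced, assuming the CSV has the `strain` and `date` columns that treetime reports using in the log further down. The function name `subset_metadata` and the file names are hypothetical, not part of treetime.

```python
import csv
from itertools import islice


def subset_metadata(src, dst, keep_columns=("strain", "date"), max_rows=5):
    """Write only the chosen columns and the first few rows of a metadata CSV.

    Returns the full header of the source file, which can be shared on its own.
    """
    with open(src, newline="") as fin:
        reader = csv.DictReader(fin)
        header = list(reader.fieldnames or [])
        missing = [c for c in keep_columns if c not in header]
        if missing:
            raise KeyError(f"columns not found in {src}: {missing}")
        # Keep at most max_rows samples and drop every other column.
        rows = [{c: row[c] for c in keep_columns} for row in islice(reader, max_rows)]

    with open(dst, "w", newline="") as fout:
        writer = csv.DictWriter(fout, fieldnames=list(keep_columns))
        writer.writeheader()
        writer.writerows(rows)
    return header


if __name__ == "__main__":
    # Hypothetical file names; adjust to your own data.
    print(subset_metadata("metadata.csv", "metadata_small.csv"))
```

The tree and alignment would need to be reduced to the same samples for the smaller dataset to be usable.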
from treetime.
I have sent through the files. Did you receive them?
from treetime.
> I have sent through the files. Did you receive them?
Yes, thanks! Just had a look. It appears that the clock-filter filters out too many tips/sequences causing some assumption somewhere to be violated. This case should probably be handled better, so thanks a lot for the report!
As a workaround, you could try the following:
- Switch off the clock filter by passing `--clock-filter 0`.
In the future, you can find out more about what's going on inside treetime by passing e.g. `--verbose 4`, or an even higher number, to see more verbose output.
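For illustration, here is a small sketch that assembles such a call from Python. It uses only flags that appear in this thread (`--tree`, `--aln`, `--dates`, `--clock-filter`, `--verbose`); the helper name `build_treetime_command` and the file names are placeholders.

```python
import shlex


def build_treetime_command(tree, aln, dates, clock_filter=0, verbose=4):
    """Assemble a treetime call with the clock filter off and verbose logging."""
    return [
        "treetime",
        "--tree", tree,
        "--aln", aln,
        "--dates", dates,
        "--clock-filter", str(clock_filter),
        "--verbose", str(verbose),
    ]


if __name__ == "__main__":
    # Placeholder file names; substitute your own inputs.
    cmd = build_treetime_command("tree.nwk", "aln.fasta", "metadata.csv")
    print(shlex.join(cmd))
    # To actually run it (requires treetime on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```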
A key line in the output is:
0.90 TreeTime.clock_filter: More than a third of leaves have been excluded by
the clock filter. Please check your input data.
When treetime runs successfully (which you can achieve by passing `--clock-filter 0`), you'll see why the clock filter ends up throwing out almost all of the data:
Almost none of the data lies in the "acceptable" regression range unless you use large clock filter values (10+ standard deviations) or switch the filter off altogether. Your data deviates so much from the assumptions of the clock filter model that the filter fails here.
You can find this plot and other diagnostic information in the run-output folder which should appear in your working directory, see screenshot for the standard content:
from treetime.
This is the full log I get with default verbosity:
treetime --covariation --confidence --clock-filter 5 --tree N1_subset.aln.clean.fasta.treefile.nwk --aln N1_subset.aln.clean.fasta --dates Matched_Metadata.csv
Attempting to parse dates...
Using column 'strain' as name. This needs match the taxon names in the tree!!
Using column 'date' as date.
0.00 -TreeAnc: set-up
0.16 WARNING: Previous versions of TreeTime (<0.7.0) RECONSTRUCTED sequences of
tips at positions with AMBIGUOUS bases. This resulted in unexpected
behavior is some cases and is no longer done by default. If you want to
replace those ambiguous sites with their most likely state, rerun with
`reconstruct_tip_states=True` or `--reconstruct-tip-states`.
0.66 TreeTime.reroot: with method or node: least-squares
0.66 TreeTime.reroot: rerooting will ignore covariance and shared ancestry.
0.90 TreeTime.clock_filter: More than a third of leaves have been excluded by
the clock filter. Please check your input data.
0.91 TreeTime.reroot: with method or node: least-squares
0.91 TreeTime.reroot: rerooting will account for covariance and shared ancestry.
Traceback (most recent call last):
File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treetime.py", line 57, in run
return self._run(**kwargs)
^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treetime.py", line 221, in _run
self.clock_filter(reroot=reroot_mechanism, n_iqd=n_iqd, plot=plot_rtt, fixed_clock_rate=fixed_clock_rate)
File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treetime.py", line 439, in clock_filter
self.reroot(root=reroot, clock_rate=fixed_clock_rate)
File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treetime.py", line 521, in reroot
new_root = self._find_best_root(covariation=use_cov,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treetime.py", line 949, in _find_best_root
return Treg.optimal_reroot(force_positive=force_positive, slope=slope, keep_node_order=self.keep_node_order)['node']
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treeregression.py", line 433, in optimal_reroot
best_root = self.find_best_root(force_positive=force_positive, slope=slope)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treeregression.py", line 340, in find_best_root
x, chisq = self._optimal_root_along_branch(n, tv, bv, var, slope=slope)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treeregression.py", line 396, in _optimal_root_along_branch
chisq_grid = np.array([chisq(x) for x in grid])
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treeregression.py", line 396, in <listcomp>
chisq_grid = np.array([chisq(x) for x in grid])
^^^^^^^^
File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treeregression.py", line 386, in chisq
return base_regression(tmpQ, slope=slope)['chisq']
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treeregression.py", line 32, in base_regression
raise ValueError("No variation in sampling dates! Please specify your clock rate explicitly.")
ValueError: No variation in sampling dates! Please specify your clock rate explicitly.
ERROR: No variation in sampling dates! Please specify your clock rate explicitly.
ERROR in TreeTime.run: An error occurred which was not properly handled in TreeTime. If this error persists, please let us know by filing a new issue including the original command and the error above at: https://github.com/neherlab/treetime/issues
Some things to address within treetime to make such issues easier for users to debug:
The log message `0.90 TreeTime.clock_filter: More than a third of leaves have been excluded by the clock filter. Please check your input data.` is hard to spot. In this case it correctly points toward the root cause, but this hint would be better placed in the error itself.
When the "No variation in sampling dates" error happens, it would be good to help the user by reporting the following:
- How many samples are left at this point in the program (post clock filter): "only 0/1/5/10 samples left, check whether or why the clock filter has filtered them all out"
- What the sampling dates are, since the wrong column may have been inferred: sampling dates are "strain A: 2045-12-23, ...". Report up to ~5 so the user can quickly recognize whether this is the problem or not
- Suggest the user set `--clock-filter 0` in case the clock filter causes the problem, then inspect "root_to_tip_regression.pdf" to see what's going on
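The three suggestions above could be combined into a single message along these lines. This is only a sketch: `describe_date_problem` is a hypothetical helper, not an existing treetime function, and the wording is illustrative.

```python
def describe_date_problem(tip_dates, n_total, max_examples=5):
    """Build a more helpful error message (hypothetical helper).

    tip_dates maps the names of the tips that remain after the clock filter
    to their sampling dates; n_total is the number of tips before filtering.
    """
    n_left = len(tip_dates)
    shown = list(tip_dates.items())[:max_examples]
    examples = ", ".join(f"{name}: {date}" for name, date in shown)
    if n_left > max_examples:
        examples += ", ..."
    return (
        "No variation in sampling dates! "
        f"Only {n_left} of {n_total} samples are left after the clock filter. "
        f"Sampling dates: {examples or 'none'}. "
        "If the clock filter removed too many tips, rerun with --clock-filter 0 "
        "and inspect root_to_tip_regression.pdf."
    )


if __name__ == "__main__":
    print(describe_date_problem({"strain A": "2045-12-23"}, n_total=40))
```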
from treetime.