
Comments (7)

BCMollett commented on July 26, 2024

I'm glad it was a relatively simple issue! You have given me a bit to think about with this dataset and treetime troubleshooting.

Thanks so much for your assistance.

from treetime.

corneliusroemer commented on July 26, 2024

Hi Ben, happy to help! It sounds like somewhere between you and treetime there's a misunderstanding about what the sampling dates are. It could be as simple as a different column name for your dates. But rather than speculating, the best way forward is for you to share your inputs (the exact files, including the tree), the exact command you use (copy and paste), and the output of treetime --version. You can send the files to [email protected] if you can't share publicly.


BCMollett commented on July 26, 2024

Thank you for the quick reply!
I am just checking what restrictions may apply to sharing files on my end, but when/if possible I will send the files by email.


corneliusroemer commented on July 26, 2024

It should be possible to debug with a lot of columns removed to reduce scope of sharing.

You could try reducing sample numbers to 5 or so, maybe you have some public samples in there anyways, just keep these?

Otherwise, just the header of the csv could be useful - that shouldn't contain anything sensitive.


BCMollett commented on July 26, 2024

I have sent through the files. Did you receive them?


corneliusroemer commented on July 26, 2024

> I have sent through the files. Did you receive them?

Yes, thanks! Just had a look. It appears that the clock-filter filters out too many tips/sequences causing some assumption somewhere to be violated. This case should probably be handled better, so thanks a lot for the report!

As a workaround you could try some of the following options:

  • Switch off clock filter, by passing --clock-filter 0

In the future, you could try to find out more about what's going on inside treetime by passing e.g. --verbose 4 or an even higher number to see more verbose output.
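As a rough illustration of the workaround, the sketch below assembles the command line in Python with the clock filter switched off and verbosity raised. The helper name `treetime_command` is hypothetical; the flags and file names are the ones quoted in this thread, and nothing is executed here.

```python
import shlex


def treetime_command(tree, aln, dates, clock_filter=0, verbose=4):
    """Build a treetime argument list (hypothetical helper, not part of treetime).

    clock_filter=0 switches the clock filter off; verbose=4 asks for more log output.
    """
    return [
        "treetime",
        "--covariation",
        "--confidence",
        "--clock-filter", str(clock_filter),
        "--tree", tree,
        "--aln", aln,
        "--dates", dates,
        "--verbose", str(verbose),
    ]


cmd = treetime_command(
    tree="N1_subset.aln.clean.fasta.treefile.nwk",
    aln="N1_subset.aln.clean.fasta",
    dates="Matched_Metadata.csv",
)

# Print the command so it can be copied into a shell.
print(shlex.join(cmd))
```

To actually run it from Python, `subprocess.run(cmd, check=True)` would work, assuming treetime is installed and on the PATH.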

A key line in the output is:

 0.90    TreeTime.clock_filter: More than a third of leaves have been excluded by
         the clock filter. Please check your input data.

When treetime runs successfully (which you can achieve by passing --clock-filter 0) you'll see why the clock filter ends up throwing out almost all of the data:

[Image: root-to-tip regression plot]

Almost none of the data lies in the "acceptable" regression range unless you use large clock filter values (10+ standard deviations) or switch the filter off altogether. Your data deviates so much from the assumptions of the clock filter model that it fails here.
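To make the idea of an "acceptable" regression range concrete, here is a minimal sketch of a residual-based filter on a root-to-tip regression. It is a simplified illustration, not treetime's actual code: the function names are made up, the toy data is invented, and the threshold is assumed to be a multiple of the interquartile distance of the residuals (the parameter name `n_iqd` appears in the traceback in this thread).

```python
from statistics import mean, quantiles


def fit_line(dates, dists):
    """Ordinary least-squares fit of root-to-tip distance against sampling date."""
    t_avg, d_avg = mean(dates), mean(dists)
    var_t = sum((t - t_avg) ** 2 for t in dates)
    if var_t == 0:
        # With identical dates the slope (clock rate) cannot be estimated.
        raise ValueError("no variation in sampling dates")
    slope = sum((t - t_avg) * (d - d_avg) for t, d in zip(dates, dists)) / var_t
    return slope, d_avg - slope * t_avg


def clock_filter_sketch(dates, dists, n_iqd=3.0):
    """Return one boolean per tip: True if its residual is within n_iqd * IQD."""
    slope, intercept = fit_line(dates, dists)
    residuals = [d - (slope * t + intercept) for t, d in zip(dates, dists)]
    q1, _, q3 = quantiles(residuals, n=4)
    iqd = q3 - q1  # interquartile distance of the residuals
    return [abs(r) <= n_iqd * iqd for r in residuals]


# Invented toy data: a clock-like trend with one strongly divergent tip at the end.
dates = [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017]
dists = [0.010, 0.012, 0.014, 0.016, 0.018, 0.020, 0.022, 0.060]

keep_strict = clock_filter_sketch(dates, dists, n_iqd=1.0)
keep_loose = clock_filter_sketch(dates, dists, n_iqd=3.0)
print(keep_strict)
print(keep_loose)
```

In this toy case the strict threshold drops only the divergent tip, while the loose one keeps everything. When most tips sit far from the regression line, as in the plot above, a filter of this kind can discard most of the data unless the threshold is made very wide or the filter is disabled.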

You can find this plot and other diagnostic information in the run-output folder which should appear in your working directory, see screenshot for the standard content:
[Image: screenshot of the output folder contents]


corneliusroemer commented on July 26, 2024

This is the full log I get with default verbosity:

treetime --covariation --confidence --clock-filter 5 --tree N1_subset.aln.clean.fasta.treefile.nwk --aln N1_subset.aln.clean.fasta --dates Matched_Metadata.csv            

Attempting to parse dates...
        Using column 'strain' as name. This needs match the taxon names in the tree!!
        Using column 'date' as date.

0.00    -TreeAnc: set-up

0.16    WARNING: Previous versions of TreeTime (<0.7.0) RECONSTRUCTED sequences of
        tips at positions with AMBIGUOUS bases. This resulted in unexpected
        behavior is some cases and is no longer done by default. If you want to
        replace those ambiguous sites with their most likely state, rerun with
        `reconstruct_tip_states=True` or `--reconstruct-tip-states`.

0.66    TreeTime.reroot: with method or node: least-squares

0.66    TreeTime.reroot: rerooting will ignore covariance and shared ancestry.

0.90    TreeTime.clock_filter: More than a third of leaves have been excluded by
        the clock filter. Please check your input data.

0.91    TreeTime.reroot: with method or node: least-squares

0.91    TreeTime.reroot: rerooting will account for covariance and shared ancestry.
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treetime.py", line 57, in run
    return self._run(**kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treetime.py", line 221, in _run
    self.clock_filter(reroot=reroot_mechanism, n_iqd=n_iqd, plot=plot_rtt, fixed_clock_rate=fixed_clock_rate)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treetime.py", line 439, in clock_filter
    self.reroot(root=reroot, clock_rate=fixed_clock_rate)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treetime.py", line 521, in reroot
    new_root = self._find_best_root(covariation=use_cov,
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treetime.py", line 949, in _find_best_root
    return Treg.optimal_reroot(force_positive=force_positive, slope=slope, keep_node_order=self.keep_node_order)['node']
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treeregression.py", line 433, in optimal_reroot
    best_root = self.find_best_root(force_positive=force_positive, slope=slope)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treeregression.py", line 340, in find_best_root
    x, chisq = self._optimal_root_along_branch(n, tv, bv, var, slope=slope)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treeregression.py", line 396, in _optimal_root_along_branch
    chisq_grid = np.array([chisq(x) for x in grid])
                          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treeregression.py", line 396, in <listcomp>
    chisq_grid = np.array([chisq(x) for x in grid])
                           ^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treeregression.py", line 386, in chisq
    return base_regression(tmpQ, slope=slope)['chisq']
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treeregression.py", line 32, in base_regression
    raise ValueError("No variation in sampling dates! Please specify your clock rate explicitly.")
ValueError: No variation in sampling dates! Please specify your clock rate explicitly.

ERROR: No variation in sampling dates! Please specify your clock rate explicitly. 
 
ERROR in TreeTime.run: An error occurred which was not properly handled in TreeTime. If this error persists, please let us know by filing a new issue including the original command and the error above at: https://github.com/neherlab/treetime/issues

Some things to address within treetime to make such issues easier to debug for users:

The log message "0.90 TreeTime.clock_filter: More than a third of leaves have been excluded by the clock filter. Please check your input data." is hard to spot. In this case it correctly points toward the root cause, but this hint would be better placed in the error itself.

When that "No variation in sampling dates" error happens, it would be good to help the user by reporting the following:

  • How many samples are left at this point in the program (after the clock filter): "only 0/1/5/10 samples left, check whether or why the clock filter has filtered them all out"
  • What the sampling dates are, since the wrong column may have been inferred: sampling dates are "strain A: 2045-12-23, ...", reporting up to ~5 so the user can quickly recognize whether this is the problem or not
  • Suggest the user set --clock-filter 0 in case clock filter causes the problem, then inspect the "root_to_tip_regression.pdf" to see what's going on
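The three suggestions above could be combined into a single message along these lines. This is only a sketch of the proposal: `describe_remaining_samples` is a hypothetical helper, not an existing treetime function, and the sample names and dates are invented.

```python
def describe_remaining_samples(sample_dates, max_examples=5):
    """Build a more helpful error message (hypothetical helper, not treetime API).

    sample_dates maps tip name -> sampling date for tips still in the analysis.
    """
    n = len(sample_dates)
    shown = list(sample_dates.items())[:max_examples]
    examples = ", ".join(f"{name}: {date}" for name, date in shown)
    lines = [
        "No variation in sampling dates! Please specify your clock rate explicitly.",
        f"Only {n} sample(s) left after the clock filter; "
        "check whether or why the clock filter has filtered the rest out.",
        f"Sampling dates (showing up to {max_examples}): {examples}",
        "If the clock filter causes the problem, rerun with --clock-filter 0 "
        "and inspect root_to_tip_regression.pdf.",
    ]
    return "\n".join(lines)


# Invented example: two tips survive the filter and share the same date.
message = describe_remaining_samples({"strain A": "2045-12-23", "strain B": "2045-12-23"})
print(message)
```

Showing a handful of name/date pairs lets the user see at a glance whether the wrong column was parsed, while the sample count points at the clock filter as the likely cause.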

