brettc / partitionfinder
PartitionFinder discovers optimal partitioning schemes for DNA sequences.
License: Other
Running PF in rcluster mode can result in different orders of the clustering steps.
I have started PF on one machine. Eventually it turned out that RAM was insufficient.
In order to save time I copied the analysis folder to a machine with more RAM
and continued the analysis there.
When restarting PF on this data set, I expected it to work its way to the point
where it had stopped before calling raxml again, since it can read all previous results from its database. However, after a few hundred clustering steps it performed a step that it did not do in the first run, so raxml was called for all subsequent clustering steps to evaluate a small number of subsets.
Potential cause: rounding errors, either from differences between the machines or from writing to and reading from the database, could lead to this effect.
It's not a critical issue, and it is potentially unavoidable.
So I just realised something that could speed things up, but I don't think there's any point implementing it.
At the start of a kmeans tigger run, we calculate the pairwise compatibility matrix for the WHOLE ALIGNMENT. The site rates are then just column averages, right?
If that's right, then we never need to calculate anything with tigger again, because for any given subset, we should be able to extract the relevant bit of the matrix and just re-calculate the column averages.
At the moment we re-run tigger for every single new subset.
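If the assumption above holds (site rates are averages over a pairwise compatibility matrix), the idea could be sketched roughly like this. The function name `subset_rates`, the toy matrix, and the choice to exclude a site's comparison with itself are all assumptions made for illustration, not the existing tigger code.

```python
def subset_rates(compat, sites):
    """Average each site's compatibility with the other sites in `sites`.

    `compat[i][j]` is a hypothetical precomputed score for sites i and j,
    calculated once for the whole alignment.
    """
    rates = []
    for i in sites:
        others = [compat[i][j] for j in sites if j != i]
        # A single-site subset has nothing to compare against (assumed rate 1.0).
        rates.append(sum(others) / len(others) if others else 1.0)
    return rates


# Toy 4x4 matrix with made-up values.
compat = [
    [1.0, 0.5, 0.2, 0.8],
    [0.5, 1.0, 0.4, 0.6],
    [0.2, 0.4, 1.0, 0.3],
    [0.8, 0.6, 0.3, 1.0],
]

whole = subset_rates(compat, [0, 1, 2, 3])  # rates for the whole alignment
part = subset_rates(compat, [0, 3])         # rates for a subset, no recalculation of scores
```

The point of the sketch is only that the expensive pairwise step happens once; each new subset then costs a slice and an average.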
Thoughts? Have I missed something?
I think this is only worth implementing if it turns out that entropies aren't so good after all. For that we need to do some benchmarking.
R
A bunch of failing tests for the parser. I've looked, but can't figure it out (though I suspect it's simple).
@brettc, can you take a look?
R
tests/test_parser.py::test_one FAILED
tests/test_phyml.py::test_simple PASSED
tests/test_phyml.py::test_interleaved PASSED
tests/test_phyml.py::test_subset PASSED
tests/test_raxml.py::test_one PASSED
tests/test_raxml.py::test_parse_nucleotide FAILED
tests/test_raxml.py::test_parse_aminoacid FAILED
tests/test_submodels.py::test_consistency PASSED
tests/test_submodels.py::test_scheme_lengths PASSED
tests/test_subset.py::test_identity PASSED
tests/test_subset.py::test_overlap ERROR
tests/PF2/test_pf2.py::test_missing_sites_warning ERROR
tests/PF2/test_pf2.py::test_overlapping_blocks ERROR
The "fabricated subsets" feature requires that some sort of BIC score be assigned to subsets that we cannot analyze. To do this we must estimate the log likelihood for the subset as a whole. Since the definition of the fabricated subset is that raxml/phyml cannot analyze it, we don't have the subset log likelihood. In the first version of kmeans, we simply added up the site log likelihoods that we had conveniently generated for the clustering step, i.e.:
This is no longer viable since we now use TIGER site rates rather than site likelihoods.
Shall we:
It looks like PhyML now writes alignments differently. All that needs to be done here is that the base files for all of the rerun tests need to be updated.
I am following the instructions for installing phyml on Biolinux, which uses Ubuntu 14.04.
However, when I type
make
make all-recursive
make[1]: Entering directory `/usr/local/lib/partitionfinder-master/programs/phyml_source'
Making all in src
make[2]: Entering directory `/usr/local/lib/partitionfinder-master/programs/phyml_source/src'
:: Building [phytime]. Version 20150123 ::
gcc -I. -I.. -ansi -pedantic -Wall -std=c99 -O3 -fomit-frame-pointer -funroll-loops -arch i386 -mmacosx-version-min=10.4 -MT main.o -MD -MP -MF .deps/main.Tpo -c -o main.o main.c
gcc: error: i386: No such file or directory
gcc: error: unrecognized command line option ‘-arch’
gcc: error: unrecognized command line option ‘-mmacosx-version-min=10.4’
make[2]: *** [main.o] Error 1
make[2]: Leaving directory `/usr/local/lib/partitionfinder-master/programs/phyml_source/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/usr/local/lib/partitionfinder-master/programs/phyml_source'
make: *** [all] Error 2
Any idea what the reason for these errors is?
thanks
One thing about raxml is that you can only use a single model of rate variation for each partition (parameters estimated independently per partition).
So one thing that would be useful in PF would be to be able to provide a list of models, e.g.
models = GTR, GTR+G, GTR+I+G;
and then rather than consider combinations of these models, just run three analyses - one with all models set to GTR, one with all set to GTR+G, and one with GTR+I+G.
We would then just pick the scheme with the lowest AICc, as usual, and output that along with the best models.
This is a low priority, as it is fairly cosmetic, but user-friendly nonetheless
The basic idea here is that sometimes other programs we rely on will die horribly, for reasons we don't understand and can't fix. Right now, PF just exits saying that RAxML (or whatever) didn't execute successfully. It would be helpful to provide more information. The tips below from a user are to help sort that out.
INFO | 2014-02-18 00:35:33,405 | raxml | Estimating LG+G branch
lengths on tree using RAxML
ERROR | 2014-02-18 00:35:46,995 | raxml | RAxML did not execute
successfully
ERROR | 2014-02-18 00:35:48,179 | raxml | RAxML output follows,
in case it's helpful for finding the problem
ERROR | 2014-02-18 00:35:48,179 | raxml |
ERROR | 2014-02-18 00:35:48,179 | raxml |
ERROR | 2014-02-18 00:35:48,878 | main | Failed to run. See
previous errors.
Email from the user:
we have been trying to get PartitionFinder 1.1.1 to run on a Linux
machine of Dani Bartel with 8 GB of RAM. the test alignment file is 12
MB in size and has about 338,000 amino acid positions in 1348 partitions
(I ran it on my machine in Aussie successfully, the one we both started
together).
We compiled standard RAxML version from Brett's github 1 using the
Makefile for GCC. RAxML crashed during LG+G branch length estimation
(BLTREE) with a segmentation fault. We found out by running RAxML
manually using this command (PartitionFinder runs it during its
analysis):
raxml -f -e -s DATEN.phy -t TREE.phy -m PROTGAMMALG -n BLTREE -w
START_TREE -e 1.0 -O
While a segmentation fault is never pretty and this is a RAxML issue
(has already been posted to its developers) it would be nice if
PartitionFinder would trap that segfault and provide a more useful error
message than in the attached file.
You could trap the segmentation fault signal (SIGSEGV) using the signal
module 2 and write a handler like described in 3 that at least tells
the user that something terribly wrong happened to the subprocess.
[This part was written by Malte Petersen, who is currently helping me here in
the US to get everything running.]
Would this be possible? I'm sure future users would greatly appreciate
this improvement!
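One possible way to report this, sketched under assumptions: rather than installing a signal handler in PF itself (a crash in the child process does not raise a signal in the parent), the sketch below inspects the child's return code. On POSIX systems, Python's `subprocess` reports a negative return code when the child was killed by a signal. The names `describe_exit` and `run_external` are hypothetical, and the code uses Python 3's `signal.Signals`.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a human-readable description."""
    if returncode == 0:
        return "finished successfully"
    if returncode < 0:
        # On POSIX, -N means the child was terminated by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        return "was killed by %s" % name
    return "exited with status %d" % returncode


def run_external(command):
    """Run an external program and return (description, captured output)."""
    proc = subprocess.Popen(command, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    output, _ = proc.communicate()
    return describe_exit(proc.returncode), output


# Demonstration with a harmless child process instead of RAxML.
status, output = run_external([sys.executable, "-c", "pass"])
```

With something like this, the log line could say that RAxML was killed by SIGSEGV instead of only saying that it did not execute successfully.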
Currently, if you have e.g. search = 'greedy'; and a single subset, PF gets confused and quits without being informative.
This should be very very easy to fix. Just put a catch at the top of each heuristic search to make sure that we only do the search if there's >1 subset in the initial scheme. Otherwise, just analyse that one subset and send out the results.
R
Following a conversation on twitter, we realised it might be helpful to output at the end of any given run the suggested citations and methods text. This could incorporate:
All of these would be determined by the details of the analysis someone ran. E.g. did it use PhyML, RAxML, or something else? Did it use algorithms (like the relaxed clustering or kmeans) described in other publications? Which version of PF did it use?
E.g. if someone ran relaxed clustering in PF2, that would use RAxML, the relaxed clustering algorithm, and PF2, so the text might read:
"To determine an appropriate partitioning scheme, we used the relaxed clustering algorithm [ref1] implemented in PartitionFinder 2.0 [ref2], which relies on RAxML [ref3]".
And then the refs in text and possibly bibtex format.
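A minimal sketch of how such text could be assembled from the details of a run. The function `methods_text` and its arguments are hypothetical, and the `[ref1]` style placeholders are kept as placeholders rather than filled with real citations.

```python
def methods_text(algorithm, pf_version, program):
    """Build a suggested methods sentence from the details of an analysis."""
    return (
        "To determine an appropriate partitioning scheme, we used the "
        "%s algorithm [ref1] implemented in PartitionFinder %s [ref2], "
        "which relies on %s [ref3]." % (algorithm, pf_version, program)
    )


text = methods_text("relaxed clustering", "2.0", "RAxML")
```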
So right now the TIGER K-means keeps recalculating TIGER rates.
I am not sure this is good for two reasons:
So, I'm wondering if we should instead stick with calculating TIGER rates once, at the top of the algorithm, and then just stick with them. Thoughts?
The new model loading doesn't have names for a number of options contained in the tests (ie. 'raxml'). This breaks a number of tests.
Currently, output is dumped into a subfolder below the configuration files. This doesn't allow for separation of the results from configuration. This would be useful for our own testing (so we don't clog up the development folders with output), but also (I imagine) in situations where large amounts of output would be better off in a separate place from the configuration. (Note: we need to make sure that the "restart" tests are working again before doing this).
Right now, the site rates aren't cached or saved. We should probably cache them in case the run is interrupted and the user has to restart. Note that this might not always allow the user to 'skip' to the exact place that the algorithm left off since the k-means algorithm isn't guaranteed to converge on the same solution, even given identical rates.
PF 2.0 moves away from the "batteries included" approach in PF 1.0, as we now have dependencies on various packages (numpy etc), and we might also require building a cython extension. Python packaging is notoriously crappy, so we could consider anaconda packaging as a supported solution, i.e. download anaconda, then use "conda install partfinder2".
In testing with the 1Kite crew, we have discovered a small bug in the RAxML LG4X model implementation. This is currently being fixed, so before release we have to know that it's fixed, and update our Windows and Mac executables for RAxML. We may need help compiling the windows version, but I have a few people I could ask.
At some point, @brettc removed these tests for development. They need to get put back in.
There are two types of test currently missing:
Rob
For certain analyses (anything with RAxML, but more specifically rcluster and scluster) we should output subset parameters in the best_schemes.txt file, e.g. including:
base frequencies
relative rate
model parameters
gamma rate parameter (if present, NA if not)
invariant sites parameter (if present, NA if not)
Would it be possible for PartitionFinder to automatically create mrBayes blocks for partitioned analyses? I feel like this could help automate using PartitionFinder for mrBayes analyses, rather than having to manually construct the bayes block.
We have a couple of issues that could be solved by making people use semi-colons at the end of lines in the .cfg file.
part1 = 1-100\3
12S = 101-1000
at the moment, our parser thinks the "12" is a site in the 'part1' partition. Since 12S, 16S, 28S etc are common genes for phylogenetics, we'll see this problem a lot.
models = all k
If we made people use semi-colons, we could get around both of these problems, and have a much better crack at giving people useful parser error messages, which I suspect will be where most of our problems come from if and when people start using the program.
What do you think? Happy to implement this.
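To show why semi-colons help, here is a rough sketch of splitting the file into statements before parsing each one. It is not the existing parser; `split_statements` is a hypothetical helper, and it ignores comments and section headers.

```python
def split_statements(text):
    """Split config text into (name, value) pairs terminated by ';'."""
    pairs = []
    for chunk in text.split(";"):
        chunk = " ".join(chunk.split())  # collapse newlines and extra spaces
        if not chunk:
            continue
        name, sep, value = chunk.partition("=")
        if not sep:
            raise ValueError("expected 'name = value;' but found: %r" % chunk)
        if "=" in value:
            # Two assignments in one statement: most likely a missing ';'.
            raise ValueError("missing ';' somewhere in: %r" % chunk)
        pairs.append((name.strip(), value.strip()))
    return pairs


cfg = "part1 = 1-100\\3;\n12S = 101-1000;\n"
blocks = split_statements(cfg)
```

With explicit terminators, "12S" can never be read as part of the previous line, and a forgotten semi-colon gives a specific error rather than a confusing parse.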
R
Right now, we're not doing as well as we could with models.
At the moment, a subset just picks the best model based on all the models it has analysed. This is inconvenient if I want to run a full analysis (i.e. all models) and then see what the answer would have been with a more restricted model list, or vice versa.
It's not a bug as such, since users are prevented from re-running an analysis with a different model list. However, it would be useful (and possibly straightforward) to fix.
R
See this thread in google groups:
https://groups.google.com/forum/#!topic/partitionfinder/HPULy1ZlkzA
The cython version of tiger on the development branch has a flag requiring Mac OS 10.8 or greater: -mmacosx-version-min=10.8. It would be nice if we could compile the cython code on a linux machine.
It would be useful to keep some files, like the best "Subset Partitions", or even all of the tested partitions.
It might be a good idea to add support for amino acid alignments using iterative k-means. I think most of it works already, we just need to add support for the estimation of amino acid site rates. @cmayer pointed out that since there are 20 character states in amino acids (rather than 4) that there will likely be a greater amount of conflict leading to some very small rates. I'm not sure what effect this will have and it will have to be tested to see if it works. I still think it would be worth adding since we should be able to implement it with minimal extra effort.
Hi,
I ran into a problem running partition finder on linux. This applies to using it with phyml version 20141029, but maybe affects phyml on linux more generally. I'm running the test example "python PartitionFinderProtein.py examples/aminoacid".
On linux phyml output files don't have a .txt suffix - they just end .phy_phyml_tree (or stats). However, your file handling code in partfinder.phyml.py (functions make_tree_path and make_output_path) assumes that the .txt suffix is there. So this leads to an error when trying to read the BioNJ starting tree at the beginning of the analysis. The tree file gets written successfully, it just doesn't have the suffix, so PF can't find it, and it errors out with a standard python IOError.
If I alter the code in phyml.py (just delete ".txt" in the three places it occurs) then everything runs OK.
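An alternative to deleting the suffix outright would be to accept either name. The sketch below is only an illustration: `find_phyml_output` is a hypothetical helper, not the existing `make_tree_path` or `make_output_path`, and the demonstration file is created in a temporary folder.

```python
import os
import tempfile


def find_phyml_output(alignment_path, kind):
    """Return the existing PhyML output file for `kind` ('tree' or 'stats').

    Tries the name with a .txt suffix first, then the name without it.
    """
    base = "%s_phyml_%s" % (alignment_path, kind)
    for candidate in (base + ".txt", base):
        if os.path.exists(candidate):
            return candidate
    raise IOError("PhyML output not found: tried %s.txt and %s" % (base, base))


# Demonstration: create a Linux-style output file (no .txt) and look it up.
demo_dir = tempfile.mkdtemp()
alignment = os.path.join(demo_dir, "filtered_source.phy")
open(alignment + "_phyml_tree", "w").close()
found = find_phyml_output(alignment, "tree")
```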
I don't know if you've run into this problem before, or if you already have another fix, but I thought I'd let you know.
The following tests fail, because the new version of phyml or raxml is finding better likelihoods so breaking our rerun tests (either that or updates to the rcluster algorithm are giving slight improvements). In each of these cases I've checked the output, and the differences are very small. Everything else looks good.
For the protein tests that are failing, they fail because I simplified the tests to make them run quicker, so the expected output has changed.
The fix is to update the results.bin files with results from current runs. @brettc, can you do this for the following tests:
tests/full_analysis/test_full.py::test_dna[DNA2] FAILED
tests/full_analysis/test_full.py::test_dna[DNA7] FAILED
tests/full_analysis/test_full.py::test_dna[DNA8] FAILED
tests/full_analysis/test_full.py::test_prot[prot1] FAILED
tests/full_analysis/test_full.py::test_prot[prot6] FAILED
tests/full_analysis/test_full.py::test_prot[prot7] FAILED
tests/full_analysis/test_full.py::test_prot[prot8] FAILED
OK,
At the moment we go gung ho for all the processors we can find. This is almost always the best thing to do, but there are a couple of places where we should be more careful. Particularly with big datasets, as the partitions we analyse get bigger (say, in the greedy analysis), each processor needs more RAM to do the calculations.
I think the best thing here would be to use the processors more dynamically. This doesn't need to be too difficult. I'm pretty sure that Stephane has a formula (something to do with the number of sites and the number of species) to calculate roughly how much memory any given phyml analysis will use. If we could also get an estimate of how much memory is available on the host machine, we could do a simple test before adding a task to the threadpool:
if available_memory > estimated_memory:
    add_next_task
else:
    dont_add_task
If we wanted to get clever, we could also just try running through all the waiting tasks to see if there's one that will fit in the available_memory, but we'd have to remember to still do this in order of priority of model difficulty (which is currently coded to run the longer analyses first). This part will only really be relevant if we change the job scheduling to do >1 partition's analyses at a time though.
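The "clever" version could look something like the sketch below. Everything here is assumed for illustration: the task list, the memory estimates (in arbitrary units), and the name `next_task_that_fits`. How to estimate memory per analysis, or how to query free memory on the host, is left open.

```python
def next_task_that_fits(waiting, available_memory):
    """Pop and return the first task, in priority order, that fits in memory.

    `waiting` is a list of (name, estimated_memory) tuples, already sorted
    so that the longest analyses come first. Returns None if nothing fits.
    """
    for index, (name, estimated_memory) in enumerate(waiting):
        if estimated_memory <= available_memory:
            return waiting.pop(index)
    return None


# Made-up numbers: the hardest model needs more memory than is free.
waiting = [("GTR+I+G", 900), ("GTR+G", 600), ("GTR", 300)]
task = next_task_that_fits(waiting, 700)
```

Because the list is scanned from the front, the priority order is respected: a smaller task is only chosen when every task ahead of it is too big.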
That is all. Just a thought.
R
We need to update the version of raxml we use to version 8.x
As far as I remember, there are some things which used to work in RAxML, which no longer work. I will attempt to work on this as soon as possible, so I can contact Alexis and resolve any outstanding issues with the RAxML code that we might need to be fixed.
Note that someone has been compiling windows versions of RAxML, so once we have a version of RAxML we can use, we might be able to ask that person to compile us a version too...
https://github.com/stamatak/standard-RAxML/tree/master/WindowsExecutables_v8.1.15
RAxML has only two models for branch lengths currently:
The latter corresponds to 'unlinked' branch lengths in PF (I think it's by using the option -M).
The former is not implemented in PF, but could be as an 'equal' branch lengths option.
Note that right now, RAxML doesn't implement anything that's equivalent to what we call 'linked' branch lengths, where there's an underlying set of branch lengths but each partition gets its own rate multiplier. This is a shame, because it's my suspicion that this is by far the best of the three approaches (i.e. out of 'unlinked', 'linked', and 'equal').
I am on Linux. When I try to run the examples I get the following error:
$ python2 PartitionFinder.py -v --force-restart examples/nucleotide
...
Traceback (most recent call last):
File "PartitionFinder.py", line 23, in <module>
sys.exit(main.main("PartitionFinder", "DNA"))
File "/home/wookietreiber/src/idiv/partitionfinder/partfinder/main.py", line 333, in main
options.processes)
File "/home/wookietreiber/src/idiv/partitionfinder/partfinder/analysis.py", line 55, in __init__
self.make_tree(cfg.user_tree_topology_path)
File "/home/wookietreiber/src/idiv/partitionfinder/partfinder/analysis.py", line 149, in make_tree
self.cfg.cmdline_extras)
File "/home/wookietreiber/src/idiv/partitionfinder/partfinder/phyml.py", line 133, in make_branch_lengths
dupfile(topology_path, tree_path)
File "/home/wookietreiber/src/idiv/partitionfinder/partfinder/phyml.py", line 101, in dupfile
shutil.copyfile(src, dst)
File "/usr/lib/python2.7/shutil.py", line 82, in copyfile
with open(src, 'rb') as fsrc:
IOError: [Errno 2] No such file or directory: './analysis/start_tree/filtered_source.phy_phyml_tree.txt'
$ python2 PartitionFinderProtein.py -v --force-restart examples/aminoacid
...
Traceback (most recent call last):
File "PartitionFinderProtein.py", line 23, in <module>
sys.exit(main.main("PartitionFinderProtein", "protein"))
File "/home/wookietreiber/src/idiv/partitionfinder/partfinder/main.py", line 333, in main
options.processes)
File "/home/wookietreiber/src/idiv/partitionfinder/partfinder/analysis.py", line 55, in __init__
self.make_tree(cfg.user_tree_topology_path)
File "/home/wookietreiber/src/idiv/partitionfinder/partfinder/analysis.py", line 149, in make_tree
self.cfg.cmdline_extras)
File "/home/wookietreiber/src/idiv/partitionfinder/partfinder/phyml.py", line 133, in make_branch_lengths
dupfile(topology_path, tree_path)
File "/home/wookietreiber/src/idiv/partitionfinder/partfinder/phyml.py", line 101, in dupfile
shutil.copyfile(src, dst)
File "/usr/lib/python2.7/shutil.py", line 82, in copyfile
with open(src, 'rb') as fsrc:
IOError: [Errno 2] No such file or directory: './analysis/start_tree/filtered_source.phy_phyml_tree.txt'
I wonder why that is and whether this is a Linux-only problem. It must be or else, I suppose, you would have figured it out already for the Mac and Windows versions. Maybe, though, it may be related to one of the external tools, i.e. phyml or raxml, since I installed those locally and symlinked them to the programs directory like this:
$ ls -go programs/phyml programs/raxml
lrwxrwxrwx 1 14 Oct 16 14:57 programs/phyml -> /usr/bin/phyml
lrwxrwxrwx 1 14 Oct 16 14:57 programs/raxml -> /usr/bin/raxml
phyml is version 20140926
raxml is version 8.1.1
Hi all-
If anyone wants to give a spin to the morphological extension to PF, it's here:
https://github.com/wrightaprilm/partitionfinder/tree/feature/morphology2
instructions are in the README. It was mostly playtested on 335 datasets on Ubuntu, though I think @pbfrandsen got it working on Mac. Feel free to reply here with issues, or raise them on the morphology2 branch on my fork.
It would be much cleaner to have a single list of models, parameter values, command line options for different programs.
Right now we duplicate this to some extent between raxmlmodels.py and phymlmodels.py
The benefits of a single list would be:
A very helpful user of the develop branch said this:
"That exception [to otherwise good performance] is a PF-develop run (search=greedy, MrBayes-specific models) I attempted for comparison with a PF-1.1.1 run of the same parameters. As I have mentioned previously, the PF-1.1.1 search=greedy runs were going very slowly, so I was hoping the PF-develop run would be fast or even finish before the previously-started PF-1.1.1 run.
In the end, the PF-1.1.1 run took >25 days (28/Dec – 25/Jan) to finish. The PF-develop run started very quickly and progressed to ~50% in 4-5 days, after which progress pretty much stopped. I let it run for a few more days and thought it had locked up so restarted. After the restart I let it run for another 5-6 days with little progress, after which I needed the computer for other analyses and killed the job. Interestingly, the computer had written a >40GB swap trying to deal with this analysis."
Need to figure this out and fix it. My suspicion is that the current method of loading up ALL the schemes at once is no good (this is the big change in the greedy algorithm from 1.1.1). But it could also be something to do with the databasing - the greedy algorithm uses a lot of subsets, and I wonder if DB.py is getting overloaded (if so, what to do?). Third option - it's because we abandoned the weakref dictionary, and we're just keeping too many subsets around.
2 things to try:
RAxML 8.x has a lot more models available. Easy to add support for these.
K-means is stochastic. This is fine, but to make sure we can replicate things, we should include a commandline option to set the random number seed, e.g.
--seed
which feeds through to the k-means output.
A sub-issue is that we should record the seed (user specified or not) in the saved .cfg file, so that we can re-start and checkpoint easily (see issue 35 too: #35)
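A small sketch of the seed handling, assuming `argparse` for the command line (PF's actual option parsing may differ) and Python's `random` module as a stand-in for whatever random number generator k-means uses. The function name `parse_seed` is hypothetical.

```python
import argparse
import random


def parse_seed(argv):
    """Return the user's seed, or draw one so it can still be recorded."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", type=int, default=None,
                        help="random number seed for stochastic searches")
    args = parser.parse_args(argv)
    if args.seed is not None:
        return args.seed
    # No seed given: pick one now so that it can be saved for a restart.
    return random.SystemRandom().randrange(2 ** 31)


seed = parse_seed(["--seed", "42"])

# Two generators built from the same seed give the same sequence.
rng_a = random.Random(seed)
rng_b = random.Random(seed)
```

The returned seed is what would be written to the saved .cfg file, whether or not the user supplied it.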
pretty self explanatory, but if anyone thinks of anything we should add (this list will also be useful for the paper), then stick it here.
Simon has found a problem,
If you have LOTS of subsets (he has 50), then our way of making up filenames gives you names that are too long, and nothing works.
However, I like our filenames in general. I wonder if we could add in a conditional (if len(name)>50) and in those situations switch to a different system. Not sure what the system would be, because we need to make sure it's replicable each and every time, so that we can reliably look up old subset results.
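One candidate system, sketched as an assumption rather than a decision: keep the readable name when it is short, and fall back to a hash of the same name when it is too long. Sorting the block names first makes the result the same on every run. The function `subset_filename` and the underscore-joined naming are hypothetical.

```python
import hashlib


def subset_filename(block_names, max_length=50):
    """Return a replicable filename stem for a subset of data blocks."""
    name = "_".join(sorted(block_names))
    if len(name) <= max_length:
        return name
    # Too long for a filename: use a fixed-length digest of the full name.
    return hashlib.md5(name.encode("utf-8")).hexdigest()


short = subset_filename(["Gene1_pos2", "Gene1_pos1"])
many = ["Gene%d_pos%d" % (g, p) for g in range(1, 18) for p in (1, 2, 3)]
long_a = subset_filename(many)
long_b = subset_filename(list(reversed(many)))
```

The downside is that hashed names are not human readable, so a lookup table from hash to block names might need to be written alongside the results.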
Any ideas?
R
Hi All,
I have a proposal for PF2. Right now we calculate three metrics:
AIC, AICc, BIC
My proposal is that we ditch them all apart from the AICc. Here's why.
First, there is no reason to ever use the AIC over the AICc. The AICc corrects for small sample sizes, and converges to the AIC for larger sample sizes (see Burnham and Anderson's book). So providing both is pointless and perhaps even misleading.
Second, the BIC is not really an information criterion at all (it's only named as such). And more importantly, it makes some totally untenable assumptions for molecular phylogenetics. Most egregiously, it assumes that the TRUE model is in the set being considered. This is ridiculously far from the truth for phylogenetics. Perhaps because of this, the only studies that really show support for the accuracy of the BIC are simulation studies, where that one ridiculous rule is actually true - in the simulation studies that showed support for the BIC the true model IS in the set being considered.
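For reference, the three metrics in their standard forms, with `lnl` the log likelihood, `k` the number of free parameters, and `n` the sample size. What PF should count as `n` and `k` is not addressed here; the numbers in the example are made up.

```python
import math


def aic(lnl, k):
    """Akaike information criterion."""
    return 2.0 * k - 2.0 * lnl


def aicc(lnl, k, n):
    """AIC with the small-sample correction; requires n > k + 1."""
    return aic(lnl, k) + (2.0 * k * (k + 1)) / (n - k - 1)


def bic(lnl, k, n):
    """Bayesian information criterion."""
    return k * math.log(n) - 2.0 * lnl


small_sample = aicc(-1000.0, 10, 100)      # correction is noticeable
large_sample = aicc(-1000.0, 10, 10 ** 9)  # correction is negligible
```

The correction term shrinks towards zero as `n` grows, which is the convergence of the AICc to the AIC mentioned above.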
Reducing everything to a single metric makes the program simpler, and would help to move the field along a bit too.
Any thoughts?
Rob
We should try and update to the latest version of PhyML for PF2
See #15.
There are a bunch of symlinks in partitionfinder/programs/phyml_source/, and ./configure doesn't run:
$: ./configure
configure: error: cannot find install-sh or install.sh in "." "./.." "./../.."
Am I missing something?
Thanks,
-Steve
Hey,
I have added a test which fails the alignment parser. Tried for a while to fix it, but ran out of energy after a couple of hours. This one's important to fix before release.
The issue is with interleaved phylip alignments, like this one: http://molecularevolution.org/resources/fileformats/phylip_dna
The deal is that there are multiple sequence 'blocks', each separated by an empty line. The names are contained only in the first block.
So, the phylip parser as is is doing fine on the top block, but it fails when there are additional blocks of sequences.
Brett - reckon you can fix this?
I had a go by defining top_block, and then OneOrMore(extra_block). This seemed OK, but then I got stuck on zipping up the sequences so that each sequence from an extra block gets stuck onto the end of the corresponding sequence from the top block.
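The zipping step, sketched in plain Python rather than in the pyparsing grammar. This is an illustration only: it assumes names are separated from sequences by whitespace (not the strict 10-character name field), and `parse_interleaved` is a hypothetical function.

```python
def parse_interleaved(text):
    """Parse an interleaved phylip alignment into {name: sequence}."""
    lines = text.strip().splitlines()
    n_species, n_sites = (int(x) for x in lines[0].split())

    # Group the remaining lines into blocks separated by empty lines.
    blocks, current = [], []
    for line in lines[1:]:
        if line.strip():
            current.append(line)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)

    for block in blocks:
        if len(block) != n_species:
            raise ValueError("every block needs %d lines" % n_species)

    # The top block carries the names; later blocks carry sequence only.
    names, seqs = [], []
    for line in blocks[0]:
        parts = line.split()
        names.append(parts[0])
        seqs.append("".join(parts[1:]))
    for block in blocks[1:]:
        for i, line in enumerate(block):
            seqs[i] += "".join(line.split())  # append to the matching row

    for name, seq in zip(names, seqs):
        if len(seq) != n_sites:
            raise ValueError("%s has %d sites, expected %d"
                             % (name, len(seq), n_sites))
    return dict(zip(names, seqs))


example = """2 12
taxonA  ACGT AC
taxonB  ACGA AC

GTACGT
GTACGA
"""
alignment = parse_interleaved(example)
```

The key step is that row i of every extra block is appended to row i of the top block, so the order of sequences within each block has to match.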
R
Hey @brettc, can you take a look at this ASAP? This is a very important check that currently doesn't work.
Previously, we have checked that sites in data blocks are non-overlapping, and also spit a warning for any missing sites.
E.g. with the aminoacid example file:
This is fine
'''
[data_blocks]
COI = 1-407;
COII = 408-624;
EF1a = 625-949;
'''
This should spit a warning about missing sites 1-3 and 941-949
'''
[data_blocks]
COI = 4-407;
COII = 408-624;
EF1a = 625-940;
'''
And this should quit with an error about a site only being allowed to appear in a single data block:
'''
[data_blocks]
COI = 1-407;
COII = 408-624;
EF1a = 625-949;
OOPS = 1-949;
'''
Right now, the second and third options are not behaving properly. No warning, and no error. Can we re-instate these?
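The check itself can be small. Here is a sketch using the three examples above, with an alignment length of 949 sites as implied by the first example. The function `check_data_blocks` is hypothetical, and it only handles simple start-end ranges (no codon-position steps like \3).

```python
def check_data_blocks(blocks, alignment_length):
    """Return (missing_sites, overlaps) for a list of (name, (start, end))."""
    seen = {}
    overlaps = []
    for name, (start, end) in blocks:
        for site in range(start, end + 1):
            if site in seen:
                overlaps.append((site, seen[site], name))
            else:
                seen[site] = name
    missing = [s for s in range(1, alignment_length + 1) if s not in seen]
    return missing, overlaps


ok_blocks = [("COI", (1, 407)), ("COII", (408, 624)), ("EF1a", (625, 949))]
gap_blocks = [("COI", (4, 407)), ("COII", (408, 624)), ("EF1a", (625, 940))]
overlap_blocks = ok_blocks + [("OOPS", (1, 949))]

ok_result = check_data_blocks(ok_blocks, 949)
gap_missing, _ = check_data_blocks(gap_blocks, 949)
_, overlaps = check_data_blocks(overlap_blocks, 949)
```

A non-empty `missing` list would trigger the warning, and a non-empty `overlaps` list would trigger the error and exit.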
A few users have been finding out that the latest version of RAxML doesn't work. This is most important for linux users, who have to compile their own version.
We need to make this more obvious. E.g. from a recent user:
"
Hope everything’s going well! Just writing to offer a suggestion. In your recent Google Group post for PartitionFinder, you state that:
"RAxML changes very frequently, and sometimes in ways that make it incompatible with PartitionFinder (e.g., as you discovered, the very latest RAxML doesn't work with PF). We appreciate that this can be a pain, and so we ship RAxML binaries that should work on most Mac and Windows machines. If the binaries don't work on any given machine, you can just go to our fork of RAxML here: https://github.com/brettc/standard-RAxML, and download and compile the source code for your machine. This fork of RAxML should always be compatible with PF."
Do you think that it’d be possible to stick this in the manual or emphasize this, with a modified URL, on the FAQ? Maybe it was ignorant of me not to go to the Google Group page immediately, but (since I do lots of programming) my first instinct was just to go grab older versions(s) of RAxML and compile them from source. Once I pulled from the version listed above, after exhausting the last few versions of RAxML, everything has been going smoothly.
"
Easy to fix:
Sometimes, users have very small data blocks. They can end up with this error from RAxML:
Empirical base frequency for state number 1 is equal to zero in DNA data partition No Name Provided
However, although we do print out the output from RAxML, because of our threading etc. it looks like something odd has gone on.
So, what we should do is catch this particular RAxML error and make PartitionFinder output a clear description of the problem.
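A rough sketch of the catch, assuming the RAxML output is available as a string. The function name `explain_raxml_failure` and the wording of the returned message are placeholders; only the matched error text comes from RAxML.

```python
ZERO_FREQ_MARKER = "Empirical base frequency for state number"


def explain_raxml_failure(output):
    """Return a clearer message for known RAxML errors, or None."""
    for line in output.splitlines():
        if ZERO_FREQ_MARKER in line and "is equal to zero" in line:
            return (
                "RAxML stopped because at least one data block is missing "
                "one or more character states entirely (for example, a DNA "
                "block with no A's). Consider merging that data block with "
                "a similar one and running the analysis again."
            )
    return None


raxml_output = (
    "Empirical base frequency for state number 1 is equal to zero "
    "in DNA data partition No Name Provided\n"
)
message = explain_raxml_failure(raxml_output)
```

Returning None for unrecognised output lets the caller fall back to printing the raw RAxML output as it does now.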
Here's the thread on the google group:
https://groups.google.com/forum/?fromgroups#!topic/partitionfinder/KZU_lvQcekU
And here's my response to the user, which we could use as part of the description output by PF:
"What this means is that you have a single data block which has no A's C's or G's, but just T's (I think I got that right, in any case it has only a single base). Since it's not advisable to try and estimate the parameters for a GTR model from this kind of data, RAxML won't do it, and will exit. You could confirm this (if you wanted to!) by running the alignment b12c320a3c8cae07356dd884b5f54e3a.phy (in your subsets folder) in RAxML. You'll get the same error message.
The pragmatic solution here is to just merge that data block with another one (the most similar one you can think of a priori). Obviously this is not 100% ideal, but it's the only way to get RAxML to analyse this data, and so to get PartitionFinder to work on your data."
R
April reported some problems with the error handling in threads. We need to ensure that errors generated in threads are properly handled and propagated to the caller without killing off all threads (this issue is probably related to how we handle crashes/errors in phyml / raxml).
(see #32 (comment))
on develop branch. @brettc, can you take a quick look?
I can't seem to specify a user scheme, e.g. like this for the example dataset:
## SCHEMES, search: all | greedy | rcluster | hcluster | user | kmeans ##
[schemes]
search = greedy;
s1 = (Gene1_pos1) (Gene1_pos2) (Gene1_pos3) (Gene2_pos1) (Gene2_pos2) (Gene2_pos3) (Gene3_pos1) (Gene3_pos2) (Gene3_pos3);
We are still calling fast_TIGER within kmeans.py gen_per_site_stats(). I thought that we made the change to the cython tiger rates during the workshop, but I can't find that function anywhere.
Right now we call the fast_TIGER C++ program and parse the results. This is kind of clunky and the C++ code used to calculate the rates isn't very complicated. Perhaps we should use something like Cython to estimate TIGER rates, so that we can avoid calling an external program? @brettc might have some ideas on this. If we can maintain a similar runtime, it might be worth implementing for PF 2.0.