hahnlab / cafe

Analyze changes in gene family size and provide a statistical foundation for evolutionary inferences.

Home Page: https://hahnlab.github.io/CAFE/

License: Other

Makefile 1.06% C 26.91% Python 4.87% Shell 0.23% C++ 42.31% M4 0.48% Terra 0.39% Perl 23.15% Dockerfile 0.07% Raku 0.52%

gene-families bioinformatics phylogenetics

cafe's Introduction

CAFE

Software for Computational Analysis of gene Family Evolution

The purpose of CAFE is to analyze changes in gene family size in a way that accounts for phylogenetic history and provides a statistical foundation for evolutionary inferences. The program uses a birth and death process to model gene gain and loss across a user-specified phylogenetic tree. The distribution of family sizes generated under this model can provide a basis for assessing the significance of the observed family size differences among taxa.

to https://github.com/hahnlab/CAFExp

CAFE v4.2.1 is the latest in a regular series of releases of the CAFE application. The manual and various tutorials may be viewed on the website (https://hahnlab.github.io/CAFE/). This document describes how to download and use CAFE v4.2.1.

Use

The necessary inputs for CAFE v4.2.1 are:

a data file containing gene family sizes for the taxa included in the phylogenetic tree
a Newick formatted phylogenetic tree, including branch lengths

From the inputs above, CAFE v4.2.1 will compute:

the maximum likelihood value of the birth & death parameter, λ (or of separate birth and death parameters, λ and μ, respectively), over the whole tree or for user-specified subsets of branches in the tree
ancestral states of gene family sizes for each node in the phylogenetic tree
p-values for each gene family describing the likelihood of the observed sizes given average rates of gain and loss
average gene family expansion along each branch in the tree
numbers of gene families with expansions, contractions, or no change along each branch in the tree

Install

Run "configure" and "make" from the home directory. The only result is the "cafe" executable in the release directory. This file should be copied to a convenient location.

History

CAFE v3.0 was a major update to CAFE v2.1. Major updates in 3.0 included: 1) the ability to correct for genome assembly and annotation error when analyzing gene family evolution, using the errormodel command; 2) the ability to estimate separate birth (λ) and death (μ) rates, using the lambdamu command; 3) the ability to estimate error in an input data set through iterative use of the errormodel command, using the accompanying Python script caferror.py. This version also added the rootdist command to give the user more control over simulations.

CAFE v4.0 was the first in a regular series of releases intended to make CAFE easier to use and more user-friendly, in addition to adding features and fixing bugs.

cafe's Issues

CAFE should support a version flag so tools can more easily detect what version they are using

No warning is issued if user attempts to load a tree that is not ultrametric

Distances from the root to all tips should be the same length; this is a requirement for a successful CAFE run.

It is unclear how much difference between root-to-tip distances should still count as ultrametric. If one branch length is 317.645538 and another is 317.645508, should we report a warning?
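One way to frame the tolerance question is as a relative comparison of root-to-tip distances. The following is a minimal Python sketch, not CAFE's implementation: the function names and the default tolerance rel_tol=1e-6 are assumptions chosen for illustration, and the small parser only handles plain Newick strings like the ones quoted on this page.

```python
def root_to_tip_distances(newick):
    """Return {leaf name: distance from the root} for a Newick string."""
    text = "".join(newick.split()).rstrip(";")
    pos = 0

    def read_token():
        # Read characters up to the next structural delimiter.
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "(),:":
            pos += 1
        return text[start:pos]

    def parse_node():
        # Returns leaf distances measured from this node's parent.
        nonlocal pos
        leaves = {}
        if text[pos] == "(":
            pos += 1                      # consume "("
            while True:
                leaves.update(parse_node())
                if text[pos] == ",":
                    pos += 1
                    continue
                break
            pos += 1                      # consume ")"
            read_token()                  # skip an optional internal label
        else:
            leaves[read_token()] = 0.0
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            length = float(read_token())
        return {name: dist + length for name, dist in leaves.items()}

    return parse_node()


def is_ultrametric(newick, rel_tol=1e-6):
    """True if all root-to-tip distances agree within a relative tolerance."""
    distances = list(root_to_tip_distances(newick).values())
    longest = max(distances)
    return longest - min(distances) <= rel_tol * longest


if __name__ == "__main__":
    tree = "(((chimp:6,human:6):81,(mouse:17,rat:17):70):6,dog:93)"
    print(root_to_tip_distances(tree))
    print(is_ultrametric(tree))
```

Under this assumed tolerance, distances of 317.645538 and 317.645508 differ by a relative amount of roughly 1e-7 and would not trigger a warning; a stricter or looser rel_tol is a policy choice.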

WARNING

I'm trying to run CAFE v4.0 on some OrthoFinder data, but I get this warning before the lambda command:

Empirical Prior Estimation Result: 13
Poisson lambda: 0.138575 & Score: inf
WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.14133435029884 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.14840106781378 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.13426763278389 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.13780099154136 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.14486770905631 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.14310102967757 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.13956767092010 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.14045101060947 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.14221768998820 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.14177602014352 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.14089268045415 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.14111351537649 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.14155518522118 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.14144476776001 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.14122393283766 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.14127914156825 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.14138955902942 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.14136195466413 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.14130674593354 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.14132054811619 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.14134815248148 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.14134125139016 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.14132744920751 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.14133089975317 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.14133780084450 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.14133607557167 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.14133262502600 & Score: -inf
.WARNING: Calculated posterior probability for family OG0000000 = 0
Lambda : 0.14133348766242 & Score: -inf
.
Lambda Search Result: 13
Lambda : 0.14133435029884 & Score: inf
DONE: Lambda Search or setting, for command:
lambda -s

The species count distribution for OG0000000 (18 species) looks like this:

OG0000000 14 114 76 15 241 945 38 79 475 148 204 43 81 188 52 54 18 101

while the species count distributions of other orthogroups are not really different from that of OG0000000:

OG0000001 12 2 16 3 211 8 7 4 5 10 18 81 27 1018 4 21 6 42
OG0000002 0 1 15 1 179 5 0 0 2 45 21 10 519 325 2 65 1 117
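To compare rows like these more systematically, a short Python sketch can summarize each family's counts. This is only an inspection aid and does not establish the cause of the warning. The function name is hypothetical, and the threshold of 100 copies is an assumption borrowed from the tutorial's size filter mentioned in another issue on this page.

```python
def summarize_family(line, threshold=100):
    """Summarize one whitespace-separated row: family ID followed by counts."""
    fields = line.split()
    family_id = fields[0]
    counts = [int(value) for value in fields[1:]]
    return {
        "family": family_id,
        "species": len(counts),
        "min": min(counts),
        "max": max(counts),
        # Number of species at or above the assumed size threshold.
        "at_or_over_threshold": sum(count >= threshold for count in counts),
    }


if __name__ == "__main__":
    rows = [
        "OG0000000 14 114 76 15 241 945 38 79 475 148 204 43 81 188 52 54 18 101",
        "OG0000001 12 2 16 3 211 8 7 4 5 10 18 81 27 1018 4 21 6 42",
        "OG0000002 0 1 15 1 179 5 0 0 2 45 21 10 519 325 2 65 1 117",
    ]
    for row in rows:
        print(summarize_family(row))
```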

So I was wondering if anyone knows why I get this warning?

Any help would be much appreciated!

Regards

Wannes

Load command should support additional formats

CSV and XLSX probably. Perhaps JSON.
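Until additional formats are supported, a CSV export can be converted before loading. The Python sketch below assumes that the family file read by load is tab-delimited, as the tutorial inputs are; the function name is hypothetical.

```python
import csv
import io


def csv_to_tab(csv_text):
    """Convert comma-separated text to tab-separated text, row by row."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


if __name__ == "__main__":
    example = "Desc,Family ID,Chimp,Human\nEF,ENSF00000000004,5,8\n"
    print(csv_to_tab(example), end="")
```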

get ultrametric tree

Hi,

In the manual, nsites is specified when using r8s to get an ultrametric tree. Is nsites the number of sites in the alignment (including gaps), or the number of sites used to infer the tree? If a site has the same base in all species, with no variation among species, the tree-inference algorithm will not use it, so should such a site be included in nsites or not?

All the best

rows in estimated model from esterror output do not sum to 1

Hello @benfulton

I was trying to run CAFE 4.0.2 with example_data.tab. I succeeded in running the following script:

#!cafe
version
date
load -i example_data.tab -t 12 -l cafe.log.txt -p 0.01 -r 1000
tree (((chimp:6,human:6):81,(mouse:17,rat:17):70):6,dog:93)
lambdamu -s
report cafe_without_error.result

Now I would like to estimate the error distribution for chimp and human using the 'esterror' function. The command I used was: esterror -dataerror chimp.gene_family.tab human.gene_family.tab -symm -o chimp_human.error. Then I got an estimated error model like this:

$ head chimp_human.error
maxcnt:26
cntdiff -2 -1 0 1 2
 0 #nan #nan 0.73 0.025 0.01
1 #nan 0.026 0.72 0.025 0.01
2 0.01 0.025 0.72 0.025 0.01
3 0.01 0.025 0.72 0.025 0.01
4 0.01 0.025 0.72 0.025 0.01
5 0.01 0.025 0.72 0.025 0.01
6 0.01 0.025 0.72 0.025 0.01
7 0.01 0.025 0.72 0.025 0.01

Then I planned to apply this error model to chimp and human to estimate lambda and mu, with the following script:

#!cafe
version
date
load -i example_data.tab -t 12 -l cafe.log.txt -p 0.01 -r 1000
tree (((chimp:6,human:6):81,(mouse:17,rat:17):70):6,dog:93)
errormodel -model chimp_human.error -sp chimp
errormodel -mocel chimp_human.error -sp human
lambdamu -s
report cafe_with_chimp-human_error.result

I got an error like this:

Version: 4.0.2, built at Aug 20 2017
Tue Aug 22 08:52:31 2017-----------------------------------------------------------
Family information: example_data.tab
Log: cafe.log.txt
The number of families is 59
Root Family size : 1 ~ 42
Family size : 0 ~ 84
P-value: 0.01
Num of Threads: 12
Num of Random: 1000
(((chimp:6,human:6):81,(mouse:17,rat:17):70):6,dog:93)
Segmentation fault

With this I have a few questions:

What caused this segmentation fault, and how can I solve it?
Is there something wrong with my steps as shown above?
I realize the rows in the error model file should sum to 1 for the errormodel function, and the rows of the estimated error model from the esterror output do not sum to 1. Is this the problem?
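The row sums can be checked directly from the esterror output. The Python sketch below assumes the file layout shown in the head output above (a maxcnt line, a cntdiff header, then one row per count, with #nan marking undefined entries); the function name is hypothetical, and #nan entries are simply skipped.

```python
def error_model_row_sums(text):
    """Return {count: sum of the defined probabilities in that row}."""
    sums = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the two header lines.
        if not fields or fields[0].startswith("maxcnt") or fields[0] == "cntdiff":
            continue
        probabilities = [float(value) for value in fields[1:] if value != "#nan"]
        sums[int(fields[0])] = sum(probabilities)
    return sums


if __name__ == "__main__":
    sample = """maxcnt:26
cntdiff -2 -1 0 1 2
 0 #nan #nan 0.73 0.025 0.01
1 #nan 0.026 0.72 0.025 0.01
2 0.01 0.025 0.72 0.025 0.01
"""
    for count, total in error_model_row_sums(sample).items():
        print(count, round(total, 3))
```

For the rows quoted above, the sums come to about 0.765, 0.781 and 0.79, which confirms the observation that they fall short of 1.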

Your help and explanation will be highly appreciated, thanks.
Chongjing

GO enrichment for families that evolved rapidly

Hi,

I would like to know what kinds of genes evolved rapidly (e.g. rapidly expanded in the mammalian clade). I think I could perform a GO enrichment analysis on the genes from the families that underwent rapid expansion. However, I'm not sure what to choose as the background for the enrichment. Could I infer the ancestral genes from CAFE's results, then perform the analysis using the ancestral gene set as the background?

Best,
Yang

lhtest crash

The lhtest command crashes if you do not specify a directory.

New seed command not working with multiple threads

Per the mailing list, the command sequence

$ seed 100
$ load -i filtered_Mollusks_only_Jan7_families.txt -t 8 -l log_Apr9_Mollusk_only_reg_p001_.txt -p 0.01
$ tree ((((BG:138.0008395,LS:138.0008395):67.4885290,AC:205.4893684):217.5106316,LG:423.0000000):93.3000000,CG:516.3000000)
$ lambda -s -t ((((1,1)1,1)1,1)1,1)
$ report reg_Mollusk_only_Apr9_p001

returns slightly different values each time rather than identical values, as the seed command intends. This is probably due to the "-t 8" parameter, which causes the application to use multiple threads.
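A common way to keep seeded results reproducible under multithreading is to give each worker its own generator derived from the user's seed, so the result no longer depends on thread scheduling. The Python sketch below illustrates that general idea only; it is not how CAFE is implemented, and the function names and the seed-mixing constant are assumptions.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_chunk(seed, worker_index, n_draws):
    # Each worker owns a private generator derived from the user's seed,
    # so the draws do not depend on how threads are scheduled.
    rng = random.Random(seed * 1000003 + worker_index)
    return [rng.random() for _ in range(n_draws)]


def run(seed, n_workers=8, n_draws=5):
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(simulate_chunk, seed, index, n_draws)
            for index in range(n_workers)
        ]
        # Results are collected in submission order, not completion order.
        return [future.result() for future in futures]


if __name__ == "__main__":
    print(run(100) == run(100))
```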

report command aborts (core dumped)

#!cafe
load -i filtered_cafe_input.txt -t 10 -l cafe.log
tree (((Nym:185.832403,(((((AT:89.277535,Potri:89.277535):11.398834,(Solyc:91.410783,Bv:91.410783):9.265586):5.535909,VIT:106.212277):18.787723,NNU:125.000000)Eudicots:41.420759,((GSMUA:130.230110,(Aco:116.317945,(Sobic:65.000000,LOC_Os:65.000000)Grass:51.317945):13.912164):19.123848,(Spipo:136.145436,Zosma:136.145436):13.208521)Monocots:17.066802):19.411644):14.167597,scaffold:200.000000)Angio:148.774455,Gb_:348.774455)
lambda -s -t (((1,(((((1,1)1,(1,1)1)1,1)1,1)1,((1,(1,(1,1)1)1)1,(1,1)1)1)1)1,1)1,1)
report cafe.report

filtered_cafe_input.txt

The load, tree and lambda commands run normally, but the report command fails with a "core dumped" error.

the python_script folder is not part of the release

I am running cafe on my data following the tutorial and noticed that the python_scripts are not part of the release I have installed. I did copy the python scripts from the cafe image on Jetstream to my computer to run the remaining steps.

Were these scripts left out of the release on purpose? It would be great if they could be made available here as part of the cafe installation.

"Calculated posterior probability for family" after size filtering

I've previously followed the tutorial and used Cafe with my own data. Now, with this new batch of data I'm getting ".WARNING: Calculated posterior probability for family 14866 = 0" until a Bus Error crashes it. I've rerun Cafe 4.2 on my old data and it still works.

Lambda : 0.01050260385206 Mu : 0.01697552947024 & Score: -inf
.WARNING: Calculated posterior probability for family 14866 = 0

Lambda : 0.01050260385206 Mu : 0.01697552947024 & Score: -inf
.WARNING: Calculated posterior probability for family 14866 = 0
.Bus Error

$ head -n 1 filtered_OG_counts.txt
Desc    Family ID       FRAX00  FRAX01  FRAX03  FRAX04  FRAX05  FRAX06  FRAX07  FRAX08  FRAX09  FRAX11  FRAX12  FRAX13  FRAX14  FRAX15  FRAX16  FRAX19  FRAX20  FRAX21  FRAX23  FRAX25  FRAX26  FRAX27  FRAX28  FRAX29  FRAX30  FRAX31  FRAX32  FRAX33  Mguttatus   Oeuropea        Slycopersicum
$ grep 14866 filtered_OG_counts.txt
OG0014866       14866   1       1       0       1       1       1       1       1       1       1       1       1       1       1       1       1       1       1       1       1       1    1       1       1       1       1       1       1       2       1       1

Family 14866 seems exceedingly normal. Removing this line just moves the error to a new family.

Things I have checked

Tree is ultrametric from r8s
There are no polytomies in the tree; the shortest branch length is 0.047221
Header and tree species names match between Random_Sample_1000.GeneCount.txt and cafe script (below)
cafetutorial_clade_and_size_filter.py for size filtering
Branch lengths are not integer as one comment indicated, but I have other non-integer inputs working fine.

python ../cafe_tutorial/python_scripts/cafetutorial_clade_and_size_filter.py -i Cafe_orthofinder_Orthogroups.GeneCount.csv -o filtered_OG_counts.txt -s

head filtered_OG_counts.txt -n 1 > Random_Sample_1000.GeneCount.txt  # important to add header
shuf -n 1000 filtered_OG_counts.txt >> Random_Sample_1000.GeneCount.txt
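Because shuf in the commands above draws from every line of filtered_OG_counts.txt, the header line may also end up among the sampled rows. A minimal Python sketch of the same sampling step that sets the header aside first is given here; the function name and the optional seed parameter are assumptions for illustration.

```python
import random


def sample_families(lines, n, seed=None):
    """Keep the header line and sample up to n of the remaining rows."""
    header, rows = lines[0], lines[1:]
    rng = random.Random(seed)
    return [header] + rng.sample(rows, min(n, len(rows)))


if __name__ == "__main__":
    with open("filtered_OG_counts.txt") as handle:
        all_lines = handle.read().splitlines()
    with open("Random_Sample_1000.GeneCount.txt", "w") as handle:
        handle.write("\n".join(sample_families(all_lines, 1000)) + "\n")
```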

#!cafe
load -i Random_Sample_1000.GeneCount.txt -t 1 -l reports/run1__orthofinder_full.txt
#Approximated ultrametric based on 763103 sites and 5 calibration points
tree (((((((((((FRAX30:2.062238,FRAX32:2.062238):0.73118,FRAX28:2.793419):1.660572,FRAX12:4.45399):4.76951,(FRAX07:8.140409,FRAX29:8.140409):1.083092):3.809055,FRAX08:13.032555):0.967445,(((((FRAX01:2.412569,FRAX16:2.412569):3.979589,FRAX15:6.392158):1.96257,FRAX00:8.354728):1.6873,(FRAX06:8.890557,FRAX23:8.890557):1.15147):3.083198,FRAX25:13.125226):0.874774):3.475312,FRAX21:17.475312):1.524688,(((FRAX19:8.534639,FRAX20:8.534639):1.589727,((FRAX11:5.06782,FRAX27:5.06782):5.009324,FRAX04:10.077145):0.047221):0.875635,(((((FRAX03:0.834928,FRAX09:0.834928):0.892633,FRAX13:1.72756):2.522183,(FRAX26:2.287722,FRAX14:2.287722):1.962021):3.374113,FRAX05:7.623856):1.770855,FRAX33:9.394711):1.605289):8.0):14.705751,FRAX31:33.705751):2.294249,Oeuropea:36.0):43.0,(Slycopersicum:36.746625,Mguttatus:36.746625):42.253375)
lambdamu -s

I suspect there's something wrong with my tree, I just don't know what that would be.

Any advice you can offer is much appreciated. Thanks.

no binary file - several issues

Dear Cafe,

I have downloaded the latest version from github. I have come across several issues:

There is no binary file in the download from GitHub, in the cafe folder or in the release folder.
In the documentation the link to the tarball is broken:
https://hahnlab.github.io/CAFE/download.html (https://github.com/hahnlab/CAFE/releases/download/v4.0/CAFE.tar.gz) page 404 webpage not found
If I try to install from source using:
autoconf (not in the docs yet)
./configure
make
I get the following error:
[pt@gruffalo CAFE]$ make
gcc -c -Wall -std=c11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/cafe_family.o cafe/cafe_family.c
cafe/cafe_family.c: In function ‘cafe_family_read_query_family’:
cafe/cafe_family.c:288:4: warning: implicit declaration of function ‘strdup’ [-Wimplicit-function-declaration]
species = strdup(data->array[2]);
^
cafe/cafe_family.c:288:12: warning: assignment makes pointer from integer without a cast
species = strdup(data->array[2]);
^
cafe/cafe_family.c: In function ‘cafe_family_read_validate_species’:
cafe/cafe_family.c:331:25: warning: assignment makes pointer from integer without a cast
param->cv_species_name = strdup(data->array[2]);
^
gcc -c -Wall -std=c11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/cafe_main.o cafe/cafe_main.c
gcc -c -Wall -std=c11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/cafe_report.o cafe/cafe_report.c
gcc -c -Wall -std=c11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/cafe_tree.o cafe/cafe_tree.c
gcc -c -Wall -std=c11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/cafe_shell.o cafe/cafe_shell.c
gcc -c -Wall -std=c11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/birthdeath.o libtree/birthdeath.c
gcc -c -Wall -std=c11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/chooseln_cache.o libtree/chooseln_cache.c
gcc -c -Wall -std=c11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/phylogeny.o libtree/phylogeny.c
gcc -c -Wall -std=c11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/tree.o libtree/tree.c
gcc -c -Wall -std=c11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/fminsearch.o libcommon/fminsearch.c
gcc -c -Wall -std=c11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/grpcmp.o libcommon/grpcmp.c
gcc -c -Wall -std=c11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/histogram.o libcommon/histogram.c
gcc -c -Wall -std=c11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/input_values.o libtree/input_values.c
gcc -c -Wall -std=c11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/matrix_exponential.o libcommon/matrix_exponential.c
gcc -c -Wall -std=c11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/regexpress.o libcommon/regexpress.c
gcc -c -Wall -std=c11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/utils_string.o libcommon/utils_string.c
gcc -c -Wall -std=c11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/gmatrix.o libcommon/gmatrix.c
gcc -c -Wall -std=c11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/hashtable.o libcommon/hashtable.c
gcc -c -Wall -std=c11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/mathfunc.o libcommon/mathfunc.c
gcc -c -Wall -std=c11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/memalloc.o libcommon/memalloc.c
gcc -c -Wall -std=c11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/utils.o libcommon/utils.c
g++ -c -Wall -std=c++11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/branch_cutting.o cafe/branch_cutting.cpp
g++ -c -Wall -std=c++11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/cafe_commands.o cafe/cafe_commands.cpp
g++ -c -Wall -std=c++11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/conditional_distribution.o cafe/conditional_distribution.cpp
g++ -c -Wall -std=c++11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/error_model.o cafe/error_model.cpp
g++ -c -Wall -std=c++11 -I cafe -I libtree -I libcommon -O3 -DNDEBUG -o release/gene_family.o cafe/gene_family.cpp
cafe/gene_family.cpp: In function ‘CafeFamily* load_gene_families(std::istream&, int, char)’:
cafe/gene_family.cpp:114:71: error: ‘runtime_error’ was not declared in this scope
throw runtime_error("Failed to identify species for gene families");
^
make: *** [release/gene_family.o] Error 1
Also, regarding the dependencies/requirements: could you put more details about breathe in the file? If you google this you get a lot back.
I managed to run version 3.0, which failed to produce any PDF output, and therefore the HTML failed, but it did produce results. Have you any idea why this is, and could it be fixed in version 4?

cheers,

Pete

weird output

dear all,

CAFE v4.1 gave me "WARNING: Calculated posterior probability for family OG0000004 = 0" even though I followed the tutorial to filter out families in which one or more species had >= 100 gene copies.

Then I got a weird result in which all internal nodes had 0 gene copies.

Could you please help me to find out what's going wrong?
Thank you!

Attached please find the input file and cafe shell script.
filtered_cafe_input.txt
cafe.script.txt

output file format for "Newick" column

In the output result file there is a column named "Newick" in which each species name has a suffix "_0", "_1", "_2" or "_3". What do these four tags mean? And how do I decide whether a gene family in a species has "contracted" or "expanded"?

Add a "seed" command for more reproducible results

Add the command "seed" which will set the random seed in order that results that rely on random values may be reproducible. If no seed is set, commands that use randomness will return different values each time.

Accept a JSON input file in the load command

Allow providing a tree in the file as well so all parameters can be set with one file.

Specifying an invalid log file may cause memory corruption

If the file specified with the -l parameter to the load command is invalid or corrupt, future attempts to log may result in errors, or the program may crash on exit.

Implement Cecile Ane's polyploidy method as a new CAFE feature

It would be interesting to implement the method Ane developed in Rabier et al. (2014). Ane has given permission and we need to make sure to give credit where due.

https://www.ncbi.nlm.nih.gov/pubmed/24361993

Software: http://www.stat.wisc.edu/~ane/wgd/

Error when loading a particular tree

ERROR(tree.c): running infix on nonbinary tree
(memory_new) Error in memory allocation: 12: Cannot allocate memory

I pasted the tree into Seaview and it makes the tree without problem.

(((((((A:109.5082649,B:109.5082649):83.2172160,C:192.7254808):262.8484741,D:455.5739549):87.4210709,E:542.9950258):373.0049742,(F:425.0000000,(G:346.0000000,(H:94.0000000,(I:6.8896471,J:6.8896471):87.1103529):252.0000000):79.0000000):491.0000000):405.4946417,K:1321.4946417));

There seems to be an extra set of parentheses around your tree. If I remove the outermost opening and closing parentheses, the tree loads OK!
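An extra pair of outer parentheses like this can be detected before the tree is handed to CAFE: the outermost group then contains a single child and therefore no comma at depth 1. The Python sketch below is a hypothetical pre-check, not part of CAFE; the function names are assumptions.

```python
def outer_group_is_redundant(newick):
    """True if the outermost parentheses wrap exactly one child."""
    text = "".join(newick.split()).rstrip(";")
    if not text.startswith("("):
        return False
    depth = 0
    for char in text:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth == 0:
                break
        elif char == "," and depth == 1:
            return False        # the root has at least two children
    return True


def strip_redundant_outer_group(newick):
    """Remove one redundant outer pair of parentheses, if present."""
    text = "".join(newick.split()).rstrip(";")
    if outer_group_is_redundant(text) and text.endswith(")"):
        return text[1:-1]
    return text


if __name__ == "__main__":
    print(strip_redundant_outer_group("(((A:1,B:1):1,C:2));"))
```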

Provide better HTML output from the "report html" command

Maybe a table more like the input file, but this time with the internal nodes as columns as well? The individual cells could show the observed/inferred count and the change from that node's ancestral state. For example, if I make up an input family:

Family ID    Human    Chimp    Gorilla
1            3        1        0

The output could be a table like this:

Tree ((Human,Chimp)<1>,Gorilla)<2>

Family ID    Human     Chimp     Gorilla    <1>       <2>
1            3 / +2    1 / +0    0 / -1     1 / +0    1 / NA

Another column could be added for the family p-value, and we could add a symbol next to the changes (i.e. +2*) or make them bold if they are rapidly evolving (if the lineage p-value < 0.01).
It would also be useful to have an image of the tree with the internal nodes labeled somewhere visible.
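The proposed "count / change" cells can be derived from the per-node counts and a child-to-parent map. The Python sketch below reproduces the made-up family from the example; the function name and data layout are assumptions, not part of CAFE's report code.

```python
def change_cells(counts, parent):
    """Format each node as 'count / change from its parent's count'."""
    cells = {}
    for node, count in counts.items():
        ancestor = parent.get(node)
        if ancestor is None:
            cells[node] = f"{count} / NA"          # the root has no ancestor
        else:
            cells[node] = f"{count} / {count - counts[ancestor]:+d}"
    return cells


if __name__ == "__main__":
    # Tree ((Human,Chimp)<1>,Gorilla)<2> with the counts from the example.
    counts = {"Human": 3, "Chimp": 1, "Gorilla": 0, "<1>": 1, "<2>": 1}
    parent = {"Human": "<1>", "Chimp": "<1>", "<1>": "<2>", "Gorilla": "<2>"}
    for node, cell in change_cells(counts, parent).items():
        print(node, cell)
```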

Cannot allocate memory

Hi,

I am running CAFE 3.1 and get the following error while running with the sample data:

(memory_new) Error in memory allocation: 12: Cannot allocate memory

The script is like this:
#!~/bin/cafe
#version
#date
load -i /gpfs0/home/test_data -l logfile.txt -p 0.05
tree (((Chimp:6,Human:6):81,(Mouse:17,Rat:17):70):6,Dog:93)
lambda –s
report resultfile

The test_data looks like this:
Description ID Chimp Human Mouse Rat Dog
EF ENSF00000000004 5 8 6 12 40
HLA2 ENSF00000000007 4 4 3 3 3
HLA1 ENSF00000000014 5 3 5 6 3
RAG1 ENSF00000000015 1 1 1 1 1
IG ENSF00000000020 32 42 51 60 18
ACTIN ENSF00000000027 27 30 22 28 25
OPSIN ENSF00000000029 2 2 2 2 2
HEAVY ENSF00000000030 25 25 23 24 18

Could you please provide some suggestions to solve this?

Thanks!

CAFE reports "inconsistency in tree size" on load when there are more than about 100 species

If a tree with 100 or so leaves is entered, and a matching family file is loaded, cafe writes "inconsistency in tree size" and exits. This is legitimate behavior if the family file has more species in it than the tree does, but it should not be limited to 100 species.

ERROR(tree.c): running infix on nonbinary tree

I followed the CAFE v4.0 tutorial to filter the MCL clustering result with python_scripts/cafetutorial_mcl2rawcafe.py and python_scripts/cafetutorial_clade_and_size_filter.py.
When I run "load -i filtered_cafe_input.txt -t 32 -l reports/log_run1.txt
tree ((((((((Oam:5.489859,She:5.489859):4.999583,Goa:10.489442):13.516361,Cow:24.005802):38.721828,Pig:62.727630):19.175412,Hor:81.903042):2.544888,Dog:84.447930):11.519883,Hom:95.967813):62.740158,Opo:158.707971)
lambda -s -t ((((((((1,1)1,1)1,1)1,1)1,1)1,1)1,1)1,1)
report reports/report_run1
", I got the right result.
But when running cafe with the large_filtered_cafe_input.txt and the fixed lambda value, I got the following error:
Family information: large_filtered_cafe_input.txt
Log: reports/log_run2.txt
The number of families is 5
Root Family size : 1 ~ 605
Family size : 0 ~ 580
P-value: 0.01
Num of Threads: 4
Num of Random: 1000
((((((((Oam:5.48986,She:5.48986):4.99958,Goa:10.4894):13.5164,Cow:24.0058):38.7218,Pig:62.7276):19.1754,Hor:81.903):2.54489,Dog:84.4479):11.5199,Hom:95.9678):62.7402,Opo:158.708)
ERROR(tree.c): running infix on nonbinary tree
Lambda has a different topology from the tree
std::exception

Please help me fix this.
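In the scripts quoted on this page, the tree passed to lambda -t mirrors the topology of the tree command, with every leaf and every non-root internal node replaced by a rate index. A single-rate version of that structure can be generated from the Newick string rather than typed by hand, which avoids topology mismatches from miscounted parentheses. The Python sketch below is a hypothetical helper, not part of CAFE.

```python
import re


def single_rate_lambda_tree(newick):
    """Build a lambda tree with rate index 1 on every branch."""
    tree = "".join(newick.split()).rstrip(";")
    tree = re.sub(r":[^(),;]+", "", tree)       # drop branch lengths
    tree = re.sub(r"\)[^(),;]+", ")", tree)     # drop internal node labels
    tree = re.sub(r"[^(),;]+", "1", tree)       # each leaf name becomes 1
    tree = re.sub(r"\)(?=[,)])", ")1", tree)    # each non-root internal node gets 1
    return tree


if __name__ == "__main__":
    tree = "(((chimp:6,human:6):81,(mouse:17,rat:17):70):6,dog:93)"
    print(single_rate_lambda_tree(tree))
```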

Setting random family sizes sometimes sets the size to one more than the max

The function cafe_tree_random_familysize sometimes overflows its array. This is the cause of the issue reported by 张毅 <zyworship at 163.com>

Subject: usage for cafe
Date: December 13, 2016 at 10:57:05 PM EST

Can the software not handle a large data set, such as 52 species and 200 gene families? In my run, the software displays "Aborted (core dumped)". When I reduced the number of gene families, it worked and created the report.

Support gcc 5.4 which, I think, is the default in Ubuntu 16.

Also decide on a set of compiler / OS options to officially support.

The issues are in error_model.cpp, and go back and forth on whether isnan and isinf should be loaded from the global namespace or the std namespace.

Provide install and dist options in the makefile

These are standard options for Linux projects. The best way to add them would be by using automake.

CAFE underflow

Does the max_size option or the filter command keep track of which families are left out of the analysis? I think the reason I don't use the filter command is that it doesn't tell me which families were filtered out.
It's not that important for the filter command, but for max_size, what I typically do is remove families greater than max size, estimate lambda, and then go back and use the estimated lambda to reconstruct gene counts for the families that were over the max size. So I think it would be nice if max_size did that as well!
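The bookkeeping described here, keeping a record of which families exceed the maximum size so they can be revisited once lambda is estimated, can be sketched outside CAFE. The Python function below is a hypothetical helper; its name and the (family_id, counts) input layout are assumptions.

```python
def split_by_max_size(families, max_size):
    """Split (family_id, counts) pairs into kept and filtered-out IDs.

    A family is filtered out when its largest count exceeds max_size.
    """
    kept, filtered_out = [], []
    for family_id, counts in families:
        if max(counts) > max_size:
            filtered_out.append(family_id)
        else:
            kept.append(family_id)
    return kept, filtered_out


if __name__ == "__main__":
    families = [
        ("ENSF00000000004", [5, 8, 6, 12, 40]),
        ("ENSF00000000020", [32, 42, 51, 60, 18]),
        ("ENSF00000000027", [27, 30, 22, 28, 25]),
    ]
    kept, filtered_out = split_by_max_size(families, max_size=30)
    print("kept:", kept)
    print("filtered out:", filtered_out)
```

Writing the filtered-out IDs to a file alongside the run would give the record that the filter command currently does not provide.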

genfamily crash

genfamily command crashes if you don't specify the directory in the right format.

core dump CAFE

Hi!

I get a segfault during Viterbi in CAFE 4.1 and 4.0.1 (I did not try other versions). I tried different numbers of threads and numbers of families, but I still get a segfault.

My CF input:

load -i temp -p 0.01 -l COGCAFE.csv.small.log
tree ((((((friEnd5311:11,friEnd534:11):38,friSim5184:50):114,horWer:164):292,((((aspFum:109,aspRub:109):67,claCla:176):130,(claImm:95,exoDer:95):211):130,acaStr:437):20):132,sacCer:590):41,schPom:631)
lambda -s -t ((((((1,1)1,1)1,1)1,((((1,1)1,1)1,(1,1)1)1,1)1)1,1)1,1)
report COGCAFE.csv.small.results

My family input

FAMILY acaStr aspFum aspRub claCla claImm exoDer friEnd5311 friEnd534 friSim5184 horWer sacCer schPom
0PFTW 4 8 10 6 11 7 11 15 11 10 3 9
0PFSV 4 4 2 7 9 11 8 9 9 13 18 8
0PFMP 2 4 4 10 13 12 8 8 6 7 1 2
0PFRH 2 3 6 4 3 4 6 9 10 10 2 3

If I use valgrind to profile the memory, the process does not crash! So it is probably a memory corruption problem.

Thank you

make on Mac

I will share an error that occurred on Mac, along with its solution.

I got errors when running "make" on macOS High Sierra 10.13.4:

cafe/error_model.cpp:473:55: error: no member named 'isfinite' in namespace 'std'; did you mean 'finite'?
cafe/lambda.cpp:706:18: error: no member named 'isnan' in namespace 'std'

To resolve these issues, I added the line below
#include <cmath>
to each of the files cafe/lambda.cpp and cafe/error_model.cpp.

Enjoy!

I have 413 species for statistics

I have 413 species for statistics. I checked the tree with MEGA, and it opens normally.
But when I use it as an input file for CAFE, CAFE doesn't run normally. I don't know where the problem is.

Support loading a tree from a file

Trees can be complicated and you might want to refer to the same one in multiple scripts. CAFE should support loading a tree structure from a file as well as a string.
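Until CAFE can read a tree from a file, one workaround is to generate each script from a shared tree file. The Python sketch below assembles a script from the command lines seen in the scripts quoted on this page; the function name, parameter names and file names are hypothetical.

```python
def build_cafe_script(tree_text, family_file, log_file, report_name, threads=1):
    """Return the text of a CAFE script that embeds the given tree."""
    # Assumption: the tree command takes the Newick string without a
    # trailing semicolon, as in the scripts quoted on this page.
    tree = "".join(tree_text.split()).rstrip(";")
    lines = [
        "#!cafe",
        f"load -i {family_file} -t {threads} -l {log_file}",
        f"tree {tree}",
        "lambda -s",
        f"report {report_name}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    shared_tree = "(((chimp:6,human:6):81,(mouse:17,rat:17):70):6,dog:93);"
    print(build_cafe_script(shared_tree, "example_data.tab", "cafe.log.txt", "run1"), end="")
```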

lambdamu freeing issue

Running initial tutorial code but replacing "lambda" with "lambdamu" gives the following error: cafe(76722,0x7fff74b84180) malloc: *** error for object 0x7fd642dc9a40: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6

Add a Filter argument to load

The argument will take an integer and filter out any families whose maximum size is greater than that value.

(memory_new) Error in memory allocation: 12: Cannot allocate memory

Hi, I have run CAFE 4.0.2 and get the following error:

Family information: species.ogcount.txt
Log: log-run2.txt
The number of families is 37995
Root Family size : 1 ~ 2399
Family size : 0 ~ 2302
P-value: 0.01
Num of Threads: 4
Num of Random: 1000
((((((((((Gossypium-raimondii:25.9804,Durio-zibethinus:25.9804):4.29216,Theobroma-cacao:30.2726):53.3515,(Carica-papaya:75.9728,Arabidopsis-thaliana:75.9728):7.65128):3.32215,((Dimocarpus-longan:31.3362,Xanthoceras-sorbifolium:31.3362):28.0047,Citrus-grandis:59.341):27.6053):3.05376,Populus-trichocarpa:90):16.6329,((Malus-domestica:94.5268,Cucumis-melo:94.5268):6.07958,Glycine-max:100.606):6.02656):5.91262,(Punica-granatum:83.1347,Eucalyptus-grandis:83.1347):29.4108):11.4545,Vitis-vinifera:124):13.7855,Coffea-canephora:137.785):128.232,Oryza-sativa:266.018)
The number of lambdas is 1
Lambda Tree: ((((((((((1,1)1,1)1,(1,1)1)1,((1,1)1,1)1)1,1)1,((1,1)1,1)1)1,(1,1)1)1,1)1,1)1,1)
Empirical Prior Estimation Result: 14
Poisson lambda: 0.211068 & Score: inf
WARNING: Calculated posterior probability for family OG-1 = 0
Lambda : 0.00056219243791 & Score: -inf
.WARNING: Calculated posterior probability for family OG-1 = 0
Lambda : 0.00059030205981 & Score: -inf
.WARNING: Calculated posterior probability for family OG-1 = 0
Lambda : 0.00053408281602 & Score: -inf
.WARNING: Calculated posterior probability for family OG-1 = 0
Lambda : 0.00054813762696 & Score: -inf
.WARNING: Calculated posterior probability for family OG-1 = 0
Lambda : 0.00057624724886 & Score: -inf
.WARNING: Calculated posterior probability for family OG-1 = 0
Lambda : 0.00056921984339 & Score: -inf
.WARNING: Calculated posterior probability for family OG-1 = 0
Lambda : 0.00055516503244 & Score: -inf
.WARNING: Calculated posterior probability for family OG-1 = 0
Lambda : 0.00055867873518 & Score: -inf
.WARNING: Calculated posterior probability for family OG-1 = 0
Lambda : 0.00056570614065 & Score: -inf
.WARNING: Calculated posterior probability for family OG-1 = 0
Lambda : 0.00056394928928 & Score: -inf
.WARNING: Calculated posterior probability for family OG-1 = 0
Lambda : 0.00056043558654 & Score: -inf
.WARNING: Calculated posterior probability for family OG-1 = 0
Lambda : 0.00056131401223 & Score: -inf
.
Lambda Search Result: 5
Lambda : 0.00056219243791 & Score: inf
DONE: Lambda Search or setting, for command:
lambda -s -t ((((((((((1,1)1,1)1,(1,1)1)1,((1,1)1,1)1)1,1)1,((1,1)1,1)1)1,(1,1)1)1,1)1,1)1,1)
Running Viterbi algorithm....
(memory_new) Error in memory allocation: 12: Cannot allocate memory
(memory_new) Error in memory allocation: 12: Cannot allocate memory
(memory_new) Error in memory allocation: 12: Cannot allocate memory
/opt/gridengine/default/spool/compute-0-5/job_scripts/125237: line 1: 62302 Segmentation fault (core dumped) cafe run_cafe2.cafe

This is the file run_cafe2.cafe:
#!cafe
load -i species.ogcount.txt -t 4 -l log-run2.txt
tree ((((((((((Gossypium-raimondii:25.980427,Durio-zibethinus:25.980427):4.292164,Theobroma-cacao:30.272591):53.351498,(Carica-papaya:75.972809,Arabidopsis-thaliana:75.972809):7.651280):3.322150,((Dimocarpus-longan:31.336222,Xanthoceras-sorbifolium:31.336222):28.004735,Citrus-grandis:59.340957):27.605282):3.053761,Populus-trichocarpa:90.000000):16.632920,((Malus-domestica:94.526782,Cucumis-melo:94.526782):6.079577,Glycine-max:100.606359):6.026561):5.912620,(Punica-granatum:83.134742,Eucalyptus-grandis:83.134742):29.410799):11.454460,Vitis-vinifera:124.000000):13.785478,Coffea-canephora:137.785478):128.232231,Oryza-sativa:266.017709)
lambda -s -t ((((((((((1,1)1,1)1,(1,1)1)1,((1,1)1,1)1)1,1)1,((1,1)1,1)1)1,(1,1)1)1,1)1,1)1,1)
report report-run2

The data file species.ogcount.txt looks like this:
Desc Family ID Arabidopsis-thaliana Carica-papaya Citrus-grandis Coffea-canephora Cucumis-melo Dimocarpus-longan Durio-zibethinus Eucalyptus-grandis Gossypium-raimondii Malus-domestica Oryza-sativa Populus-trichocarpa Punica-granatum Theobroma-cacao Vitis-vinifera Xanthoceras-sorbifolium Glycine-max
(null) OG-1 0 0 0 0 1 1919 0 0 1 0 0 0 00 0 0 1
(null) OG-2 2 7 2 270 10 196 116 54 61 130 8 115 29 70 60 51 75
(null) OG-3 5 7 79 148 11 74 88 208 75 136 96 41 39 46 22 57 21
(null) OG-4 79 11 0 13 27 32 46 173 30 208 0 170 12 18 20 4 138
(null) OG-5 27 13 35 31 20 82 36 179 46 78 27 140 34 53 33 24 83
(null) OG-6 16 15 70 80 17 69 101 148 50 48 21 66 25 35 41 19 64
(null) OG-7 13 3 2 15 6 68 61 147 64 34 22 70 40 43 9 4 47
(null) OG-8 0 0 2 0 0 644 0 0 0 0 0 1 10 0 0 12
(null) OG-9 0 0 0 0 0 0 1 0 0 0 656 0 00 0 0 0

Would you please help me? Thank you.
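
As displayed here, a few rows (for example OG-1, with its fused `00`) seem to carry fewer counts than the header has species, although that may only be an artifact of how the file was pasted. A minimal sketch for checking this before running CAFE is shown below. It assumes the data file is tab-delimited with one header line; the function name `check_column_counts` is hypothetical and not part of CAFE.

```python
import io

def check_column_counts(handle, sep="\t"):
    """Compare each row's field count with the header's field count.

    Returns (expected, bad), where bad is a list of
    (line_number, n_fields) for every non-blank row that differs.
    Assumes a delimited text file with a single header line.
    """
    header = handle.readline().rstrip("\n").split(sep)
    expected = len(header)
    bad = []
    for lineno, line in enumerate(handle, start=2):
        if not line.strip():
            continue  # blank lines are skipped here, not reported
        n_fields = len(line.rstrip("\n").split(sep))
        if n_fields != expected:
            bad.append((lineno, n_fields))
    return expected, bad

# Small made-up example: the second data row is one field short.
sample = (
    "Desc\tFamily ID\tA\tB\tC\n"
    "(null)\tOG-1\t0\t1\t2\n"
    "(null)\tOG-2\t0\t12\n"
)
expected, bad = check_column_counts(io.StringIO(sample))
print(expected, bad)
```

To run it on a real file, pass an open file handle, e.g. `check_column_counts(open("species.ogcount.txt"))`.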

Segmentation fault with trailing newline in data file

Hi,
I've noticed that CAFE exits with a segmentation fault during the load command when my data file has an extra blank line at the end of the file.

MacOS v10.13.4
CAFE v4.1, built at Apr 16 2018

Let me know if you need more info!
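
Until this is fixed, one possible workaround is to strip trailing blank lines from the data file before loading it. The sketch below assumes the file fits in memory and that a single terminating newline is harmless to CAFE (the report only mentions an extra blank line); `strip_trailing_blank_lines` is a hypothetical helper, not part of CAFE.

```python
def strip_trailing_blank_lines(text):
    """Remove blank or whitespace-only lines from the end of the text.

    The result ends with exactly one newline, or is empty if the
    input contained no non-blank lines.
    """
    lines = text.splitlines()
    while lines and not lines[-1].strip():
        lines.pop()
    return "\n".join(lines) + "\n" if lines else ""

# Made-up two-row example with two extra blank lines at the end.
cleaned = strip_trailing_blank_lines("Desc\tFamily ID\tA\n(null)\tOG-1\t3\n\n\n")
print(repr(cleaned))
```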

Maximum value for λ × t < 1 doesn't hold true for µ

The product of λ or µ and the depth of the tree should not exceed one (i.e.,
λ × t < 1 and µ × t < 1 must be true; where t is the time from tips to the
root).

While verifying my results, I noticed that the estimated death rate was higher than 1 / (root-to-tip distance). From root to tip I have two branches, 42 + 37 = 79. CAFE estimated the death rate at µ = 0.0202, so µ × t = 0.0202 × 79 = 1.5958, which is greater than 1.

There was no warning in the output that I could see. It may just be that this restriction doesn't apply to µ like the manual states, or it may be that the check overlooks this scenario incorrectly.
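
The check described in the manual can be reproduced by hand. The sketch below uses only the numbers quoted in this report; the function name `rate_times_depth` is hypothetical and does not reflect how CAFE performs the check internally.

```python
def rate_times_depth(rate, branch_lengths):
    """Multiply a rate (lambda or mu) by the root-to-tip distance.

    branch_lengths holds the lengths of the branches along one
    root-to-tip path; their sum is the tree depth t.
    """
    depth = sum(branch_lengths)
    return rate * depth

# Values from this report: mu = 0.0202, branches 42 and 37.
product = rate_times_depth(0.0202, [42, 37])
within_limit = product < 1
print(round(product, 4), within_limit)
```

With these values the product exceeds 1, so the condition µ × t < 1 stated in the manual is not met.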

Related to this, in the documentation there are placeholders that were never filled in:
Cafe Manual March 14, 2017

If the product of λ and the distance from the tips to the root is greater than 1,
then CAFE will not return accurate results. If λ is specified by the user, this
problem is seen as “@@”. If the λ-search option is used, then the value of λ output
will be the maximum possible for λ × t < 1. If this is a problem, CAFE will print
a caution message and “@@” will appear before the Newick-formatted tree in the
output. In our experience, this is the most common error encountered by users.

Why does this restriction exist in the first place? Wouldn't λ × t > 1 simply mean that gene families have more than doubled on average?

Better error message for invalid arguments

For example, if a script says "lambda search" instead of "lambda -s" the error message doesn't make sense. All commands should consistently throw exceptions if they are passed invalid arguments.

Segfault when species in the family file does not appear in the tree

Output as reported on the mailing list:

Empirical Prior Estimation Result: (51 iterations)
Poisson lambda: 34.602272 & Score: 1076.460101
DONE: Lambda Search or setting, for command:
lambda -l 0.5
Running Viterbi algorithm....
Warning: Tree and family indices not synchronized
Segmentation fault (core dumped)

https://groups.google.com/forum/#!topic/hahnlabcafe/EE5jOECb70Q
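
A pre-flight comparison of species names can catch this mismatch before CAFE runs. The sketch below is a rough illustration under stated assumptions: the tree is a plain Newick string without quoted labels or comments, and the family file header starts with two non-species columns (as in the `Desc` and `Family ID` layout used by CAFE data files). The names `leaf_names` and `compare_species` are hypothetical.

```python
import re

def leaf_names(newick):
    """Extract leaf labels from a simple Newick string.

    A leaf label directly follows '(' or ','; internal node labels
    follow ')' and are therefore not matched. Quoted labels and
    bracketed comments are not handled.
    """
    return set(re.findall(r"[(,]\s*([^():,;\s]+)", newick))

def compare_species(newick, header_fields, n_leading=2):
    """Return (only_in_data, only_in_tree) as sorted lists."""
    tree = leaf_names(newick)
    data = set(header_fields[n_leading:])
    return sorted(data - tree), sorted(tree - data)

# Made-up example: D is in the data file but not the tree, C the reverse.
only_in_data, only_in_tree = compare_species(
    "((A:1,B:1):2,C:3)",
    ["Desc", "Family ID", "A", "B", "D"],
)
print(only_in_data, only_in_tree)
```

Both lists should be empty before the family file and tree are passed to CAFE.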

Compilation of CAFE3.1

Hi,

I would like to use CAFE3.1 but am having trouble compiling it under Ubuntu.

It compiles under macOS, though.

thanks in advance,

commands.txt

dom

Errormodel command fails if model has fewer lines than the expected maximum family size

Fix output format for reporting

Drop Metapost output. Support HTML format for clarity, and a JSON or XML format for structured reporting.

./configure - no file

Dear CAFE,

I am going to try to use your software but I cannot install it. The instructions say ./configure then make. If I try ./configure or ./configure.ac it fails. Can you please help? Thanks Pete

[pt@gruffalo CAFE-master]$ ./configure
-bash: ./configure: No such file or directory

[pt@gruffalo CAFE-master]$ ./configure.ac
./configure.ac: line 1: syntax error near unexpected token `cafe,'
./configure.ac: line 1: `AC_INIT(cafe, 4.0)'

[pt@gruffalo CAFE-master]$ ls
cafe configure.ac example lib libtree main.cpp mcl2rawcafe.py README.md src_docs
CHANGELOG.md docs INSTALL libcommon LICENSE Makefile.in old_log.txt requirements.txt tests

Viterbi calculation in report is showing 0 in many cases

Issue began in v4.0.1 or v4.0.2

caferror.py - Segmentation fault (core dumped) on first model run

Hi,
I am trying to estimate a single error model for the whole dataset with a single lambda. However, I get a segmentation fault right when the first error model run (0.4) is about to start. The initial cafe run (-f 1) will run and complete without issue, if enabled. I get the same error using my dataset and the example data set provided here.

I am using CAFE 4.0.1 and the version of caferror.py that came with it - I will eventually upgrade to get the new filter option for gene family sizes, but I could not see anything related to this issue in the change logs.

Tried moving python script to location where cafe got compiled (usr/local/bin). This made no difference.

I also noticed that the caferror.sh shells which get spawned by caferror.py are hardcoded to use 10 threads. I lowered this to 4, since my system does not have 10 threads available. This also made no difference.

I attached the verbose stdout and also the shell script given to caferror.py with the -i option. I can also supply any other required files.

I am running Ubuntu 14.04 LTS on a machine with 8 GB RAM and an Intel i7-3770 with 8 × 3.40 GHz cores. I am monitoring resource use, and it does not seem to be the problem.

I would be very thankful for any suggestions!

Best,
Jan Philip Oeyen

caferror_stdout.txt
initialshell.txt

Initial parameters for the Nelder-Mead optimization algorithm can be changed after the optimizer begins

In some circumstances the initial values passed to the optimizer can be changed after the optimizer has started. This does not seem to have an effect on the final values but may cause the path the optimizer takes to be a bit surprising.

Install

I am getting this error after "make" command

Makefile:116: recipe for target 'release/error_model.o' failed
make: *** [release/error_model.o] Error 1

solved, changed:

isnan to std::isnan

isinf to std::isinf

History

CAFE should support a command that lists the command history. It should also support up- and down-arrows to scroll back through the history.
