gaius-augustus / braker

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

License: Other

Shell 3.37% Perl 87.70% R 0.38% Python 7.51% Dockerfile 1.03%

braker's People

alphaneer nijibabulu dayedepps healthvivo smoe baberlevi tgeneralovic wangdi2014 sunyoungbio jebbd stephanholgerd leennabraham flypythons anaflaviamorim jasonsydes pythseq emagallong visoca youreprettygood xuelei-dai eernst fd012020 gorliver gitbackspacer nkm47 niexiaoqing maozhitao feigeliudan01 shiyi-pan skerker heziqing yuzhenpeng wook2014 kiwiroy onlinearts aditi17142 hans-zhao831 prasoonnema ammmachado harvardinformatics epaule jacky2207 hlkfoz ricardoi rsettlage caspoer lindberghge khem2015 ad3002 julienppichon marialui aexbrayat shahed30 jijiaojiao01 v-jj hengbingao josepfabril jiangchb ko0000 quentinrougemont quentin-rougemont arslan9732 chiisansan wjt0925 darrengao628 rlibouba abdubidopsis sebc31 gkanogiannis kimikim2 pansapiens paveleg sanjaysrikakulam t03i neilernst fakher77 ashgene nbelle1 wanghuan766

braker's Issues

gtf2gff error

Dear,
I downloaded the latest version of BRAKER2 and ran it with the --gff3 flag, but converting the GTF output to GFF3 format fails with the following error:

ERROR in file /home/software/BRAKER/scripts/braker.pl at line 9415
Failed to execute: cat /home/results/BRAKER/augustus.hints.gtf | perl -ne 'if(m/\tAUGUSTUS\t/) {print $_;}' | perl /home/software/Augustus/scripts/gtf2gff.pl --gff3 --out=/home/results/BRAKER/augustus.hints.gff3 >> /home/results/BRAKER/gtf2gff3.log 2>> /home/results/BRAKER/errors/gtf2gff3.err

The gtf2gff3.err file shows:

transcript jg1.t1 has conflicting gene parents: and jg1. Remember: In GTF txids need to be overall unique. at /home/software/Augustus/scripts/gtf2gff.pl line 119, <STDIN> line 590303.

Any help is much appreciated.
Thanks.
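The message suggests that some transcript ID in augustus.hints.gtf is attached to two different gene IDs, one of them empty. Below is a minimal sketch for locating such transcripts before the conversion. It assumes GTF attributes of the form `transcript_id "..."; gene_id "...";` in column 9, and it only inspects AUGUSTUS lines that carry a transcript_id attribute. `find_conflicting_parents` is a hypothetical helper, not part of BRAKER or AUGUSTUS.

```shell
# find_conflicting_parents: hypothetical helper, not part of BRAKER or AUGUSTUS.
# Prints every transcript_id that occurs with more than one distinct gene_id
# (a missing gene_id counts as the empty string, shown as "[]").
find_conflicting_parents() {
    awk -F'\t' '
        # Only AUGUSTUS lines, mirroring the filter braker.pl applies before gtf2gff.pl
        $2 == "AUGUSTUS" {
            tx = ""; gene = ""
            if (match($9, /transcript_id "[^"]*"/)) {
                tx = substr($9, RSTART + 15, RLENGTH - 16)
            }
            if (match($9, /gene_id "[^"]*"/)) {
                gene = substr($9, RSTART + 9, RLENGTH - 10)
            }
            if (tx == "") next
            key = tx SUBSEP gene
            if (!(key in seen)) {
                seen[key] = 1
                parents[tx] = parents[tx] "[" gene "]"
                count[tx]++
            }
        }
        END {
            for (tx in count) if (count[tx] > 1) print tx "\t" parents[tx]
        }
    ' "$1"
}

# Example:
#   find_conflicting_parents /home/results/BRAKER/augustus.hints.gtf
```

Each output line lists the transcript ID followed by all gene IDs seen for it, so an empty `[]` points at lines that lack a gene_id.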

Errors in the annotations with --UTR=ON

I tried to annotate the genome with the example data using the command
"braker.pl --genome=../genome.fa --prot_seq=../prot.fa --prg=gth --bam=../RNAseq.bam --gth2traingenes --softmasking --UTR=on --gff3 --workingdir=$wd --cleanup --core=20"
but got the following error:

ERROR in file /public/home/fanlj/software/Braker2/BRAKER/scripts/braker.pl at line 8741
Failed to execute: perl /public/home/fanlj/software/Braker2/Augustus/scripts/aa2nonred.pl /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/utr_genes_in_gb.fa /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/utr_genes_in_gb.nr.fa --BLAST_PATH=/public/home/fanlj/software/miniconda3/bin --cores=20 1> /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/utr.aa2nonred.stdout 2> /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/errors/utr.aa2nonred.stderr!

The file "utr_genes_in_gb.fa" was never created by the pipeline, and I cannot find any error in braker.log. The last 20 lines of braker.log were:

# Fri Mar 22 09:34:51 2019: sorting bam file...

/public/software/apps/samtools-1.3.1/bin/samtools sort -@ 19 -o /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/merged.s.bam /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/merged.bam 1> /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/samtools_sort_before_wig.stdout 2> /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/errors/samtools_sort_before_wig.stderr
# Fri Mar 22 09:34:58 2019: Creating wiggle file...

/public/home/fanlj/software/Braker2/Augustus/bin/../auxprogs/bam2wig/bam2wig /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/merged.s.bam 1>/public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/merged.wig 2> /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/errors/bam2wig.err
# Fri Mar 22 09:36:01 2019: Creating /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/utrs.gff

/public/home/fanlj/software/Braker2/Augustus/bin/utrrnaseq --in-scaffold-file /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/genome.fa -C /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/stops.and.starts.gff -I /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/rnaseq.utr.hints -W /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/merged.wig -o /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/utrs.gff -r 76 -v 100 -n 15 -i 0.7 -m 0.3 -w 70 -c 100 -p 0.5  1> /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/rnaseq2utr.stdout 2> /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/errors/rnaseq2utr.err
# Fri Mar 22 09:36:11 2019: fixing utrrnaseq output

mv /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/utrs.f.gff /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/utrs.gff
# Fri Mar 22 09:36:11 2019: Creating gb file for UTR training

cat /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/utrs.gff /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/augustus.hints.f.gtf | grep -P "(CDS|5'-UTR|3'-UTR)" | sort -n -k 4,4 | sort -s -k 10,10 | sort -s -k 1,1 >> /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/genes.gtf 2> /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/errors/cat_utrs_augustus_noUtrs.err

perl /public/home/fanlj/software/Braker2/Augustus/scripts/gff2gbSmallDNA.pl /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/genes.gtf /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/genome.fa 1854 /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/utr.gb --good=/public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/bothutr.lst 1> /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/gff2gbSmallDNA.utr.stdout 2> /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/errors/gff2gbSmallDNA.utr.stderr
# Fri Mar 22 09:36:11 2019: BLAST training gene structures (with UTRs) against themselves:
perl /public/home/fanlj/software/Braker2/Augustus/scripts/aa2nonred.pl /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/utr_genes_in_gb.fa /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/utr_genes_in_gb.nr.fa --BLAST_PATH=/public/home/fanlj/software/miniconda3/bin --cores=20 1> /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/utr.aa2nonred.stdout 2> /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6/errors/utr.aa2nonred.stderr
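Because aa2nonred.pl fails on a missing input (utr_genes_in_gb.fa), one way to narrow this down is to find the first file on the way there that is missing or empty. The log shows utrs.gff, genes.gtf, utr.gb and bothutr.lst being produced in that order; the step that writes utr_genes_in_gb.fa is not shown, so it is listed last. A minimal sketch with a hypothetical helper `first_missing_or_empty` (not part of BRAKER):

```shell
# first_missing_or_empty: hypothetical helper, not part of BRAKER.
# Takes files in the order the pipeline produces them, prints the first one
# that is missing or has size zero and returns 1; returns 0 if all are non-empty.
first_missing_or_empty() {
    local f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "$f"
            return 1
        fi
    done
    return 0
}

# Example (working directory taken from the log above):
#   cd /public/home/fanlj/software/Braker2/BRAKER/example_mao/tests/test6
#   first_missing_or_empty utrs.gff genes.gtf utr.gb bothutr.lst utr_genes_in_gb.fa
```

The stderr file of the step that produced the reported file (for example errors/gff2gbSmallDNA.utr.stderr) is then the natural place to look.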

AUGUSTUS Segmentation fault

While using BRAKER, I keep getting this error:

braker.pl --genome=/media/damian/Toshiba/Canu_Pilon.fasta --esmode
Logfile: /home/damian/Programs/BRAKER-master/braker/Sp_4/braker.log!
ERROR in file /home/damian/Programs/BRAKER-master/scripts/braker.pl at line 6556
Failed to execute: perl /home/damian/Programs/Augustus/scripts/optimize_augustus.pl --rounds=5 --species=Sp_4 --kfold=8 --AUGUSTUS_CONFIG_PATH=/home/damian/Programs/Augustus/config/ --onlytrain=/home/damian/Programs/BRAKER-master/braker/Sp_4/train.gb.train.train /home/damian/Programs/BRAKER-master/braker/Sp_4/train.gb.train.test 1>/home/damian/Programs/BRAKER-master/braker/Sp_4/optimize_augustus.stdout 2>/home/damian/Programs/BRAKER-master/braker/Sp_4/errors/optimize_augustus.stderr!

These are the stdout and stderr:
Splitting training file into 8 buckets...
Reading in the meta parameters used for optimization from /home/damian/Programs/Augustus/config/species/generic/generic_metapars.cfg...
Reading in the starting meta parameters from /home/damian/Programs/Augustus/config/species/Sp_4/Sp_4_parameters.cfg...
bucket
Segmentation fault (core dumped)
Could not read the accuracy values out of predictions.txt when processing bucket 1. at /home/damian/Programs/Augustus/scripts/optimize_augustus.pl line 1128.

Any idea what the reason might be?

optimize_augustus.pl line 1224:Could not read the accuracy values out of predictions.txt when processing bucket 1

While using BRAKER2, I keep getting this error.
My command:
perl /he_lab/share/data/tuguangxian/tgx/software/miniconda3/envs/augustus/bin/optimize_augustus.pl --rounds=5 --species=qiaozuigui --kfold=9 --AUGUSTUS_CONFIG_PATH=/he_lab/share/data/local/augustus/augustus-3.3.2/config --onlytrain=/he_lab/share/data/tuguangxian/tgx/data/genome/qiaozuigui/nGS/05_braker2/braker/qiaozuigui/train.gb.train.train --cpus=9 /he_lab/share/data/tuguangxian/tgx/data/genome/qiaozuigui/nGS/05_braker2/braker/qiaozuigui/train.gb.train.test 1>/he_lab/share/data/tuguangxian/tgx/data/genome/qiaozuigui/nGS/05_braker2/braker/qiaozuigui/optimize_augustus.stdout

error:
replaced tx with 0 MEA txs
replaced tx with 0 MEA txs
replaced tx with 0 MEA txs
sh: line 1: 81016 Segmentation fault (core dumped) augustus --species=qiaozuigui --AUGUSTUS_CONFIG_PATH=/he_lab/share/data/local/augustus/augustus-3.3.2/config/ --/Constant/dss_end=4 --/Constant/dss_start=3 --/Constant/ass_start=3 --/Constant/ass_end=2 --/Constant/ass_upwindow_size=30 --/IntronModel/d=100 --/IntronModel/ass_motif_memory=3 --/IntronModel/ass_motif_radius=3 --/ExonModel/tis_motif_memory=3 --/ExonModel/tis_motif_radius=2 --/Constant/trans_init_window=20 --/Constant/init_coding_len=15 --/ExonModel/patpseudocount=5.0 --/ExonModel/etpseudocount=3 --/ExonModel/etorder=2 --/Constant/intterm_coding_len=5 --/ExonModel/slope_of_bandwidth=0.3 --/ExonModel/minwindowcount=10 --/IGenicModel/patpseudocount=5.0 --/IntronModel/patpseudocount=5.0 --/IntronModel/slope_of_bandwidth=0.4 --/IntronModel/minwindowcount=4 --/IntronModel/asspseudocount=0.00266 --/IntronModel/dsspseudocount=0.0005 --/IntronModel/dssneighborfactor=0.00173 --/ExonModel/minPatSum=233.3 --/Constant/probNinCoding=0.23 --/Constant/decomp_num_steps=1 --/ExonModel/infile=exon-tmp2.pbl --/IntronModel/infile=intron-tmp2.pbl --/IGenicModel/infile=igenic-tmp2.pbl --/UtrModel/infile=utr-tmp2.pbl tmp_opt_qiaozuigui/bucket2.gb > tmp_opt_qiaozuigui/predictions-2.txt
Could not read the accuracy values out of predictions.txt when processing bucket 1. at /he_lab/share/data/tuguangxian/tgx/software/miniconda3/envs/augustus/bin/optimize_augustus.pl line 1224
Any idea what the reason might be? Thank you!

extracting protein sequences from AUGUSTUS predictions if BRAKER ran with proteins & RNA-Seq

When running with proteins and RNA-Seq, BRAKER executes joingenes to merge two AUGUSTUS gene sets. Unlike the AUGUSTUS output, the joingenes output contains neither coding sequences nor protein sequences. Users would like to have protein sequences in FASTA format.

Failing during optimize_augustus.pl

Hey, it's me again,
I tried to generate a model for my species and during the optimize_augustus.pl, I got the following error:

bio@biocomp04:~/Documents/Purpureocillium/March2019$ perl /home/bio/apps/augustus-3.3.2/scripts/optimize_augustus.pl --rounds=5 --species=Purp --kfold=8 --AUGUSTUS_CONFIG_PATH=/home/bio/apps/augustus-3.3.2/config --onlytrain=/home/bio/Documents/Purpureocillium/March2019/braker/Purp/train.gb.train.train --cpus=8 /home/bio/Documents/Purpureocillium/March2019/braker/Purp/train.gb.train.test
Splitting training file into 8 buckets...
Reading in the meta parameters used for optimization from /home/bio/apps/augustus-3.3.2/config/species/generic/generic_metapars.cfg...
Reading in the starting meta parameters from /home/bio/apps/augustus-3.3.2/config/species/Purp/Purp_parameters.cfg...
bucket Segmentation fault (core dumped)
2 Segmentation fault (core dumped)
Segmentation fault (core dumped)
3 Segmentation fault (core dumped)
8 5 Segmentation fault (core dumped)
1 Segmentation fault (core dumped)
6 Segmentation fault (core dumped)
Segmentation fault (core dumped)
4 7 Could not read the accuracy values out of predictions.txt when processing bucket 1. at /home/bio/apps/augustus-3.3.2/scripts/optimize_augustus.pl line 1224.

Later, I ran the following command (with --skipOptimize), and BRAKER2 completed successfully:

bio@biocomp04:~/Documents/Purpureocillium/March2019$ ~/apps/BRAKER-2.1.2/scripts/braker.pl --genome=./working-genomes/Purp-polished-abyss.fasta.masked --bam=./Purp-annotation/Purp-RNAseq-to-genome-unmasked.bam --species=Purp --cores=4 --AUGUSTUS_CONFIG_PATH=/home/bio/apps/augustus-3.3.2/config --GENEMARK_PATH=/home/bio/apps/gm_et_linux_64/gmes_petap --fungus --skipOptimize --useexisting

What do you think the cause could be?
Am I running it incorrectly?
Cheers,
Luis Alfonso.

RNA.bam does not exist. ERROR

Hi, I am attempting to run a BRAKER annotation on my species of interest.

I keep running into an issue where the RNA-Seq BAM file cannot be found:

Working Directory: /home/tng23/rds/rds-cj107-jiggins-rds/rds-cj107-heliconius/tng23/Project-Anno-HerIll/Hermetia_illucens-BRAKER

Working on Node: login-n-1

Date Started: Tue  2 Apr 14:04:49 BST 2019

intel/cce(15):ERROR:105: Unable to locate a modulefile for 'intel/cce/14.0.3.174'
intel/fce(15):ERROR:105: Unable to locate a modulefile for 'intel/fce/14.0.3.174'
intel/mkl(15):ERROR:105: Unable to locate a modulefile for 'intel/mkl/11.1.3.174'
NEXT STEP: check files and settings
NEXT STEP: check options
ERROR: BAM file ~/rds/rds-cj107-jiggins-rds/rds-cj107-heliconius/tng23/Project-Anno-HerIll/Hermetia_illucens-BRAKER/STAR-map_test-01.output/RNA-mapped-test.bam does not exist. Please check.
... options check complete.

I have run it on both the test data (downloaded separately) and my own data, and I get the same error when running from the following script:

#!/bin/bash
#SBATCH -p skylake
#SBATCH -A JIGGINS-SL2-CPU
#SBATCH -J BRAKER
#SBATCH --time=36:00:00
#SBATCH [email protected]
#SBATCH --mail-type=ALL
#SBATCH --nodes=1
#SBATCH --tasks=32
#SBATCH --exclusive

# Provide general information about job

printf "\nWorking Directory: $(pwd)\n"
printf "\nWorking on Node: $(hostname)\n"
printf "\nDate Started: $(date)\n\n"

# Activate the miniconda environment with required tools on;

module load perl/5.20.0
module load genemark/4.32
module load python/3.4.1
module load bamtools/2.4.2
module load miniconda3/4.5.1
source activate Biopython-local

# Give command for the script to run;

time braker.pl \
        --GENEMARK_PATH=~/privatemodules/gm_et_linux_64/gmes_petap \
        --BAMTOOLS_PATH=~/.conda/envs/Biopython-local/bin \
        --AUGUSTUS_CONFIG_PATH=~/privatemodules/augustus-3.3.2/bin \
        --genome=~/rds/hpc-work/Data1/Genome-Assembly_BSF_PB-10X_Sanger_2019-02-12/genomic_resources/iHerIll_ref-genome.fasta \
        --bam=~/rds/rds-cj107-jiggins-rds/rds-cj107-heliconius/tng23/Project-Anno-HerIll/Hermetia_illucens-BRAKER/STAR-map_test-01.output/RNA-mapped-test.bam \
        --softmasking \
        --species=Hermetia_illucens01 \
        --cores=1

# Information on time finished

printf "\nDate Finished: $(date)\n\n"

The same happens with the test data:

#!/bin/bash

#SBATCH -J BRAKER
#SBATCH -o BRAKER-%j.out
#SBATCH -e BRAKER-%j.out
#SBATCH --time=36:00:00
#SBATCH [email protected]
#SBATCH --nodes=1
#SBATCH --tasks=32
#SBATCH -p skylake
#SBATCH --exclusive

# Provide general information about job

printf "\nWorking Directory: $(pwd)\n"
printf "\nWorking on Node: $(hostname)\n"
printf "\nDate Started: $(date)\n\n"

# Activate the miniconda environment with required tools on;

module load bamtools/2.4.2
module load miniconda3/4.5.1
source activate Biopython-local

# Give command for the script to run;

time    braker.pl \
        --GENEMARK_PATH=/home/tng23/privatemodules/gm_et_linux_64/gmes_petap \
        --BAMTOOLS_PATH=/home/tng23/.conda/envs/Biopython-local/bin \
        --AUGUSTUS_CONFIG_PATH=/home/tng23/privatemodules/augustus-3.3.2/bin \
        --genome=genome.fa --bam=RNAseq.bam --softmasking \
        --species=Species-Test-01 \
        --cores=1

# Information on time finished

printf "\nDate Finished: $(date)\n\n"

I have double-checked the paths, and they are correct. Running braker.pl directly from the command line does detect the same BAM files (before running into a different error), so the issue seems to lie in submitting from a script.
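A likely explanation for the first script: its `--genome=`, `--bam=`, `--GENEMARK_PATH=`, `--BAMTOOLS_PATH=` and `--AUGUSTUS_CONFIG_PATH=` values start with `~`. Bash expands `~` at the start of a word and after the `=` of a word that looks like a variable assignment (`NAME=~/dir`), but `--bam=~/...` is neither, so braker.pl receives the literal `~`, which matches the unexpanded path in the error message. The command-line run, by contrast, used absolute paths. The second script uses relative names (genome.fa, RNAseq.bam), which are resolved against whatever directory the job starts in. A minimal sketch with a hypothetical helper `check_path_args` (not part of BRAKER) that flags unexpanded tildes and missing files; pass it only the path-valued options:

```shell
# check_path_args: hypothetical helper, not part of BRAKER.
# For each --option=value argument, reports values that still begin with a
# literal "~" (not expanded by the shell) or that do not exist on disk.
# Returns 1 if any problem was found, 0 otherwise.
check_path_args() {
    local arg path status=0
    for arg in "$@"; do
        case "$arg" in
            --*=*) path=${arg#*=} ;;   # keep the text after the first "="
            *) continue ;;             # not an --option=value argument
        esac
        case "$path" in
            "~"*)
                echo "unexpanded tilde: $arg"
                status=1
                ;;
            *)
                if [ ! -e "$path" ]; then
                    echo "does not exist: $arg"
                    status=1
                fi
                ;;
        esac
    done
    return "$status"
}

# Example: the first call reports the unexpanded tilde; the second uses $HOME,
# which the shell does expand inside double quotes.
#   check_path_args --bam=~/rds/rds-cj107-jiggins-rds/rds-cj107-heliconius/tng23/Project-Anno-HerIll/Hermetia_illucens-BRAKER/STAR-map_test-01.output/RNA-mapped-test.bam
#   check_path_args --bam="$HOME/rds/rds-cj107-jiggins-rds/rds-cj107-heliconius/tng23/Project-Anno-HerIll/Hermetia_illucens-BRAKER/STAR-map_test-01.output/RNA-mapped-test.bam"
```

Writing `"$HOME/..."` (or the absolute path) in the SBATCH script avoids relying on tilde expansion altogether.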

I haven't seen any recorded issues from script submission, is there any advice or similar problems observed elsewhere?

Thanks,
Tom

EDIT: versions used: BRAKER v1.9, GeneMark-ES Suite 4.38, AUGUSTUS 3.3.2

Running from the command line:

(Biopython-local) [tng23@login-n-1 example]$ braker.pl --GENEMARK_PATH=/home/tng23/privatemodules/gm_et_linux_64/gmes_petap --BAMTOOLS_PATH=/home/tng23/.conda/envs/Biopython-local/bin --AUGUSTUS_CONFIG_PATH=/home/tng23/privatemodules/augustus-3.3.2/bin --genome=genome.fa --bam=RNAseq.bam --species=Species-Test-01 --cores=1

NEXT STEP: check files and settings
NEXT STEP: check options
... options check complete.

WARNING: /home/tng23/privatemodules/BRAKER/example/braker/Species-Test-01 already exists. Braker will use existing files, if they are newer than the input files. You can choose another working directory with --workingdir=dir or overwrite it with --overwrite

NEXT STEP: create SAM header file /home/tng23/privatemodules/BRAKER/example/braker/Species-Test-01/RNAseq_header.sam.
SAM file /home/tng23/privatemodules/BRAKER/example/braker/Species-Test-01/RNAseq_header.sam complete.

NEXT STEP: check BAM headers
headers check for BAM file /home/tng23/privatemodules/BRAKER/example/RNAseq.bam complete.

NEXT STEP: make hints from BAM file /home/tng23/privatemodules/BRAKER/example/RNAseq.bam
failed to execute: Inappropriate ioctl for device

Failure during training

Hi,
Running BRAKER2 I get the following error:

bio@biocomp04:~/Documents/Purpureocillium/March2019$ ~/apps/BRAKER-2.1.2/scripts/braker.pl  --genome=./working-genomes/361.fa --species=Purp --AUGUSTUS_CONFIG_PATH=/home/bio/apps/augustus-3.3.2/config/ --prot_seq=hints-annotation.faa --prg=gth --ALIGNMENT_TOOL_PATH=/home/bio/apps/gth-1.7.1-Linux_x86_64-64bit/bin --trainFromGth
# Wed Mar 27 20:01:51 2019: Logfile: /home/bio/Documents/Purpureocillium/March2019/braker/Purp/braker.log!
ERROR in file /home/bio/apps/BRAKER-2.1.2/scripts/braker.pl at line 5831
Failed to execute: /home/bio/apps/augustus-3.3.2/config/../bin/etraining --species=Purp --AUGUSTUS_CONFIG_PATH=/home/bio/apps/augustus-3.3.2/config /home/bio/Documents/Purpureocillium/March2019/braker/Purp/train.f.gb 1> /home/bio/Documents/Purpureocillium/March2019/braker/Purp/gbFilterEtraining.stdout 2>/home/bio/Documents/Purpureocillium/March2019/braker/Purp/errors/gbFilterEtraining.stderr

When I do:

less /home/bio/Documents/Purpureocillium/March2019/braker/Purp/errors/gbFilterEtraining.stderr

I get:

/home/bio/apps/augustus-3.3.2/config/../bin/etraining: ERROR
        Input file not in genbank format.

Please find the GenBank file attached:

train.gb.zip

How can I fix it?
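etraining rejects the file before any training starts, so a quick first check is whether train.f.gb (the file named in the failed command) is empty or does not begin with a GenBank header. Below is a rough sketch with a hypothetical helper `genbank_record_count` (not part of AUGUSTUS). It assumes only the basic flat-file layout, in which records begin with a `LOCUS` line and end with `//`, and does not validate the full format.

```shell
# genbank_record_count: hypothetical helper, not part of AUGUSTUS.
# Rough check that a file looks like GenBank flat-file format: the first
# non-blank line must start with LOCUS. Prints the number of records, counted
# as lines consisting of "//". Returns 1 for empty or non-GenBank files.
genbank_record_count() {
    awk '
        !started && NF == 0 { next }            # skip leading blank lines
        !started {
            started = 1
            if ($0 !~ /^LOCUS/) { bad = 1; exit }
        }
        /^\/\/[ \t\r]*$/ { n++ }                # record terminator
        END {
            if (bad || !started) exit 1
            print n + 0
        }
    ' "$1"
}

# Example:
#   genbank_record_count /home/bio/Documents/Purpureocillium/March2019/braker/Purp/train.f.gb
```

A non-zero exit status with no output means the file is empty or starts with something other than `LOCUS`, which would point back to the filtering step that wrote train.f.gb rather than to etraining itself.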
Thanks in advance,
Luis Alfonso.

AUGUSTUS_CONFIG_PATH not writeable

Hello!

I've been trying to use BRAKER but it seems that there is a problem with AUGUSTUS_CONFIG_PATH.
This is the command I run:

katerina87@WE11sv03:~/Tools/BRAKER2$ ./BRAKER/scripts/braker.pl -softmasking --genome=../RepeatMasker/polished_assembly1_x3.fasta.masked -cores=8 --AUGUSTUS_CONFIG_PATH=/opt/augustus-3.3.2/config --AUGUSTUS_BIN_PATH=/opt/augustus-3.3.2/bin --AUGUSTUS_SCRIPTS_PATH=/opt/augustus-3.3.2/scripts --bam=../../My_MinION/MMETSP/RNA_seqs_raw/Aligned.out.sorted.bam

And this is the error:

Use of uninitialized value $species in concatenation (.) or string at ./BRAKER/scripts/braker.pl line 1455.
Mon Mar 4 10:42:02 2019: braker.pl version 2.1.2

Mon Mar 4 10:42:02 2019: Configuring of BRAKER for using external tools...

Mon Mar 4 10:42:02 2019: Command line flag --AUGUSTUS_CONFIG_PATH was provided. Setting $AUGUSTUS_CONFIG_PATH in braker.pl to /opt/augustus-3.3.2/config.
Mon Mar 4 10:42:02 2019: ERROR: in file ./BRAKER/scripts/braker.pl at line 1453
AUGUSTUS_CONFIG_PATH/species (in this case /opt/augustus-3.3.2/config/) is not writeable.
There are 3 alternative ways to set this variable for braker.pl:
a) provide command-line argument --AUGUSTUS_CONFIG_PATH=/your/path
b) use an existing environment variable $AUGUSTUS_CONFIG_PATH
for setting the environment variable, run
export AUGUSTUS_CONFIG_PATH=/your/path
in your shell. You may append this to your .bashrc or
.profile file in order to make the variable available to all
your bash sessions.
c) braker.pl can try guessing the location of
$AUGUSTUS_CONFIG_PATH from an augustus executable that is
available in your $PATH variable.
If you try to rely on this option, you can check by typing
which augustus
in your shell, whether there is an augustus executable in
your $PATH
Be aware: the $AUGUSTUS_CONFIG_PATH must be writable for
braker.pl because braker.pl is a pipeline that
optimizes parameters that reside in that
directory. This might be problmatic in case you
are using a system-wide installed augustus
installation that resides in a directory that is
not writable to you as a user.

At first, I thought the problem was that I didn't have permission to write to the config folder, but I get the same error even after becoming the owner of the augustus-3.3.2 folder.
In the meantime, when I try to copy the config folder to my local folder by running:

cp -r \texttt{/opt/augustus-3.3.2/config/ . export AUGUSTUS_CONFIG_PATH=./config export AUGUSTUS_BIN_PATH=/opt/augustus-3.3.2/bin export AUGUSTUS_SCRIPTS_PATH=/opt/augustus-3.3.2/scripts

I get this:
cp: target 'AUGUSTUS_SCRIPTS_PATH=/opt/augustus-3.3.2/scripts' is not a directory

which I don't understand since the path is correct.
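The `cp` error is consistent with the four commands being run as a single line. In that case, `cp` receives `export` and the `AUGUSTUS_..._PATH=...` words as extra arguments, and when there are several sources, its last argument must be an existing directory. The leading `\texttt{` also looks like residue from a LaTeX-formatted copy of the instructions. Below is a minimal sketch of the intended steps, one command per line. `setup_local_config` is a hypothetical helper, and `$HOME/augustus_config` is an arbitrary, hypothetical destination. The exported path is made absolute because braker.pl changes directories while it runs, as the "changing into working directory" log lines in other reports show.

```shell
# setup_local_config: hypothetical helper, not part of BRAKER or AUGUSTUS.
# Copies a (possibly read-only, system-wide) AUGUSTUS config directory to a
# writable location and exports AUGUSTUS_CONFIG_PATH as an absolute path.
setup_local_config() {
    local src="$1" dest="$2"
    mkdir -p "$dest" || return 1
    cp -r "$src"/. "$dest"/ || return 1   # copy the contents of src into dest
    export AUGUSTUS_CONFIG_PATH="$(cd "$dest" && pwd)"
}

# Example, one command per line (source paths from the report):
#   setup_local_config /opt/augustus-3.3.2/config "$HOME/augustus_config"
#   export AUGUSTUS_BIN_PATH=/opt/augustus-3.3.2/bin
#   export AUGUSTUS_SCRIPTS_PATH=/opt/augustus-3.3.2/scripts
```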

I should also mention that augustus is executed through a link at /usr/local/bin/augustus.
Could that be the cause?
Any ideas that could help solve this, would be greatly appreciated.

Thank you!

Error with joingenes

Dear developers,
I am running BRAKER2 on a plant genome with RNA-Seq data and proteins from distantly related species, but the program always dies at the joingenes step, and I am not sure why.

I am running BRAKER as follows:
perl braker.pl --species=myspecies --genome=genome.fasta --cores=8 \
    --bam=hisat.sorted.bam --stranded=. --softmasking --UTR=on \
    --prot_seq=protein.fa --prg=gth --gth2traingenes --gff3

BRAKER died with an error like:

Can't locate object method "tx_structures" via package "g45625.t1" (perhaps you forgot to load "g45625.t1"?) at braker.pl line 8402, line 5.

The only errors I can find are in the errors/joingenes.err file, for example:

Load warning: Did not expect feature "initial". Known features are "CDS", "UTR", "3'-UTR", "5'-UTR", "exon", "intron", "gene", "transcript", "tss", "tts", "start_codon" and "stop_codon". This feature is going to be ignored.
This warning may affect the result.
Load warning: Did not expect feature "terminal". Known features are "CDS", "UTR", "3'-UTR", "5'-UTR", "exon", "intron", "gene", "transcript", "tss", "tts", "start_codon" and "stop_codon". This feature is going to be ignored.
This warning may affect the result.
Load warning: Did not expect feature "internal". Known features are "CDS", "UTR", "3'-UTR", "5'-UTR", "exon", "intron", "gene", "transcript", "tss", "tts", "start_codon" and "stop_codon". This feature is going to be ignored.
This warning may affect the result.
Load warning: Did not expect feature "single". Known features are "CDS", "UTR", "3'-UTR", "5'-UTR", "exon", "intron", "gene", "transcript", "tss", "tts", "start_codon" and "stop_codon". This feature is going to be ignored.
This warning may affect the result.
Load warning: Did not expect feature "terminal". Known features are "CDS", "UTR", "3'-UTR", "5'-UTR", "exon", "intron", "gene", "transcript", "tss", "tts", "start_codon" and "stop_codon". This feature is going to be ignored.
This warning may affect the result.
Load warning: Did not expect feature "internal". Known features are "CDS", "UTR", "3'-UTR", "5'-UTR", "exon", "intron", "gene", "transcript", "tss", "tts", "start_codon" and "stop_codon". This feature is going to be ignored.
This warning may affect the result.
Load warning: Did not expect feature "initial". Known features are "CDS", "UTR", "3'-UTR", "5'-UTR", "exon", "intron", "gene", "transcript", "tss", "tts", "start_codon" and "stop_codon". This feature is going to be ignored.
This warning may affect the result.
Load warning: Did not expect feature "single". Known features are "CDS", "UTR", "3'-UTR", "5'-UTR", "exon", "intron", "gene", "transcript", "tss", "tts", "start_codon" and "stop_codon". This feature is going to be ignored.
This warning may affect the result.

Any suggestions on how to solve this problem would be greatly appreciated.

Best regards.
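The report does not establish whether the "initial", "terminal", "internal" and "single" features cause the Perl error at braker.pl line 8402, and it does not say which GTF file was passed to joingenes. The sketch below only shows how to inspect a GTF's feature types and, if wanted, reduce it to the types that the joingenes warning lists as known. Both function names are hypothetical and are not part of BRAKER or joingenes.

```shell
# count_features: hypothetical helper. Prints each feature type (GTF column 3)
# with its number of occurrences, sorted by name, skipping comment lines.
count_features() {
    awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ } END { for (f in n) print f "\t" n[f] }' "$1" | sort
}

# keep_known_features: hypothetical helper. Keeps comment lines and lines whose
# feature type is in the list of known features from the joingenes load warning.
keep_known_features() {
    awk -F'\t' '
        BEGIN {
            n = split("CDS UTR 3'\''-UTR 5'\''-UTR exon intron gene transcript tss tts start_codon stop_codon", k, " ")
            for (i = 1; i <= n; i++) known[k[i]] = 1
        }
        /^#/ || ($3 in known)
    ' "$1"
}

# Example (the input file name is a placeholder):
#   count_features input.gtf
#   keep_known_features input.gtf > input.known.gtf
```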

question about splice sites

Dear Braker,
(your software has been great for me on my published genomes - thanks!!!)

At line 7303 of the latest BRAKER (December 2018), there is the option --allow_hinted_splicesites=gcag,atac. These are non-canonical splice sites, which I know my beast of interest has. Does BRAKER allow non-canonical splice sites by default (depending on RNA-Seq or protein evidence), or do you have to specify them? See the option below.

In the development options:
--splice_sites (default GTAG) ... this says it is for the UTR regions. Is this for UTRs only, or do you need to specify the non-canonical splice sites for BRAKER to use them in the main gene predictions?

cheers,

Peter Thorpe

Input file not in genbank format.

Hi,
I ran into this error:

/work/waterhouse_team/miniconda2/envs/braker2/bin//etraining: ERROR
        Input file not in genbank format.

Here is the full log output:

# Tue Apr 16 12:10:52 2019: braker.pl version 2.1.2

# Tue Apr 16 12:10:52 2019: Configuring of BRAKER for using external tools...

# Tue Apr 16 12:10:52 2019: Found environment variable $AUGUSTUS_CONFIG_PATH. Setting $AUGUSTUS_CONFIG_PATH to /work/waterhouse_team/miniconda2/envs/braker2/config/
# Tue Apr 16 12:10:52 2019: Found environment variable $AUGUSTUS_BIN_PATH. Setting $AUGUSTUS_BIN_PATH to /work/waterhouse_team/miniconda2/envs/braker2/bin/
# Tue Apr 16 12:10:52 2019: Found environment variable $AUGUSTUS_SCRIPTS_PATH. Setting $AUGUSTUS_SCRIPTS_PATH to /work/waterhouse_team/miniconda2/envs/braker2/bin/
# Tue Apr 16 12:10:52 2019: Did not find environment variable $GENEMARK_PATH  (either variable does not exist, or the path given in variable does not exist). Will try to set this variable in a different way, later.
# Tue Apr 16 12:10:52 2019: Trying to guess $GENEMARK_PATH from location of gmes_petap.pl executable that is available in your $PATH.
# Tue Apr 16 12:10:52 2019: Setting $GENEMARK_PATH to /work/waterhouse_team/apps/gm_et_linux_64/gmes_petap
# Tue Apr 16 12:10:52 2019: Did not find environment variable $BAMTOOLS_PATH (either variable does not exist, or the path given in variable does not exist). Will try to set this variable in a different way, later.
# Tue Apr 16 12:10:52 2019: Trying to guess $BAMTOOLS_BIN_PATH from location of bamtools executable that is available in your $PATH.
# Tue Apr 16 12:10:52 2019: Setting $BAMTOOLS_BIN_PATH to /work/waterhouse_team/miniconda2/envs/braker2/bin
# Tue Apr 16 12:10:52 2019: Did not find environment variable $SAMTOOLS_PATH  (either variable does not exist, or the path given in variable doesnot exist). Will try to set this variable in a different way, later.
# Tue Apr 16 12:10:52 2019: Trying to guess $SAMTOOLS_PATH from location of samtools executable in your $PATH.
# Tue Apr 16 12:10:52 2019: Setting $SAMTOOLS_PATH to /work/waterhouse_team/miniconda2/envs/braker2/bin
# Tue Apr 16 12:10:52 2019: Did not find environment variable $BLAST_PATH
# Tue Apr 16 12:10:52 2019: Trying to guess $BLAST_PATH from location of blastp executable that is available in your $PATH.
# Tue Apr 16 12:10:52 2019: Setting $BLAST_PATH to /work/waterhouse_team/miniconda2/envs/braker2/bin
# Tue Apr 16 12:10:52 2019: Did not find environment variable $PYTHON3_PATH
# Tue Apr 16 12:10:52 2019: Trying to guess $PYTHON3_PATH from location of python3 executable that is available in your $PATH.
# Tue Apr 16 12:10:52 2019: Setting $PYTHON3_PATH to /work/waterhouse_team/miniconda2/envs/braker2/bin
# Tue Apr 16 12:10:52 2019: Configuration of BRAKER for using external tools is complete!

# Tue Apr 16 12:10:52 2019: WARNING: /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp already exists. Braker will use existing files, if they are newer than the input files. You can choose another working directory with --workingdir=dir or overwrite it with --overwrite.
# Tue Apr 16 12:10:53 2019: changing into working directory /lustre/scratch/waterhouse_team/braker/braker
cd /lustre/scratch/waterhouse_team/braker/braker

# Tue Apr 16 12:10:53 2019: Creating parameter template files for AUGUSTUS with new_species.pl

# Tue Apr 16 12:10:53 2019: new_species.pl will create parameter files for species NbV1ChF-NbRNASeq_Wlp in /work/waterhouse_team/miniconda2/envs/braker2/config//species/NbV1ChF-NbRNASeq_Wlp
perl /lustre/work-lustre/waterhouse_team/miniconda2/envs/braker2/bin/new_species.pl --species=NbV1ChF-NbRNASeq_Wlp --AUGUSTUS_CONFIG_PATH=/work/waterhouse_team/miniconda2/envs/braker2/config/ 1> /dev/null 2>/lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/errors/new_species.stderr

# Tue Apr 16 12:11:06 2019: Converting bam files to hints
# Tue Apr 16 12:11:06 2019: Preparing hints for running GeneMark

# Tue Apr 16 12:11:06 2019: Filtering intron hints for GeneMark from /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/hintsfile.gff...
cat /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/genemark_hintsfile.gff.rnaseq | sort -n -k 4,4 | sort -s -n -k 5,5 | sort -s -k 3,3 | sort -s -k 1,1 | /lustre/work-lustre/waterhouse_team/miniconda2/envs/braker2/bin/join_mult_hints.pl > /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/genemark_hintsfile.gff.rnaseq.tmp
mv /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/genemark_hintsfile.gff.rnaseq.tmp /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/genemark_hintsfile.gff

# Tue Apr 16 12:11:09 2019: Checking whether file /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/genemark_hintsfile.gff contains sufficient multiplicity information...
# Tue Apr 16 12:11:09 2019: Executing GeneMark-ET

# Tue Apr 16 12:11:09 2019: changing into GeneMark-ET directory /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/GeneMark-ET
cd /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/GeneMark-ET

# Tue Apr 16 12:11:09 2019: Executing gmes_petap.pl
perl /work/waterhouse_team/apps/gm_et_linux_64/gmes_petap/gmes_petap.pl --verbose --sequence=/lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/genome.fa --ET=/lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/genemark_hintsfile.gff --et_score 10 --max_intergenic 50000 --cores=1 1>/lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/GeneMark-ET.stdout 2>/lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/errors/GeneMark-ET.stderr

# Sun Apr 21 10:34:10 2019: change to working directory /lustre/scratch/waterhouse_team/braker/braker
cd /lustre/scratch/waterhouse_team/braker/braker

# Sun Apr 21 10:34:10 2019 Filtering output of GeneMark for generating training genes for AUGUSTUS
# Sun Apr 21 10:34:10 2019: Checking whether hintsfile contains single exon CDSpart hints
# Sun Apr 21 10:34:10 2019: filtering GeneMark genes by intron hints
perl /lustre/work-lustre/waterhouse_team/miniconda2/envs/braker2/bin/filterGenemark.pl --genemark=/lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/GeneMark-ET/genemark.gtf --introns=/lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/hintsfile.gff 1>/lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/filterGenemark.stdout 2>/lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/errors/filterGenemark.stderr

#Sun Apr 21 10:34:14 2019: downsampling good genemark genes according to poisson distribution with Lambda 2:
perl /lustre/work-lustre/waterhouse_team/miniconda2/envs/braker2/bin/downsample_traingenes.pl --in_gtf=/lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/GeneMark-ET/genemark.f.good.gtf --out_gtf=/lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/GeneMark-ET/genemark.d.gtf --lambda=2 1> /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/downsample_traingenes.log 2> /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/errors/downsample_traingenes.err

# Sun Apr 21 10:34:14 2019: training AUGUSTUS
# Sun Apr 21 10:34:14 2019: creating softlink from /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/GeneMark-ET/genemark.gtf to /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/traingenes.gtf.
ln -s /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/GeneMark-ET/genemark.gtf /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/traingenes.gtf
# Sun Apr 21 10:34:15 2019: Converting gtf file /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/traingenes.gtf to genbank file
# Sun Apr 21 10:34:15 2019: Computing flanking region size for AUGUSTUS training genes
# Sun Apr 21 10:34:16 2019: create genbank file /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/train.gb
perl /lustre/work-lustre/waterhouse_team/miniconda2/envs/braker2/bin/gff2gbSmallDNA.pl /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/traingenes.gtf /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/genome.fa 1477 /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/train.gb 2>/lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/errors/traingenes.gtf_gff2gbSmallDNA.stderr

# Sun Apr 21 10:35:43 2019: $trainGb1 file /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/train.gb contains 77478 genes.
#  Sun Apr 21 10:35:43 2019: concatenating good and downsampled GeneMark training genes to /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/good_genes.lst.
cat /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/GeneMark-ET/genemark.d.gtf > /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/good_genes.lst
# Sun Apr 21 10:35:43 2019: Filtering train.gb for "good" mRNAs:
perl /lustre/work-lustre/waterhouse_team/miniconda2/envs/braker2/bin/filterGenesIn_mRNAname.pl /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/good_genes.lst /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/train.gb > /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/train.f.gb 2>/lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/errors/filterGenesIn_mRNAname.stderr

# Sun Apr 21 10:35:48 2019: $trainGb2 file /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/train.f.gb contains 0 genes.
# Sun Apr 21 10:35:48 2019: Running etraining to catch gene structure inconsistencies:
/work/waterhouse_team/miniconda2/envs/braker2/bin//etraining --species=NbV1ChF-NbRNASeq_Wlp --AUGUSTUS_CONFIG_PATH=/work/waterhouse_team/miniconda2/envs/braker2/config/ /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/train.f.gb 1> /lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/gbFilterEtraining.stdout 2>/lustre/scratch/waterhouse_team/braker/braker/NbV1ChF-NbRNASeq_Wlp/errors/gbFilterEtraining.stderr

What did I miss?

Thank you in advance,

Michal
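In the log above, train.f.gb ends up with 0 genes after train.gb is filtered against good_genes.lst. That list is built from genemark.d.gtf alone. A quick first check is whether the list contains any transcript IDs at all. This is a minimal sketch. It assumes good_genes.lst is GTF text with `transcript_id "..."` attributes, as produced by the `cat` step shown. The helper name is illustrative.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches the GTF attribute: transcript_id "XYZ";
TX_RE = re.compile(r'transcript_id "([^"]+)"')


def transcript_ids(gtf_text: str) -> set[str]:
    """Return the distinct transcript IDs found in GTF-formatted text."""
    ids = set()
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        m = TX_RE.search(line)
        if m:
            ids.add(m.group(1))
    return ids


if __name__ == "__main__":
    import sys

    ids = transcript_ids(Path(sys.argv[1]).read_text())
    print(f"{len(ids)} distinct transcripts")
```

A result of 0 would mean the empty train.f.gb is already explained by the list itself. In that case the filterGenemark.pl and downsampling steps are worth inspecting.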

Incorporating stranded RNA-Seq

Extend BRAKER to separate stranded RNA-Seq, probably with samtools like this: https://www.biostars.org/p/92935/

Use counts of putative splice sites to find correct strandedness of data.

Treatment of stranded RNA-Seq is important for UTR prediction.
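The splice-site idea can be sketched as follows. This assumes introns are given as 1-based, inclusive (start, end) coordinates on the forward strand of a chromosome sequence. A canonical GT...AG intron reads GT...AG on the forward strand when the gene is on '+', and CT...AC when the gene is on '-'. Counting both motifs therefore hints at which strand a set of introns represents. The function name and input format are illustrative, not part of BRAKER.

```python
from __future__ import annotations

from collections import Counter


def splice_motif_strand_counts(seq: str, introns: list[tuple[int, int]]) -> Counter:
    """Count introns whose boundary dinucleotides look canonical on + or - strand.

    seq     -- forward-strand chromosome sequence
    introns -- 1-based, inclusive (start, end) intron coordinates
    """
    counts = Counter()
    for start, end in introns:
        donor = seq[start - 1:start + 1].upper()  # first two intron bases
        acceptor = seq[end - 2:end].upper()       # last two intron bases
        if donor == "GT" and acceptor == "AG":
            counts["+"] += 1
        elif donor == "CT" and acceptor == "AC":
            counts["-"] += 1
        else:
            counts["other"] += 1
    return counts
```

Run it separately on the introns from each strand-split BAM. If the split matches the library's strandedness, one file should be dominated by the '+' bucket and the other by the '-' bucket. A roughly even mix in both would point to unstranded data.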

samtools sort code

Hi,
Running braker with RNAseq data (bam files) and UTR=on, I encountered one error at the samtools sort step:
In the original braker.pl the code is (line 8991):
$cmdString .= "$SAMTOOLS_PATH/samtools sort -\@ " .($CPU-1) . " -o $otherfilesDir/merged.s.bam " . "$otherfilesDir/merged.bam " . "1> $otherfilesDir/samtools_sort_before_wig.stdout " . "2> $errorfilesDir/samtools_sort_before_wig.stderr"; print LOG "\n$cmdString\n" if ($v > 3);

Looking at samtools sort:

Usage: samtools sort [options] <in.bam> <out.prefix>
Options: -n sort by read name
-f use <out.prefix> as full file name instead of prefix
-o final output to stdout
-l INT compression level, from 0 to 9 [-1]
-@ INT number of sorting and compression threads [1]
-m INT max memory per thread; suffix K/M/G recognized [768M]

So I thought it might not be correct to run it with "-o", which here means "final output to stdout". I changed it to:

$cmdString .= "$SAMTOOLS_PATH/samtools sort -\@ " .($CPU-1) . " $otherfilesDir/merged.bam " . "$otherfilesDir/merged.s " . "1> $otherfilesDir/samtools_sort_before_wig.stdout " . "2> $errorfilesDir/samtools_sort_before_wig.stderr"; print LOG "\n$cmdString\n" if ($v > 3);

I deleted the "-o" option, so that merged.bam is the input and merged.s is the output prefix, which produces merged.s.bam.

I hope this is right.
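The correct form depends on the samtools version. In samtools 1.x, `-o FILE` names the output file, which is what the original braker.pl line assumes. The usage text quoted above comes from the older 0.1.x series, where `-o` sends output to stdout and the output argument is a prefix. So the original line is likely fine with a current samtools.

This minimal sketch picks the form from a version string. The helper name is hypothetical. The version string is assumed to contain the version number, e.g. `samtools 1.9` from `samtools --version`.

```python
from __future__ import annotations

import re


def samtools_sort_cmd(version_line: str, threads: int, in_bam: str, out_bam: str) -> list[str]:
    """Build a samtools sort command for samtools 1.x or the legacy 0.1.x syntax."""
    m = re.search(r"(\d+)\.(\d+)", version_line)
    if m is None:
        raise ValueError(f"cannot parse samtools version from {version_line!r}")
    if int(m.group(1)) >= 1:
        # samtools 1.x: -o names the output file, input is positional
        return ["samtools", "sort", "-@", str(threads), "-o", out_bam, in_bam]
    # samtools 0.1.x: positional <in.bam> <out.prefix>; ".bam" is appended by samtools
    if not out_bam.endswith(".bam"):
        raise ValueError("out_bam must end in .bam")
    prefix = out_bam[: -len(".bam")]
    return ["samtools", "sort", "-@", str(threads), in_bam, prefix]
```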

Thanks

PS: The BRAKER version I am using is a git clone from December 2018; the README.TXT file is dated August 22nd.

WARNING: Number of good genes is low (30).

I ran BRAKER2 with RNA-seq data on a green alga genome with low GC content.
The run finishes with this warning:
WARNING: Number of good genes is low (30). Recommended are at least 600 genes

Then I ran BUSCO with the eukaryota dataset on the augustus.hints.aa file generated by BRAKER; 87.5% of BUSCO groups are missing.

What should I do for this genome?

ERROR in randomSplit.pl line 47: LOCUS names in genbank file are not unique!

Hi,
I'm trying to run BRAKER/2.10 using only proteins of short evolutionary distance. My script is:

braker.pl \
 --cores=8 \
 --softmasking=1 \
 --genome=/home/CAM/qlin/CE10_Genome/versions/1.9.2/repeatMasker_20181201_soft/CE10g_v1.92.fa.masked \
 --prot_seq=/home/CAM/qlin/LF10_Genome/BRAKER2/t3/braker/Sp_12/augustus.hints.filter.fa \
 --prg=gth \
 --gth2traingenes \
 --trainFromGth

The program failed with this error:
ERROR in randomSplit.pl line 47: LOCUS names in genbank file are not unique!
And I found this error in the gbFilterEtraining.stderr file:

mRNA contains character m
GBProcessor::getGeneList(): GBProcessor::getJoin( ):  failed!!!
Encountered error after reading 1 annotations.

Could you help me solve this problem? Thank you so much!
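The randomSplit.pl message refers to the LOCUS line that begins each GenBank record. This minimal sketch lists duplicated LOCUS names in a file such as train.gb. It assumes the name is the first whitespace-separated token after `LOCUS`.

```python
from __future__ import annotations

from collections import Counter


def duplicate_locus_names(genbank_text: str) -> dict[str, int]:
    """Return LOCUS names that occur more than once, with their counts."""
    names = Counter(
        line.split()[1]
        for line in genbank_text.splitlines()
        if line.startswith("LOCUS") and len(line.split()) > 1
    )
    return {name: n for name, n in names.items() if n > 1}
```

Inspecting the records behind any duplicated names is a reasonable next step.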

BRAKER on Conda Cloud

Hi,

Would it be possible to provide the newest version of BRAKER on the Anaconda Cloud? Currently, BRAKER version 1.7 is available on the Anaconda Cloud: [https://anaconda.org/bioconda/braker].

Many thanks,

Which version of GeneMark-ES does BRAKER v2.1.2 depend on?

Hi,

I installed braker2 with conda install braker2 and downloaded GeneMark-ES v4.38 from http://topaz.gatech.edu/GeneMark/license_download.cgi.

When I run braker.pl --species=test --genome=../00.ref/ref.fa --hints=../01.gth_alignment/merge.hints --epmode, I get an error:

error on command line: /programs/GeneMark-ES/gm_et_linux_64/gmes_petap/gmes_petap.pl
Unknown option: ep_score
Unknown option: ep

gmes_petap.pl lists its algorithm options as follows:

Algorithm options
  --ES           to run self-training
  --fungus       to run algorithm with branch point model (most useful for fungal genomes)
  --ET           [filename]; to run training with introns coordinates from RNA-Seq read alignments (GFF format)
  --et_score     [number]; 4 (default) minimum score of intron in initiation of the ET algorithm
  --evidence     [filename]; to use in prediction external evidence (RNA or protein) mapped to genome
  --training     to run only training step
  --prediction   to run only prediction step
  --predict_with [filename]; predict genes using this file species specific parameters (bypass regular training and prediction steps)

Does this mean that the EP algorithm has been removed?

I want to use protein sequences of unknown evolutionary distance to predict the genes in my genome. Looking forward to your answer.
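To check whether an installed gmes_petap.pl accepts a given option, one can scan its help text for option names. This minimal sketch assumes options appear at the start of a line as `--name`, as in the excerpt above.

```python
from __future__ import annotations

import re

# An option name at the start of a (possibly indented) line, e.g. "  --et_score"
OPTION_RE = re.compile(r"^\s*(--[A-Za-z_]+)", re.MULTILINE)


def listed_options(help_text: str) -> set[str]:
    """Collect option names such as --ET or --et_score from gmes_petap.pl help output."""
    return set(OPTION_RE.findall(help_text))
```

In the excerpt above, `--ES` and `--ET` are listed but no `--EP`. This is consistent with the "Unknown option: ep" error.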

Shenglong

Alternative Transcripts input and output question

Does --alternatives_from_evidence=true require a separate hints file supplied on the command line, or does it use the ones generated by GeneMark in the output directory? I ran two jobs, one with this option and one without; the one without had more predicted genes in the output GTF/GFF file. Does this option take alternative transcripts into account when predicting genes, or should it produce an output file of alternative transcripts? Should I include this option when predicting genes in an unannotated non-model organism?

Secondly, I have a question about the predicted gene output. Several genes have the suffixes t1, t2, etc. (e.g. g55.t1, g55.t2). Do these indicate alternative transcripts/isoforms?
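AUGUSTUS output typically names transcripts `<gene>.t<N>`, so IDs that share a gene prefix (g55.t1, g55.t2) belong to the same gene. This minimal sketch groups transcript IDs by gene. It assumes that naming pattern holds.

```python
from collections import defaultdict


def group_transcripts(transcript_ids):
    """Map gene ID -> sorted transcript IDs, splitting each ID on its last '.t'."""
    genes = defaultdict(list)
    for tx in transcript_ids:
        gene, sep, _ = tx.rpartition(".t")
        genes[gene if sep else tx].append(tx)  # IDs without '.t' map to themselves
    return {g: sorted(txs) for g, txs in genes.items()}
```

Genes with more than one entry are the ones with more than one predicted transcript.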

Thank you

YAML not installed

I tried to run BRAKER, but it showed the following error:

Perl module 'YAML' is required but not installed yet

I tried to install the YAML module with both conda and cpan. However, it still produces this error. What should I do?
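One possible cause is that the perl running braker.pl is not the one the module was installed into, for example a conda perl versus the system perl. This minimal sketch asks a specific perl interpreter whether it can load a module, using `perl -M<Module> -e 1`. The helper name is illustrative.

```python
import shutil
import subprocess


def perl_module_available(module: str, perl: str = "perl") -> bool:
    """Return True if `perl -M<module> -e 1` succeeds with the given interpreter."""
    if shutil.which(perl) is None:
        return False  # interpreter not found on PATH
    result = subprocess.run(
        [perl, f"-M{module}", "-e", "1"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Pointing `perl=` at the interpreter that actually runs braker.pl (for example, the one reported by `which perl` in the same shell) shows whether that particular perl sees YAML.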

example files missing or test commands wrong

I was trying to test my BRAKER installation using test1.sh in directory example/tests. The braker command is

braker.pl --genome=../genome.fa --bam=../RNAseq.bam --softmasking --workingdir=$wd

however, there is no file RNAseq.bam in the example directory. Should it perhaps be

braker.pl --genome=../genome.fa --hints=../RNAseq.hints --softmasking --workingdir=$wd

Also, the documentation says there should be result files called augustus.gff and augustus.gtf. The directories example/results/test? contain only augustus.hints.gtf files. Is this the same as augustus.gtf, i.e. the final predictions of AUGUSTUS?

Thanks for your help!

Problem with GeneMark

Dear developers,

I am running BRAKER 2.1.2 on a plant genome with around 40K scaffolds, but the job dies at the GeneMark stage. GeneMark dies with the following error reported to STDERR:

ERROR in file /Storage/progs/BRAKER-2.1.2/scripts/braker.pl at line 5307
Failed to execute: perl /Storage/progs/gm_et_linux_64_v4.38/gmes_petap/gmes_petap.pl --verbose --seq /Storage/data1/riano/Thalictrum/GenomeAnnotation/BRAKER/Masked/MaSuRCA_WT478/braker/MaSuRCA_WT478/genome.fa --max_intergenic 50000 --evidence /Storage/data1/riano/Thalictrum/GenomeAnnotation/BRAKER/Masked/MaSuRCA_WT478/braker/MaSuRCA_WT478/GeneMark-ETP/evidence.gff --et_score 10 --ET /Storage/data1/riano/Thalictrum/GenomeAnnotation/BRAKER/Masked/MaSuRCA_WT478/braker/MaSuRCA_WT478/genemark_hintsfile.gff --cores=1 --soft_mask 1000 1>/Storage/data1/riano/Thalictrum/GenomeAnnotation/BRAKER/Masked/MaSuRCA_WT478/braker/MaSuRCA_WT478/GeneMark-ETP.stdout 2>/Storage/data1/riano/Thalictrum/GenomeAnnotation/BRAKER/Masked/MaSuRCA_WT478/braker/MaSuRCA_WT478/errors/GeneMark-ETP.stderr

braker.log does not report any error. GeneMark-ETP.stdout has the following:

check before run
create directories
commit input data
data report
commit training data
training data report
prepare initial model
get GC of sequence
GC 36
build initial ET model
running step ET_A
running gm.hmm on local system
3 contigs in training
concatenate predictions: /Storage/data1/riano/Thalictrum/GenomeAnnotation/BRAKER/Masked/MaSuRCA_WT478/braker/MaSuRCA_WT478/GeneMark-ETP/run/ET_A_1
training level ET_A: /Storage/data1/riano/Thalictrum/GenomeAnnotation/BRAKER/Masked/MaSuRCA_WT478/braker/MaSuRCA_WT478/GeneMark-ETP/run/ET_A_1
From 261 loaded 232 and ignored dublications 29
exon no_match match_one match_two
Initial 7 5 0
Internal 8 11 28
Terminal 9 6 0
Single 4 0 0
CDS_no_match all short long seq_short seq_long
CDS_no_match 28 20 8 5927 13691
Intergenic all between_match seq_match
Intergenic: 18 2 2893
error, no valid sequences were found
error on call: /Storage/progs/gm_et_linux_64_v4.38/gmes_petap/make_nt_freq_mat.pl --cfg /Storage/data1/riano/Thalictrum/GenomeAnnotation/BRAKER/Masked/MaSuRCA_WT478/braker/MaSuRCA_WT478/GeneMark-ETP/run.cfg --section stop_TAG --format TERM_TAG

I am running BRAKER as follows:
braker.pl --etpmode --softmasking --species=$SP --genome=$ASSEMBLY --bam=${SP}.scf_gt1000bp.sorted.bam --hints=prot_hintsfile.aln2hints.gff --cores=$NSLOTS --AUGUSTUS_CONFIG_PATH=AugustusCONFIG --AUGUSTUS_BIN_PATH=/Storage/progs/Augustus-3.3.1-tag1/bin --AUGUSTUS_SCRIPTS_PATH=/Storage/progs/Augustus-3.3.1-tag1/scripts
prot_hintsfile.aln2hints.gff contains hits to a related species (same family), generated with GenomeThreader in a previous BRAKER run on the unmasked genome. The genome has been soft-masked using RepeatModeler/RepeatMasker.

Any suggestion on how to carry on with genome annotation is greatly appreciated.
Thanks,
Diego

Changing genetic code table

It is currently not possible to switch genetic code table for running BRAKER. AUGUSTUS can use a different genetic code, but GeneMark-ES/ET must be extended before BRAKER can be extended.

BRAKER code extension is planned, depending on GeneMark-ES/ET extension.

Required changes:

command line option for genetic code table in BRAKER
translation script prior to aa2nonred.pl (part of AUGUSTUS)
editing AUGUSTUS parameters prior to training and prediction
command line option for getAnnoFastaFromJoingenes.pl in BRAKER
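Switching the genetic code table changes the translation step. As an illustration only, and not BRAKER or AUGUSTUS code, this minimal sketch builds NCBI translation table 1 (standard). It then applies the overrides for table 6 (ciliate nuclear code), in which TAA and TAG encode glutamine instead of stop.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 1, codons in the order TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Only codons that differ from table 1 are listed per table.
OVERRIDES = {6: {"TAA": "Q", "TAG": "Q"}}


def translate(cds: str, table: int = 1) -> str:
    """Translate a CDS codon by codon; '*' marks stop, 'X' an unrecognised codon."""
    if table != 1 and table not in OVERRIDES:
        raise ValueError(f"table {table} not defined in this sketch")
    code = {**STANDARD, **OVERRIDES.get(table, {})}
    cds = cds.upper()
    # Trailing bases that do not form a full codon are ignored.
    return "".join(code.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))
```

For example, `translate("ATGTAA")` gives `M*`, while `translate("ATGTAA", table=6)` gives `MQ`.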

Can't locate File/HomeDir.pm in @INC

Dear Braker team,

I have this problem when running BRAKER:

./braker.pl
Can't locate File/HomeDir.pm in @INC (you may need to install the File::HomeDir module) (@INC contains: /home/ulg/.local/share.cpan/build/GD-2.67-0/ /home/ulg/miniconda3/lib/perl5/site_perl/5.22.0/x86_64-linux-thread-multi /home/ulg/miniconda3/lib/perl5/site_perl/5.22.0 /home/ulg/miniconda3/lib/perl5/5.22.0/x86_64-linux-thread-multi /home/ulg/miniconda3/lib/perl5/5.22.0 .) at ./braker.pl line 21.
BEGIN failed--compilation aborted at ./braker.pl line 21.

cpanm File::HomeDir
File::HomeDir is up to date. (1.004)

Can you help me ?

/yourSpecies/genome.fa replaced by protein file.

When I run a command like the following, I find that /yourSpecies/genome.fa, which should be the genome FASTA file, gets replaced with proteins.fa. Therefore the hints file comes up empty.
The filterIntronsFindStrand.stderr file is filled with the following warning for each sequence of the genome file:
WARNING: 'Scbe7cn_1026_HRSCAF_1040' does not match any sequence in the fasta file. Maybe the two files do not belong together. This happens even though the sequences are present in the input file.

braker.pl --species=yourSpecies --genome=genome.fasta
--bam=file1.bam,file2.bam --prot_seq=proteins.fa
--prg=(gth|exonerate|spaln)
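If the genome.fa in the working directory holds protein instead of DNA, no hint can match a sequence. This minimal sketch guesses whether FASTA text is nucleotide or protein from the share of A/C/G/T/N characters. The 0.9 threshold is an arbitrary assumption.

```python
def looks_like_nucleotide(fasta_text: str, threshold: float = 0.9) -> bool:
    """Guess whether FASTA sequence lines are DNA, based on the share of ACGTN letters."""
    residues = "".join(
        line.strip() for line in fasta_text.splitlines() if not line.startswith(">")
    ).upper()  # upper() so soft-masked (lowercase) bases count too
    if not residues:
        return False
    nucleotide = sum(residues.count(b) for b in "ACGTN")
    return nucleotide / len(residues) >= threshold
```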

The hints file is empty

Hello, I am using BRAKER for predicting genes using a RNAseq bam file and protein fasta file. I consistently get the following error.

Tue May 21 04:24:52 2019: ERROR: in file /opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/braker-2.1.2-75wblifp2zieps5rf7tzp7ajcwvzo2oz/bin/braker.pl at line 4189

The hints file is empty. Maybe the genome and the RNA-seq file do not belong together. I do not know what this error means. Could you help me understand what is going wrong?
The command I am using is:
braker.pl --cores=1 --overwrite --prg=exonerate --BAMTOOLS_PATH=/opt/rit/spack-app/linux-rhel7-x86_64/gcc-7.3.0/bamtools-2.5.1-7rljcjuix7pff6pcjqnu6c5ztdhk4coj/bin/ --SAMTOOLS_PATH=/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/samtools-1.9-k6deogajvbc2bpx3csxjuwtmqh5w65nr/bin/ --PYTHON3_PATH=/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/python-3.6.3-u4oaxsbnvbz6s7yxztqvvirlipfjrnx7/bin/ --ALIGNMENT_TOOL_PATH=/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/exonerate-2.4.0-ddwi7zhv5cb4rzwsyhpe32jmqn22pmcm/bin/ --prot_seq=protein.faa --genome=genome.fasta --bam=ASP_rnaseq_sorted.bam --gff3

BRAKER error on call: parse_ET.pl

I'm trying BRAKER v2.1.0, but it stops at the GeneMark-ET step.
Here's the exact error:
error, file not found ~/src/gm_et_linux_64/gmes_petap/parse_ET.pl: set.out
error on call: ~/src/gm_et_linux_64/gmes_petap/parse_ET.pl --section ET_A --cfg ~/src/BRAKER2/BRAKER_v2.1.0/example/run.cfg --v
However, if I run ~/src/gm_et_linux_64/gmes_petap/parse_ET.pl directly, it works.

Any help or suggestions would be appreciated!

PS: Where can I download BRAKER v1?

Failed to execute: mv .rnaseq.tmp

Hello,

I'm running BRAKER using only proteins from a closely related species, as per your recipe in Fig 5 of the (excellent, by the way) help page. Full command is below:

~/software/BRAKER/scripts/braker.pl \
--genome=genome.fa \
--species=my_species --useexisting \
--prot_seq=proteins.faa --prg=gth --gth2traingenes --trainFromGth --gff3 --cores=1 \
--AUGUSTUS_CONFIG_PATH=/path/to/config/ \
--AUGUSTUS_BIN_PATH=/path/to/bin/ \
--AUGUSTUS_SCRIPTS_PATH=/path/to/scripts/ \
--BAMTOOLS_PATH=/path/to/bamtools/bin/ \
--GENEMARK_PATH=/path/to/genemark/bin/ \
--SAMTOOLS_PATH=/path/to/samtools \
--ALIGNMENT_TOOL_PATH=/path/to/gth

It fails with these messages to STDOUT:

# Wed Apr 24 11:10:56 2019: Log information is stored in file /path/to/braker.log
Use of uninitialized value $genemark_hintsfile in concatenation (.) or string at ~/software/BRAKER/scripts/braker.pl line 5005.
Use of uninitialized value $genemark_hintsfile in concatenation (.) or string at ~/software/BRAKER/scripts/braker.pl line 5006.
Use of uninitialized value $genemark_hintsfile in concatenation (.) or string at ~/software/BRAKER/scripts/braker.pl line 5075.
mv: missing destination file operand after ‘.rnaseq.tmp’
Try 'mv --help' for more information.
ERROR in file ~/software/BRAKER/scripts/braker.pl at line 5079
Failed to execute: mv .rnaseq.tmp

and the last entry in braker.log is:

# Wed Apr 24 11:37:14 2019: Filtering intron hints for GeneMark from /path/to/my_species/hintsfile.gff...
cat .prot | sort -n -k 4,4 | sort -s -n -k 5,5 | sort -s -k 3,3 | sort -s -k 1,1 | /path/to/join_mult_hints.pl > .prot.tmp
mv .rnaseq.tmp

I'm guessing it thinks there should be some RNA-seq evidence, as per $genemark_hintsfile, and fails when mv throws an error when it does not find it? Is this a bug, or perhaps I am missing a flag in my command?
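The hint-joining line in braker.log sorts with a chain of `sort` calls. Every call after the first uses `-s` (stable), so the last key applied becomes the primary one. The resulting order is: sequence name (column 1), then feature type (column 3), then end (column 5, numeric), then start (column 4, numeric).

This minimal sketch reproduces the same ordering with a single composite key, for post-processing hints files in Python. It assumes tab-separated GFF lines and byte-wise string comparison (as GNU sort does under `LC_ALL=C`).

```python
def sort_hints(lines):
    """Order GFF lines by seqname, feature, end, start, like the braker.log sort chain."""
    def key(line):
        cols = line.rstrip("\n").split("\t")
        return (cols[0], cols[2], int(cols[4]), int(cols[3]))
    return sorted(lines, key=key)
```

Lines that tie on all four keys keep their input order here, whereas GNU sort falls back to comparing whole lines.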

Many thanks,
reubwn

PS: I updated the repo immediately before running this command.

AUGUSTUS_CONFIG_PATH/species (in this case /.../Augustus/config/) is not writeable.

Hello!

I'm now trying to predict genes in a novel genome with BRAKER,
but I got some errors.
I ran BRAKER with the following command:

/.../BRAKER/scripts/braker.pl --genome=/.../final.genome.scf.masked.sort.fasta --prot_seq=/.../compgene.fasta --prg=gth --ALIGNMENT_TOOL_PATH=/.../gth-1.7.1-Linux_x86_64-64bit/bin --trainFromGth --cores 30 --AUGUSTUS_CONFIG_PATH=/.../Augustus/config/

But I've got the following error:

Use of uninitialized value $species in concatenation (.) or string at /.../BRAKER/scripts/braker.pl line 1498.
Sun Jun 9 15:07:46 2019: braker.pl version 2.1.3

Sun Jun 9 15:07:46 2019: Configuring of BRAKER for using external tools...

Sun Jun 9 15:07:46 2019: Command line flag --AUGUSTUS_CONFIG_PATH was provided. Setting $AUGUSTUS_CONFIG_PATH in braker.pl to /.../Augustus/config.
Sun Jun 9 15:07:46 2019: ERROR: in file /.../BRAKER/scripts/braker.pl at line 1496
AUGUSTUS_CONFIG_PATH/species (in this case /.../Augustus/config/) is not writeable.
There are 3 alternative ways to set this variable for braker.pl:
a) provide command-line argument --AUGUSTUS_CONFIG_PATH=/your/path
b) use an existing environment variable $AUGUSTUS_CONFIG_PATH
for setting the environment variable, run
export AUGUSTUS_CONFIG_PATH=/your/path
in your shell. You may append this to your .bashrc or
.profile file in order to make the variable available to all
your bash sessions.
c) braker.pl can try guessing the location of
$AUGUSTUS_CONFIG_PATH from an augustus executable that is
available in your $PATH variable.
If you try to rely on this option, you can check by typing
which augustus
in your shell, whether there is an augustus executable in
your $PATH
Be aware: the $AUGUSTUS_CONFIG_PATH must be writable for
braker.pl because braker.pl is a pipeline that
optimizes parameters that reside in that
directory. This might be problematic in case you
are using a system-wide installed augustus
installation that resides in a directory that is
not writable to you as a user.

So I checked whether the directory /.../Augustus/config/ is writable.
It looked writable to me, judging from its access permissions:

[ ~ Augustus]$ ls -ltr
.....
drwxr-xr-x 7 ... qbg 72 ... config

How could I fix this?
Any ideas would be appreciated!!
Thank you in advance!

PDF/Image Links Issue

Hi, the links are not working; I get a 404 Not Found error.

missing RNAseq.bam from examples

A test file is missing, so test1.sh cannot be completed. By the way, would it be possible to have shorter tests, or at least to run the computations in parallel?

Setting AUGUSTUS_BIN_PATH fails at bam2hints

Setting the AUGUSTUS_BIN_PATH variable causes a problem with bam2hints: bin/ is appended to that path when looking for bam2hints, which results in an incorrect path. Removing bin from AUGUSTUS_BIN_PATH instead fails because AUGUSTUS can't be found.

AUGUSTUS_BIN_PATH=/opt/augustus-3.2.3/bin
Command that fails:
/opt/augustus-3.2.3/bin/bin/bam2hints --intronsonly --in=/data/scratch/user/STAR_Aligned.out.bam --out=/data/scratch/user/braker/Sp_1/bam2hints.temp.gff 2>/data/scratch/user/braker/Sp_1/errors/bam2hints.0.stderr
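The failing command shows `bin/bin/bam2hints`, meaning `bin` was appended to a path that already ended in `bin`. This minimal sketch shows a path helper that tolerates both forms. It is hypothetical and is not how braker.pl resolves AUGUSTUS_BIN_PATH.

```python
from pathlib import Path


def tool_path(bin_path: str, tool: str) -> Path:
    """Return <bin_path>/<tool>, adding a 'bin' component only if bin_path lacks it."""
    base = Path(bin_path)  # Path() drops any trailing slash
    if base.name != "bin":
        base = base / "bin"
    return base / tool
```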

getAnnoFastaFromJoingenes.py: error

My BRAKER2 run fails partway through, and I couldn't find any similar problem or possible resolution online. This is my command and the error messages:

################################################

Command:

nohup time -v --output=time_braker2_RNAonly.log braker.pl --species=BUSCO_BUSCO_sunbird_trimPLK_881882613 --useexisting --cores=40 --genome=/home/cch/sunbird/sunbird_RepeatMasker_trim/sunbird_platanus_trim_kraken_scaff_gapclosed_1000.fa.masked --softmasking 1 --hints=/home/cch/sunbird/sunbird_HISAT2_trim/sunbird_trim_hints_AS_400K.gff,/home/cch/sunbird/sunbird_HISAT2_trim/sunbird_trim_hints_China_400K.gff --UTR=off &

Error messages:

In GeneMark-ET.stderr, this message is printed repeatedly:

(in cleanup) Can't call method "FETCH" on an undefined value at /usr/local/share/perl/5.22.1/Object/InsideOut.pm line 1953 during global destruction.

In getAnnoFastaJoingenes..stderr:

usage: getAnnoFastaFromJoingenes.py [-h] -g GENOME -f GTF -o OUT
[-t TRANSLATION_TABLE] [-s FILTER]
getAnnoFastaFromJoingenes.py: error: argument -o/--out: expected one argument

My nohup file shows these error messages:

Use of uninitialized value $_ in substitution (s///) at /opt/BRAKER/scripts/braker.pl line 7970.
ERROR in file /opt/BRAKER/scripts/braker.pl at line 7986
Failed to execute: perl /opt/Augustus/scripts/join_aug_pred.pl < /home/cch/sunbird/sunbird_BRAKER2_RNAonly_trim/braker/BUSCO_BUSCO_sunbird_trimPLK_881882613/augustus.tmp.gff > /home/cch/sunbird/sunbird_BRAKER2_RNAonly_trim/braker/BUSCO_BUSCO_sunbird_trimPLK_881882613/augustus.hints.gff

My braker.log file ends here:

#Thu Dec 27 20:33:51 2018: Making a gtf file from /home/cch/sunbird/sunbird_BRAKER2_RNAonly_trim/braker/BUSCO_BUSCO_sunbird_trimPLK_881882613/augustus.hints.gff
cat /home/cch/sunbird/sunbird_BRAKER2_RNAonly_trim/braker/BUSCO_BUSCO_sunbird_trimPLK_881882613/augustus.hints.gff | perl -ne 'if(m/\tAUGUSTUS\t/) {print $_;}' | perl /opt/Augustus/scripts/gtf2gff.pl --printExon --out=/home/cch/sunbird/sunbird_BRAKER2_RNAonly_trim/braker/BUSCO_BUSCO_sunbird_trimPLK_881882613/augustus.hints.tmp.gtf 2>/home/cch/sunbird/sunbird_BRAKER2_RNAonly_trim/braker/BUSCO_BUSCO_sunbird_trimPLK_881882613/errors/gtf2gff.augustus.hints.gtf.stderr

#Thu Dec 27 20:34:03 2018: AUGUSTUS prediction complete
#Thu Dec 27 20:34:03 2018: Making a fasta file with protein sequences of /home/cch/sunbird/sunbird_BRAKER2_RNAonly_trim/braker/BUSCO_BUSCO_sunbird_trimPLK_881882613/augustus.hints.gtf
/usr/bin/python3 /opt/Augustus/scripts/getAnnoFastaFromJoingenes.py -g /home/cch/sunbird/sunbird_BRAKER2_RNAonly_trim/braker/BUSCO_BUSCO_sunbird_trimPLK_881882613/genome.fa -f /home/cch/sunbird/sunbird_BRAKER2_RNAonly_trim/braker/BUSCO_BUSCO_sunbird_trimPLK_881882613/augustus.hints.gtf -o 1> /home/cch/sunbird/sunbird_BRAKER2_RNAonly_trim/braker/BUSCO_BUSCO_sunbird_trimPLK_881882613/getAnnoFasta..stdout 2>/home/cch/sunbird/sunbird_BRAKER2_RNAonly_trim/braker/BUSCO_BUSCO_sunbird_trimPLK_881882613/errors/getAnnoFastaJoingenes..stderr
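The logged command ends in `-o 1> ...`. The shell removes the redirection, so `-o` reaches the script with no value, and argparse rejects that.

This minimal sketch reproduces the behaviour with a parser modelled on the usage line above. The parser is a reconstruction from that usage line, not the script's actual source.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Parser mirroring the getAnnoFastaFromJoingenes.py usage line (reconstruction)."""
    p = argparse.ArgumentParser(prog="getAnnoFastaFromJoingenes.py")
    p.add_argument("-g", "--genome", required=True)
    p.add_argument("-f", "--gtf", required=True)
    p.add_argument("-o", "--out", required=True)
    p.add_argument("-t", "--translation_table")
    p.add_argument("-s", "--filter")
    return p


def parses(argv) -> bool:
    """Return True if argv is accepted; argparse signals errors via SystemExit."""
    try:
        make_parser().parse_args(argv)
    except SystemExit:
        return False
    return True
```

An empty output prefix in the command string built by braker.pl would produce exactly such a command line. This may be related to the `Use of uninitialized value` warning reported in the nohup output.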

UTRs annotations

Hi,
I have been running BRAKER with RNA-seq data (BAM files) and UTR=on because I wanted to add UTR annotations to my genes. In the output (GTF and GFF files, etc.), I found the following on some (many) genes: more than one annotated 5'-UTR and/or 3'-UTR within the same gene and isoform. Here is a simplified example:

Contig1001 AUGUSTUS gene 35608 37394 0.08 - . g8717
Contig1001 AUGUSTUS transcript 35608 37394 0.08 - . g8717.t1
Contig1001 AUGUSTUS tts 35608 35608 . - . transcript_id "g8717.t1"; gene_id "g8717";
Contig1001 AUGUSTUS 3'-UTR 35608 35986 0.46 - . transcript_id "g8717.t1"; gene_id "g8717";
Contig1001 AUGUSTUS exon 35608 36139 . - . transcript_id "g8717.t1"; gene_id "g8717";
Contig1001 AUGUSTUS stop_codon 35987 35989 . - 0 transcript_id "g8717.t1"; gene_id "g8717";
Contig1001 AUGUSTUS CDS 35987 36139 1 - 0 transcript_id "g8717.t1"; gene_id "g8717";
Contig1001 AUGUSTUS intron 36140 36198 1 - . transcript_id "g8717.t1"; gene_id "g8717";
Contig1001 AUGUSTUS CDS 36199 36313 1 - 1 transcript_id "g8717.t1"; gene_id "g8717";
Contig1001 AUGUSTUS exon 36199 36313 . - . transcript_id "g8717.t1"; gene_id "g8717";
...
Contig1001 AUGUSTUS intron 36949 37004 1 - . transcript_id "g8717.t1"; gene_id "g8717";
Contig1001 AUGUSTUS CDS 37005 37222 1 - 0 transcript_id "g8717.t1"; gene_id "g8717";
Contig1001 AUGUSTUS exon 37005 37227 . - . transcript_id "g8717.t1"; gene_id "g8717";
Contig1001 AUGUSTUS start_codon 37220 37222 . - 0 transcript_id "g8717.t1"; gene_id "g8717";
Contig1001 AUGUSTUS 5'-UTR 37223 37227 1 - . transcript_id "g8717.t1"; gene_id "g8717";
Contig1001 AUGUSTUS 5'-UTR 37281 37394 0.25 - . transcript_id "g8717.t1"; gene_id "g8717";
Contig1001 AUGUSTUS exon 37281 37394 . - . transcript_id "g8717.t1"; gene_id "g8717";
Contig1001 AUGUSTUS tss 37394 37394 . - . transcript_id "g8717.t1"; gene_id "g8717";

When I look into the RNAseq profile (IGV) I understand more or less why it's annotated this way: lack of RNAseq coverage that would link both initially separated 5'UTRs (or 3'UTRs).

My question is this: I haven't seen this in other genome annotation files. Is it common in BRAKER annotations with UTR=on (my data are two Illumina 150 bp PE lanes, giving ~30x coverage)? Should I merge the split UTRs into one for further analysis?
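To see how often this occurs across the whole annotation, one can count UTR segments per transcript. This minimal sketch works over GTF lines like the example above. It assumes features are named `5'-UTR` and `3'-UTR` and carry a `transcript_id` attribute. It splits on any whitespace for the first eight columns, so both tab- and space-separated lines work.

```python
import re
from collections import Counter

TX_RE = re.compile(r'transcript_id "([^"]+)"')


def transcripts_with_split_utrs(gtf_lines, feature="5'-UTR"):
    """Return {transcript_id: n} for transcripts with more than one `feature` segment."""
    counts = Counter()
    for line in gtf_lines:
        cols = line.split(None, 8)  # keep the attribute column intact
        if len(cols) < 9 or cols[2] != feature:
            continue
        m = TX_RE.search(cols[8])
        if m:
            counts[m.group(1)] += 1
    return {tx: n for tx, n in counts.items() if n > 1}
```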

Thank you

Number of cores for large genomes

Hi Katharina,
I have a question about setting the number of cores for genome annotation. I have a eukaryotic genome (~500 Mb) with a large number of scaffolds (39,888). Most of the genome is in scaffolds > 20 Mb, and the remaining scaffolds (14% of the genome) are fragmented. I set --cores=32 to speed up the annotation, and got this warning while running BRAKER:

file genome.fa contains a highly fragmented assembly (39888 scaffolds). This may lead to problems when running AUGUSTUS via braker in parallelized mode. You set --cores=32. You should run braker.pl in linear mode on such genomes, though (--cores=1).

I want the annotation to finish reasonably fast, so are there other ways to speed it up? For example, could I split the genome into two parts, long scaffolds and short scaffolds, and annotate the long scaffolds (> 20 Mb) with --cores=32 and the short ones with --cores=1? I don't know whether splitting the genome would also affect building the gene models.
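The split proposed above can be sketched as follows. This only covers the mechanics and says nothing about whether splitting affects training. The 20 Mb default mirrors the figure in the question, and the helper names are illustrative.

```python
from __future__ import annotations


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {id: sequence}; the id is the first word after '>'."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is not None:
            chunks.append(line.strip())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def split_by_length(records: dict[str, str], min_len: int = 20_000_000):
    """Return (long, short) dicts of records at or above / below min_len bases."""
    long_ = {k: v for k, v in records.items() if len(v) >= min_len}
    short = {k: v for k, v in records.items() if len(v) < min_len}
    return long_, short
```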

Thank you for any suggestions on this!

Yiyuan

braker2/augustus gff3 output

Hi!

How can I convert the GFF3 output of the BRAKER2 ab initio gene annotation pipeline into a valid EMBL flat file that I can submit to ENA?

I tried using EMBLmyGFF3 (https://github.com/NBISweden/EMBLmyGFF3). The tool seems to work fine, but the BRAKER GFF3 seems to be non-standard, and I always get warnings and error messages like these (only the first lines of the log are shown):

13:59:49 ERROR feature: >>stop_codon<< is not a valid EMBL feature type. You can ignore this message if you don't need the feature.
Otherwise tell me which EMBL feature it corresponds to by adding the information within the json mapping file.
13:59:49 ERROR feature: >>start_codon<< is not a valid EMBL feature type. You can ignore this message if you don't need the feature.
Otherwise tell me which EMBL feature it corresponds to by adding the information within the json mapping file.
13:59:51 ERROR feature: >>inferred_parent<< is not a valid EMBL feature type. You can ignore this message if you don't need the feature.
Otherwise tell me which EMBL feature it corresponds to by adding the information within the json mapping file.
13:59:51 WARNING feature: Partial CDS. The CDS with ID = g5589.t1.braker.CDS2 not a multiple of three.
/home/meitel/.local/lib64/python2.7/site-packages/Bio/Seq.py:2071: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
BiopythonWarning)
13:59:52 WARNING feature: Partial CDS. The CDS with ID = g5848.t1.braker.CDS2 not a multiple of three.
13:59:52 WARNING feature: Partial CDS. The CDS with ID = g5903.t1.braker.CDS3 not a multiple of three.
13:59:52 WARNING feature: Partial CDS. The CDS with ID = g5825.t1.braker.CDS2 not a multiple of three.
13:59:52 WARNING feature: Partial CDS. The CDS with ID = g6051.t1.braker.CDS1 not a multiple of three.
13:59:52 WARNING feature: Partial CDS. The CDS with ID = g5770.t1.braker.CDS2 not a multiple of three.
13:59:52 WARNING feature: Partial CDS. The CDS with ID = g5770.t1.braker.CDS3 not a multiple of three.
13:59:52 WARNING feature: Partial CDS. The CDS with ID = g5770.t1.braker.CDS4 not a multiple of three.
13:59:52 WARNING feature: Partial CDS. The CDS with ID = g5770.t1.braker.CDS5 not a multiple of three.
13:59:52 WARNING feature: Partial CDS. The CDS with ID = g5770.t1.braker.CDS7 not a multiple of three.

I am basically getting this error for all genes and wonder if the braker2 gff is non-standard.

These are the first lines of the output gff3:

scaffold_154 AUGUSTUS gene 5501 7878 0.85 - . ID=g1.braker;
scaffold_154 AUGUSTUS mRNA 5501 7878 0.52 - . ID=g1.t1.braker;Parent=g1.braker
scaffold_154 AUGUSTUS stop_codon 5501 5503 . - 0 Parent=g1.t1.braker;
scaffold_154 AUGUSTUS CDS 5501 5587 0.85 - 0 ID=g1.t1.braker.CDS1;Parent=g1.t1
scaffold_154 AUGUSTUS exon 5501 5587 . - . ID=g1.t1.braker.exon1;Parent=g1.t1;
scaffold_154 AUGUSTUS intron 5588 7152 0.84 - . Parent=g1.t1.braker;
scaffold_154 AUGUSTUS CDS 7153 7878 0.57 - 0 ID=g1.t1.braker.CDS2;Parent=g1.t1
scaffold_154 AUGUSTUS exon 7153 7878 . - . ID=g1.t1.braker.exon2;Parent=g1.t1;
scaffold_154 AUGUSTUS start_codon 7876 7878 . - 0 Parent=g1.t1.braker;
scaffold_154 AUGUSTUS mRNA 6946 7878 0.33 - . ID=g1.t2.braker;Parent=g1.braker
scaffold_154 AUGUSTUS stop_codon 6946 6948 . - 0 Parent=g1.t2.braker;
scaffold_154 AUGUSTUS CDS 6946 7878 0.33 - 0 ID=g1.t2.braker.CDS1;Parent=g1.t2
scaffold_154 AUGUSTUS exon 6946 7878 . - . ID=g1.t2.braker.exon1;Parent=g1.t2;
scaffold_154 AUGUSTUS start_codon 7876 7878 . - 0 Parent=g1.t2.braker;
scaffold_154 AUGUSTUS gene 10822 13441 0.34 + . ID=g2.braker;
scaffold_154 AUGUSTUS mRNA 10822 13441 0.34 + . ID=g2.t1.braker;Parent=g2.braker
scaffold_154 AUGUSTUS start_codon 10822 10824 . + 0 Parent=g2.t1.braker;
scaffold_154 AUGUSTUS CDS 10822 10946 0.42 + 0 ID=g2.t1.braker.CDS1;Parent=g2.t1
scaffold_154 AUGUSTUS exon 10822 10946 . + . ID=g2.t1.braker.exon1;Parent=g2.t1;
scaffold_154 AUGUSTUS intron 10947 11358 0.42 + . Parent=g2.t1.braker;
scaffold_154 AUGUSTUS CDS 11359 11608 0.49 + 1 ID=g2.t1.braker.CDS2;Parent=g2.t1
scaffold_154 AUGUSTUS exon 11359 11608 . + . ID=g2.t1.braker.exon2;Parent=g2.t1;
scaffold_154 AUGUSTUS intron 11609 13147 0.38 + . Parent=g2.t1.braker;
scaffold_154 AUGUSTUS CDS 13148 13441 0.38 + 0 ID=g2.t1.braker.CDS3;Parent=g2.t1
scaffold_154 AUGUSTUS exon 13148 13441 . + . ID=g2.t1.braker.exon3;Parent=g2.t1;
scaffold_154 AUGUSTUS stop_codon 13439 13441 . + 0 Parent=g2.t1.braker;
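The Partial CDS warnings are raised per CDS segment. In a multi-exon gene, individual segments need not be multiples of three; only the summed CDS length of a transcript should be.

This minimal sketch sums CDS lengths per Parent in GFF3 lines like those above. Parent values are used verbatim. Note that the CDS and exon lines above use `Parent=g1.t1` while the mRNA ID is `g1.t1.braker`, so the two would be treated as different IDs.

```python
from collections import defaultdict


def cds_length_by_parent(gff3_lines):
    """Sum CDS segment lengths (end - start + 1) for each Parent ID."""
    totals = defaultdict(int)
    for line in gff3_lines:
        if line.startswith("#"):
            continue
        cols = line.split(None, 8)  # tolerate tabs or spaces between columns
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(
            kv.split("=", 1) for kv in cols[8].strip().split(";") if "=" in kv
        )
        totals[attrs.get("Parent", "?")] += int(cols[4]) - int(cols[3]) + 1
    return dict(totals)
```

Applied to the first lines shown above, g1.t1 (87 + 726 = 813 bp), g1.t2 (933 bp) and g2.t1 (125 + 250 + 294 = 669 bp) all sum to multiples of three. This holds even though segments such as g2.t1.braker.CDS1 (125 bp) are not multiples of three themselves.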

Any suggestions/comments are highly appreciated.

Michael

braker with diamond

Hi @npavlovikj,
Could you please build a new Braker package which contains Diamond as a replacement of Blast (#25)?

Thank you in advance,

Best wishes,

Michal

The output gff didn't contain any introns.

The output gff didn't contain any introns, which is abnormal for my data. What could be the reason for this?

The hints file is empty. Maybe the genome and the RNA-seq file do not belong together.

I'm trying to run braker with the following command:
perl /home/chutima/wat_work_2/software/BRAKER/scripts/braker.pl --genome ref.fasta --bam=gmap_isoseq_sort.bam --BAMTOOLS_PATH=/home/chutima/wat_work_2/software/bamtools/bin/ --AUGUSTUS_CONFIG_PATH=/home/chutima/wat_work_2/software/Augustus/config/ -cores=10 --gff3 -AUGUSTUS_ab_initio --SAMTOOLS_PATH=/home/chutima/wat_work_2/software/samtools-1.9/

This is the first alignment record of the BAM file:
c404_f29p8_2075c404 16 tig00000954_quiver 388229 40 4S395M82N108M126N87M106N81M79N135M98N222M106N81M109N162M96N191M100N142M124N96M99N139M659

This is the beginning of the FASTA file:

tig00000954_quiver
AAGTTAATAATCCTTCCCCTGAATTAAAACAATTGTCTGCTCACCTACGTTATGCTTTCTTAGGAGAATCTTCTACTTTC
CAGTTATCATTTCAAAATGATTTAAGTAAAGAAGAAGAGGAAAAATTGTTGGATGTGTTAAAAAAGCATAAATCTGCCTT

I think both files use the same sequence name. Do you have any idea why this error happens?
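The BAM line shown above is an alignment record (it carries a FLAG, position and CIGAR string); the sequence names that have to match the genome are the SN: fields of the @SQ header lines, which `samtools view -H` prints. A minimal Python sketch, with hypothetical helper names, that compares those names with the FASTA sequence names (taken here as the first word after '>'):

```python
import sys


def sam_header_names(header_lines):
    """Collect reference names from the SN: fields of @SQ header lines."""
    names = []
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def fasta_names(fasta_lines):
    """Collect sequence names: the first whitespace-delimited word after '>'."""
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            names.append(parts[0] if parts else "")
    return names


def compare_names(bam_names, genome_names):
    """Return (names only in the BAM header, names only in the FASTA file)."""
    bam_set, genome_set = set(bam_names), set(genome_names)
    return sorted(bam_set - genome_set), sorted(genome_set - bam_set)


if __name__ == "__main__" and len(sys.argv) == 3:
    # argv[1]: saved output of `samtools view -H file.bam`; argv[2]: genome FASTA.
    with open(sys.argv[1]) as sam_header, open(sys.argv[2]) as fasta:
        only_bam, only_fasta = compare_names(sam_header_names(sam_header), fasta_names(fasta))
    print("only in BAM header:", only_bam)
    print("only in FASTA:", only_fasta)
```

If both lists come back empty, a naming mismatch can be ruled out as the cause of the empty hints file.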

bam file sorting

I get the error "BAM file MUST be sorted by target sequence names" from bam2hints, although the file was coordinate-sorted with the default samtools sort settings. Any idea why? Coordinate sorting should be correct, right?
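A coordinate-sorted file from samtools orders alignments by the target's position in the @SQ header and then by position, so each target forms one contiguous block. One unverified hypothesis is that bam2hints additionally expects targets to appear in ascending name order, which a header listing, e.g., chr2 before chr10 would violate. The minimal Python sketch below is a hypothetical diagnostic (not part of BRAKER or AUGUSTUS) that reports both properties from the RNAME column of `samtools view` output:

```python
import sys


def target_order_report(target_names):
    """Report how target (reference) names are ordered in a stream of alignments.

    Returns (grouped, lexicographic):
      grouped: True if all alignments to each target are contiguous.
      lexicographic: True if targets first appear in ascending string order.
    """
    first_seen = []   # targets in order of first appearance
    seen = set()
    grouped = True
    previous = None
    for name in target_names:
        if name == "*":
            continue  # unmapped reads have no target
        if name != previous:
            if name in seen:
                grouped = False  # target reappears after another target
            else:
                seen.add(name)
                first_seen.append(name)
            previous = name
    return grouped, first_seen == sorted(first_seen)


if __name__ == "__main__" and sys.argv[1:] == ["-"]:
    # Expects `samtools view file.bam` output on stdin; column 3 is RNAME.
    names = (line.split("\t", 3)[2] for line in sys.stdin if not line.startswith("@"))
    grouped, lexicographic = target_order_report(names)
    print(f"grouped by target: {grouped}; targets in name order: {lexicographic}")
```

Usage: `samtools view file.bam | python check_order.py -`. A result of grouped=True with name order False would be consistent with the hypothesis above.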

utrrnaseq segmentation fault

Hi,

I am running BRAKER with UTR prediction, and I have split my RNA-seq reads into plus and minus strands as recommended here: https://www.biostars.org/p/92935/

My command is:

braker.pl --species=c_incerta_v1.3 --genome=../../repeat_masking/Chlamydomonas_incerta.V3.softmask_v1.fa --softmasking --bam=../plus.bam,../minus.bam --stranded=+,- --UTR=on –-cores=16 --PYTHON3_PATH=/usr/local/bin/

I am getting the following segmentation fault error to standard out:

sh: line 1: 132194 Segmentation fault /home/craigror/programs/augustus-3.3.1/config//../bin/utrrnaseq --in-scaffold-file /scratch/research/projects/chlamydomonas/Cincerta_deNovo/analysis/assembly_V3/BRAKER2/run_v1.3/braker/c_incerta_v1.3/genome.fa -C /scratch/research/projects/chlamydomonas/Cincerta_deNovo/analysis/assembly_V3/BRAKER2/run_v1.3/braker/c_incerta_v1.3/stops.and.starts.gff -I /scratch/research/projects/chlamydomonas/Cincerta_deNovo/analysis/assembly_V3/BRAKER2/run_v1.3/braker/c_incerta_v1.3/rnaseq.utr.hints -W /scratch/research/projects/chlamydomonas/Cincerta_deNovo/analysis/assembly_V3/BRAKER2/run_v1.3/braker/c_incerta_v1.3/rnaseq_plus.wig -o /scratch/research/projects/chlamydomonas/Cincerta_deNovo/analysis/assembly_V3/BRAKER2/run_v1.3/braker/c_incerta_v1.3/utrs_plus.gff -r 76 -v 100 -n 15 -i 0.7 -m 0.3 -w 70 -c 100 -p 0.5 > /scratch/research/projects/chlamydomonas/Cincerta_deNovo/analysis/assembly_V3/BRAKER2/run_v1.3/braker/c_incerta_v1.3/rnaseq2utr_plus.stdout 2> /scratch/research/projects/chlamydomonas/Cincerta_deNovo/analysis/assembly_V3/BRAKER2/run_v1.3/braker/c_incerta_v1.3/errors/rnaseq2utr_plus.err
ERROR in file /home/craigror/programs/BRAKER/scripts/braker.pl at line 8496
Failed to execute: /home/craigror/programs/augustus-3.3.1/config//../bin/utrrnaseq --in-scaffold-file /scratch/research/projects/chlamydomonas/Cincerta_deNovo/analysis/assembly_V3/BRAKER2/run_v1.3/braker/c_incerta_v1.3/genome.fa -C /scratch/research/projects/chlamydomonas/Cincerta_deNovo/analysis/assembly_V3/BRAKER2/run_v1.3/braker/c_incerta_v1.3/stops.and.starts.gff -I /scratch/research/projects/chlamydomonas/Cincerta_deNovo/analysis/assembly_V3/BRAKER2/run_v1.3/braker/c_incerta_v1.3/rnaseq.utr.hints -W /scratch/research/projects/chlamydomonas/Cincerta_deNovo/analysis/assembly_V3/BRAKER2/run_v1.3/braker/c_incerta_v1.3/rnaseq_plus.wig -o /scratch/research/projects/chlamydomonas/Cincerta_deNovo/analysis/assembly_V3/BRAKER2/run_v1.3/braker/c_incerta_v1.3/utrs_plus.gff -r 76 -v 100 -n 15 -i 0.7 -m 0.3 -w 70 -c 100 -p 0.5 1> /scratch/research/projects/chlamydomonas/Cincerta_deNovo/analysis/assembly_V3/BRAKER2/run_v1.3/braker/c_incerta_v1.3/rnaseq2utr_plus.stdout 2> /scratch/research/projects/chlamydomonas/Cincerta_deNovo/analysis/assembly_V3/BRAKER2/run_v1.3/braker/c_incerta_v1.3/errors/rnaseq2utr_plus.err!

The file rnaseq2utr_plus.stdout contains:

Read in of scaffold file finished successfully!
Read in of coding region file finished successfully!
Read in of intron file finished successfully!
Input Data procession finished successfully!

and rnaseq2utr_plus.err is empty. I can reproduce the fault if I run the final utrrnaseq command by itself. Any help would be much appreciated.
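The strand split mentioned at the start of this report works on SAM FLAG bits. A minimal Python sketch of the classification, assuming a paired-end dUTP (fr-firststrand) library in which read 1 is antisense to the transcript; other library types need a different rule, and this is not necessarily the exact recipe of the linked post. The function names are hypothetical.

```python
PAIRED = 0x1
REVERSE = 0x10
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80


def transcript_strand(flag: int) -> str:
    """Infer the transcript strand of a read from a dUTP (fr-firststrand) library.

    Read 1 maps to the strand opposite the transcript; read 2 maps to the
    same strand. Returns '+', '-', or '?' if the mate cannot be identified.
    """
    read_on_reverse = bool(flag & REVERSE)
    if flag & PAIRED and flag & FIRST_IN_PAIR:
        return "+" if read_on_reverse else "-"
    if flag & PAIRED and flag & SECOND_IN_PAIR:
        return "-" if read_on_reverse else "+"
    return "?"


def split_by_strand(sam_lines):
    """Route SAM text lines to plus/minus lists; header lines go to both."""
    plus, minus = [], []
    for line in sam_lines:
        if line.startswith("@"):
            plus.append(line)
            minus.append(line)
            continue
        strand = transcript_strand(int(line.split("\t", 2)[1]))
        if strand == "+":
            plus.append(line)
        elif strand == "-":
            minus.append(line)
    return plus, minus
```

Under this assumption, flags 83 and 163 go to the plus file and 99 and 147 to the minus file, matching the order `--bam=plus.bam,minus.bam --stranded=+,-`.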

Cheers,
Rory

memory issue with V3.3

Wrong forum, sorry!

output from protein mapping pipeline is different to what BRAKER expects

Hi Katharina,

I am attempting to predict genes with BRAKER2 by using protein families from several related species as evidence. I have followed the README in the protein mapping pipeline which generated an introns.gff file.

In your BRAKER README, you mention that the hints file from this pipeline must be in the following format to work:

chrName ProSplign   intron  6591    8003    5   +   .   mult=5;pri=4;src=P
chrName ProSplign   intron  6136    9084    11  +   .   mult=11;pri=4;src=P

However, the introns.gff file is in the following format:

CSP28.scaffold163_cov73 ProSplign       Intron  1528203 1528347 1       +       .       tmp
CSP28.scaffold295_cov78 ProSplign       Intron  414858  414903  6       +       .       tmp

Providing this file to BRAKER results in an error.

Any idea where I've gone wrong? Column 9 containing the word tmp looks wrong, so perhaps this is an issue with the combine_gff_records.pl script? I'm happy to contact the authors of that code if this is indeed the issue.
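Comparing the two excerpts, the file differs from the documented format in column 3 ("Intron" instead of "intron") and column 9 ("tmp" instead of "mult=N;pri=4;src=P"); in the README example the score column equals the mult value. A minimal Python sketch that rewrites the lines into the documented form, assuming column 6 holds the multiplicity and that pri=4 is wanted (both taken from the README example, not checked against BRAKER's code). The function name is hypothetical.

```python
import sys


def to_braker_intron_hint(line: str) -> str:
    """Rewrite one ProSplign intron line into the 'mult=N;pri=4;src=P' form.

    Assumes nine tab-separated columns and that column 6 (score) holds the
    number of supporting alignments. Other lines are returned unchanged.
    """
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9 or columns[2].lower() != "intron" or not columns[5].isdigit():
        return line.rstrip("\n")
    columns[2] = "intron"  # lower-case feature name, as in the README example
    columns[8] = f"mult={columns[5]};pri=4;src=P"
    return "\t".join(columns)


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        for line in handle:
            print(to_braker_intron_hint(line))
```

Whether BRAKER accepts the converted file is worth confirming on a small test run.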

Thanks for your help,

Lewis

Can't locate Hash/Merge.pm

Getting this error during the GeneMark-ET part of the pipeline:

Can't locate Hash/Merge.pm in @INC (you may need to install the Hash::Merge module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.22.1 /usr/local/share/perl/5.22.1 /usr/lib/x86_64-linux-gnu/perl5/5.22 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.22 /usr/share/perl/5.22 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base .) at /home/arkadiy_garber/bin/gm_et_linux_64/gmes_petap/parse_by_introns.pl line 22.
BEGIN failed--compilation aborted at /home/arkadiy_garber/bin/gm_et_linux_64/gmes_petap/parse_by_introns.pl line 22.
        (in cleanup)    (in cleanup)  at /home/arkadiy_garber/anaconda3/lib/site_perl/5.26.2/Object/InsideOut.pm line 1953 during global destruction.

The Hash::Merge module is indeed installed, and the gmes_petap.pl script otherwise runs:

arkadiy_garber@server:/scratch/1/arkadiy/PacBIO_data/assembly_and_trimmed_reads$ perl /home/arkadiy_garber/bin/gm_et_linux_64/gmes_petap/gmes_petap.pl
# -------------------
Usage:  /home/arkadiy_garber/bin/gm_et_linux_64/gmes_petap/gmes_petap.pl  [options]  --sequence [filename]

GeneMark-ES Suite version 4.38
   includes transcript (GeneMark-ET) and protein (GeneMark-EP) based training and prediction

Input sequence/s should be in FASTA format

Algorithm options
  --ES           to run self-training
  --fungus       to run algorithm with branch point model (most useful for fungal genomes)
  --ET           [filename]; to run training with introns coordinates from RNA-Seq read alignments (GFF format)
  --EP           [filename]; to run training with introns coordinates from protein splice alighnmnet (GFF format)
  --et_score     [number]; 10 (default) minimum score of intron in initiation of the ET algorithm
  --ep_score     [number]; 4 (default) minimum score of intron in initiation of the EP algorithm
  --evidence     [filename]; to use in prediction external evidence (RNA or protein) mapped to genome
  --training     to run only training step
  --prediction   to run only prediction step
  --predict_with [filename]; predict genes using this file species specific parameters (bypass regular training and prediction steps)

Sequence pre-processing options
  --max_contig   [number]; 5000000 (default) will split input genomic sequence into contigs shorter then max_contig
  --min_contig   [number]; 50000 (default); will ignore contigs shorter then min_contig in training 
  --max_gap      [number]; 5000 (default); will split sequence at gaps longer than max_gap
                 Letters 'n' and 'N' are interpreted as standing within gaps 
  --max_mask     [number]; 5000 (default); will split sequence at repeats longer then max_mask
                 Letters 'x' and 'X' are interpreted as results of hard masking of repeats
  --soft_mask    [number] to indicate that lowercase letters stand for repeats; utilize only lowercase repeats longer than specified length

Run options
  --cores        [number]; 1 (default) to run program with multiple threads 
  --pbs          to run on cluster with PBS support
  --v            verbose

Customizing parameters:
  --max_intron          [number]; default 10000 (3000 fungi), maximum length of intron
  --max_intergenic      [number]; default 10000, maximum length of intergenic regions
  --min_gene_prediction [number]; default 300 (120 fungi) minimum allowed gene length in prediction step

Developer options:
  --usr_cfg      [filename]; to customize configuration file
  --ini_mod      [filename]; use this file with parameters for algorithm initiation
  --test_set     [filename]; to evaluate prediction accuracy on the given test set
  --key_bin
  --debug
# -------------------
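The @INC list in the error contains only system Perl 5.22 directories, while the cleanup message points to a Perl 5.26.2 library under anaconda3. This suggests, without confirming it, that more than one Perl interpreter is involved and that Hash::Merge is visible to one but not the other (for example if a script's shebang selects a different perl than the one on PATH). A minimal Python sketch, with a hypothetical helper name, that lists every `perl` executable on PATH in search order:

```python
import os
from typing import List, Optional


def executables_on_path(name: str, path: Optional[str] = None) -> List[str]:
    """Return every executable file called `name` on PATH, in search order."""
    if path is None:
        path = os.environ.get("PATH", "")
    found = []
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory or os.curdir, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK) and candidate not in found:
            found.append(candidate)
    return found


if __name__ == "__main__":
    for perl in executables_on_path("perl"):
        print(perl)
```

Each listed interpreter can then be checked with `<perl> -MHash::Merge -e 1`, which exits with a non-zero status if that interpreter cannot load the module.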

thanks,
Arkadiy

Random fails with multiple BAM files

I run BRAKER with multiple BAM files (forward and reverse strands separated) and protein evidence. Several times it printed a message like this:

WARNING: Format of hintsfile /data2/results2/gusev/jellyfish/annotation/braker2/braker/genemark_hintsfile.gff is incorrect in the last column, possibly src=tag is missing!

And later GeneMark-ETP fails with this or similar message:

error, unexpected ID format found on line: scaffold_2158        b2h     intron  1173736 1173804 1       -       i=4;src=E      scaffold_2158   b2h     intron  1173736 1173804 998     -       .       mult=998;pri=4;src=E

Sometimes restarting from scratch helps, sometimes it does not.

It seems that some lines in genemark_hintsfile.gff are garbled during the merge, because the individual bam2hints.temp.*.gff files look correct.

I noticed that the parallel processing in the make_rnaseq_hints routine involves writing into the same file from different threads. See this line (BRAKER/scripts/braker.pl, line 4504 in e117150):

$cmdString .= "cat $bam_temp >>$hintsfile_temp";

After changing line 4473 from

 my $pj = new Parallel::ForkManager($CPU);

to

 my $pj = new Parallel::ForkManager(1);

the error goes away (however, I did not do any elaborate testing). Obviously, this is only a temporary fix; the error (if I am right about the cause) should be fixed in a different way.
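The reported garbling fits a general hazard: several processes appending to one file at the same time can interleave partial lines. A common alternative, sketched here in Python rather than in the Perl/Parallel::ForkManager code BRAKER uses, is to let each worker write only its own file and have a single writer concatenate them in a fixed order after all workers finish. `convert` is a hypothetical stand-in for one bam2hints call, and threads are used only to keep the sketch self-contained; the same pattern applies to forked processes.

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor


def convert(index: int, out_dir: str) -> str:
    """Hypothetical stand-in for one bam2hints run; writes only to its own file."""
    path = os.path.join(out_dir, f"bam2hints.temp.{index}.gff")
    with open(path, "w") as handle:
        for k in range(3):
            start = 100 * k + 1
            handle.write(f"chunk{index}\tb2h\tintron\t{start}\t{start + 50}\t1\t+\t.\tmult=1;pri=4;src=E\n")
    return path


def run_and_merge(n_jobs: int, cores: int, merged_path: str) -> str:
    """Run jobs in parallel, then concatenate their outputs from a single writer."""
    with tempfile.TemporaryDirectory() as out_dir:
        with ThreadPoolExecutor(max_workers=cores) as pool:
            # map() returns results in submission order, regardless of finish order.
            parts = list(pool.map(convert, range(n_jobs), [out_dir] * n_jobs))
        # Only this loop writes merged_path, so lines from different jobs cannot interleave.
        with open(merged_path, "w") as merged:
            for part in parts:
                with open(part) as handle:
                    shutil.copyfileobj(handle, merged)
    return merged_path
```

This keeps the parallel speed-up of the conversion step while serializing only the cheap concatenation.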

Hope this helps someone.

Change README.TXT to README.md

-> Watch AUGUSTUS Issue Gaius-Augustus/Augustus#16 -> implement the changes in BRAKER repository, too.

Error running gmes_petap.pl and ab-initio

I am trying to run BRAKER2 for a reference haplotype fungal genome.
braker.pbs

#!/bin/bash
#PBS -P OSR
#PBS -N ref_braker
#PBS -l select=1:ncpus=8:mem=64GB
#PBS -l walltime=20:00:00
#PBS -e logs/braker.err
#PBS -o logs/braker.out

projectDir=/scratch/OSR/canu4_annotation/ref
outDir=$projectDir/results/braker
inDir=$projectDir/results/star/star2pass

module load genemark-es/4.33
module load bamtools/2.5.1
module load blast+/2.7.1
module load samtools/1.9
module load python/3.7.2
module load genomethreader/1.7.1
module load makehub/i
module load eval/2.2.8
module load braker2/2.1.2

export AUGUSTUS_CONFIG_PATH=$outDir/augustus_config
export AUGUSTUS_BIN_PATH=/usr/local/augustus/3.3.2/bin
export AUGUSTUS_SCRIPTS_PATH=/usr/local/augustus/3.3.2/scripts

braker.pl --species=OSR --genome=$projectDir/ref/ref.fa --bam=$inDir/refAligned.sortedByCoord.out.bam --softmasking --UTR=on --ab_initio --cores=8 --fungus --crf --makehub [email protected] --workingdir=$outDir

braker.err

	The file ~/.gmkey exists and has not been copied.
Python 3.7.2 As we suffer from package overload, only minimal packages will be installed in this version.
Unknown option: ab_initio
Unknown option: makehub
Unknown option: email
ERROR in file /usr/local/braker2/2.1.2/braker.pl at line 5179
Failed to execute: perl /usr/local/genemark-es/4.33/gmes_petap.pl --verbose --sequence=/scratch/RDS-FAE-OSR-RW/canu4_annotation/ref/results/braker/genome.fa --ET=/scratch/RDS-FAE-OSR-RW/canu4_annotation/ref/results/braker/genemark_hintsfile.gff --et_score 10 --max_intergenic 50000 --cores=8 --fungus --soft_mask 1000 1>/scratch/RDS-FAE-OSR-RW/canu4_annotation/ref/results/braker/GeneMark-ET.stdout 2>/scratch/RDS-FAE-OSR-RW/canu4_annotation/ref/results/braker/errors/GeneMark-ET.stderr

Can you help me figure out why gmes_petap.pl is not running? Also, why does BRAKER not recognize the --ab_initio, --makehub and --email options? Thank you.

Replace BLAST by DIAMOND

Replace BLAST by DIAMOND for aa2nonred.pl (originally an AUGUSTUS script). Intention: speed up BRAKER.

Augustus output is not produced

Hello,

I am trying to use BRAKER to annotate an insect genome. The GeneMark-ET step seems to succeed, but AUGUSTUS produces no output and I cannot find the reason for that problem.

Below are the output files produced for my genome:

[slukiche@hmem00 braker]$ ls -lh
total 1.9G
-rw-rw-r-- 1 slukiche slukiche  900 Jun  7 01:56 augustus.hints.gff
-rw-rw-r-- 1 slukiche slukiche 384K Jun  6 12:45 bam_header.map
-rw-rw-r-- 1 slukiche slukiche 9.1M Jun  7 01:58 braker.log
drwxrwxr-x 2 slukiche slukiche    1 Jun  7 01:58 errors
-rw-rw-r-- 1 slukiche slukiche  417 Jun  7 01:15 filterGenemark.stdout
drwxrwxr-x 6 slukiche slukiche   12 Jun  7 01:15 GeneMark-ET
-rw-rw-r-- 1 slukiche slukiche 6.2K Jun  7 00:38 GeneMark-ET.stdout
-rw-rw-r-- 1 slukiche slukiche  11M Jun  6 18:53 genemark_hintsfile.gff
-rw-rw-r-- 1 slukiche slukiche 1.9G Jun  6 12:44 genome.fa
-rw-rw-r-- 1 slukiche slukiche 384K Jun  6 12:44 genome_header.map
-rw-rw-r-- 1 slukiche slukiche  11M Jun  6 18:53 hintsfile.gff
drwxrwxr-x 3 slukiche slukiche    1 Jun  7 01:15 species

The errors folder only contains the GeneMark-ET.stderr file, which holds the message (in cleanup) Can't call method "FETCH" on an undefined value at /home/ulb/ebe/slukiche/perl5/lib/perl5/Object/InsideOut.pm line 1953 during global destruction. Nevertheless, GeneMark seems to run correctly.
The content of filterGenemark.stdout is:

Number of cds hints is 0
Average gene length: 3572
Average number of introns: 1.48159767565855
Good gene rate: 0.0447713908185422
Number of genes: 143889
Number of complete genes: 129413
Number of good genes: 5794
Number of one-exon-genes: 43795
Number of bad genes: 138095
Good intron rate: 0.191962402293721
One exon gene rate (of good genes): 0.304452882292026
One exon gene rate (of all genes): 0.304366560334702
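The rates in filterGenemark.stdout can be reproduced from the counts, which helps to read them: the "Good gene rate" matches good genes divided by complete genes (not by all genes), and the one-exon rate of all genes matches one-exon genes divided by all genes. These formulas are inferred from the numbers, not taken from the filterGenemark source. A short Python check:

```python
# Counts copied from filterGenemark.stdout above.
n_genes = 143889
n_complete = 129413
n_good = 5794
n_one_exon = 43795

# Inferred: "Good gene rate" = good genes / complete genes.
good_gene_rate = n_good / n_complete
# Inferred: "One exon gene rate (of all genes)" = one-exon genes / all genes.
one_exon_rate_all = n_one_exon / n_genes
# Back-calculate how many good genes are single-exon from the reported rate.
good_one_exon = round(0.304452882292026 * n_good)

print(f"good gene rate: {good_gene_rate:.6f}")
print(f"one-exon rate (all genes): {one_exon_rate_all:.6f}")
print(f"implied single-exon good genes: {good_one_exon}")
```

So roughly 4.5% of the complete GeneMark-ET genes passed the filter, and about 1764 of those 5794 good genes are single-exon.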

The only file produced by AUGUSTUS is augustus.hints.gff, which contains the following:

# This output was generated with AUGUSTUS (version 3.3.2).
# AUGUSTUS is a gene prediction tool written by M. Stanke ([email protected]),
# O. Keller, S. König, L. Gerischer, L. Romoth and Katharina Hoff.
# Please cite: Mario Stanke, Mark Diekhans, Robert Baertsch, David Haussler (2008),
# Using native and syntenically mapped cDNA alignments to improve de novo gene finding
# Bioinformatics 24: 637-644, doi 10.1093/bioinformatics/btn013
# Sources of extrinsic information: M RM E W P
# reading in the file /mnt/fhgfs/users/s/l/slukiche/genome_annotation/braker/braker/augustus_tmp/8256.001.ctg11933.fa.1..21181.hints ...
# Have extrinsic information about 1 sequences (in the specified range).
# Initializing the parameters using config directory /CECI/home/ulb/ebe/slukiche/tools/augustus/Augustus/config/ ...
# gonioctenaQuinquepunctata version. Using default transition matrix.

I wasn't able to find any error message from Augustus in the log file so I don't understand what is going on.

Substitution loop error in filterIntronsFindStrand.pl

I'm attempting to run BRAKER v2.1.2 on a genome with large chromosomes (the largest being over 1 Gb in size), and I ran into the following error message in the file errors/filterIntronsFindStrand.stderr:

Substitution loop at PATH/to/BRAKER-2.1.2/scripts/filterIntronsFindStrand.pl line 122, <FASTA> chunk 1.

Of course, the obvious fix is to split apart the chromosomes before running BRAKER - please do note that BRAKER then ran successfully with split chromosomes. However, I still wanted to note the error for this particular use case. Thank you.
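The workaround described above, splitting very long chromosomes before running BRAKER, can be sketched in Python as follows. The 100 Mb default chunk size is an arbitrary assumption, not a documented limit of filterIntronsFindStrand.pl or Perl, and the `name_chunkN` naming scheme is hypothetical. Predictions on a chunk must be shifted back by the recorded offset, and genes spanning a cut point will be broken.

```python
def split_fasta(lines, chunk_size=100_000_000):
    """Yield (header, sequence, offset) pieces no longer than chunk_size.

    offset is the 0-based start of the piece in the original sequence, so a
    1-based coordinate x on the piece maps back to x + offset.
    """
    def pieces(name, seq):
        for start in range(0, len(seq), chunk_size):
            yield f"{name}_chunk{start // chunk_size + 1}", seq[start:start + chunk_size], start

    name, parts = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield from pieces(name, "".join(parts))
            name, parts = (line[1:].split() or [""])[0], []
        elif line:
            parts.append(line)
    if name is not None:
        yield from pieces(name, "".join(parts))


def write_fasta(records, handle, width=80):
    """Write (header, sequence, offset) records as FASTA with fixed line width."""
    for header, seq, _offset in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
```

Keeping the (header, offset) pairs in a small table makes it straightforward to map the final gene coordinates back onto the original chromosomes.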