Code Monkey home page Code Monkey logo

varus-braker's Introduction

varus-braker

Small pipeline combining raw genome-data and VARUS.bam (or RNA-seq data) and run BRAKER3

Software dependencies

BRAKER3 from https://github.com/Gaius-Augustus/BRAKER

The best way to use singularity container:

singularity build braker3.sif docker://teambraker/braker3:latest

VARUS from https://github.com/Gaius-Augustus/VARUS

BUSCO from https://busco.ezlab.org/

hisat2 https://github.com/DaehwanKimLab/hisat2

sra-toolkit https://github.com/ncbi/sra-tools requeired version >= 3.0.2
Important: GeneMark-ETP has sra-tools included, but v3.0.1

GeneMark-ETP https://github.com/gatech-genemark/GeneMark-ETP

Usage

Input should be a table with at least 1 column:

See table_example.tbl Columns are divided by tab You may use without DNA-link, will try to find genome in ncbi-databse. RNA-files should have "_1" and "_2" parts in names.

SPECIES_NAME DNA_LINK RNA_LINKS[optional,ID1_1,ID1_2,ID2_1,ID2_2 ... ]
Drosophila melanogaster https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/215/GCF_000001215.4_Release_6_plus_ISO1_MT/GCF_000001215.4_Release_6_plus_ISO1_MT_genomic.fna.gz   /home/user/genomes/RNA/SRR13030903_1.fastq,/home/user/genomes/RNA/SRR13030903_2.fastq
Tribolium castaneum     https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/335/GCF_000002335.3_Tcas5.2/GCF_000002335.3_Tcas5.2_genomic.fna.gz 
Bombyx mori
#You may add commentaries into the table and also use local files, need absolute path
Drosophila Melanogaster /home/user/genomes/Drosophila/genome.fasta.masked
Populus trichocarpa     /home/user/genomes/Populus/genome.fasta.masked
Arabidopsis thaliana
Caenorhabditis elegans

Command line:

varus-braker.py --input table.txt

Configuration

File config.ini should have pathways to all executables.

[VARUS]
varus_path = /home/user/VARUS
hisat2_path = /home/user/hisat2
sratoolkit_path = /home/user/sratoolkit.3.0.2-ubuntu64/bin
batchsize = 100000
maxbatches = 1000

[BRAKER]
braker_cmd = /home/user/braker3/BRAKER/scripts/braker.pl 
augustus_config_path = /home/user/Augustus/config
augustus_bin_path = /home/user/Augustus/bin
augustus_scripts_path = /home/user/Augustus/scripts
diamond_path = /home/user/
prothint_path = /home/user/ProtHint/bin
genemark_path = /home/user/GeneMark-ETP/bin
orthodb_path = /home/user/orthodb-clades
excluded = species


[SLURM_ARGS]
partition = snowball
cpus_per_task = 36
module_load = module load bedtools

In case using singularity containter :

[VARUS]
varus_path = /home/user/VARUS/
hisat2_path = /home/user/hisat2
sratoolkit_path = /home/user/sratoolkit.3.0.2-ubuntu64/bin
batchsize = 100000
maxbatches = 1000


[BRAKER]
braker_cmd = singularity exec braker.sif braker.pl 
genemark_path = /home/user/GeneMark-ETP/bin
excluded = species

[SLURM_ARGS]
partition = snowball
cpus_per_task = 48
module_load = module load bedtools
    module load singularity

Please choose the optimal batchsize and maxbatches values: huge batchsize could lead to increasing running time.

Accuracies

Gene Transcript Exon
Sn Sp Sn Sp Sn Sp
Caenorhabditis elegans
batchsize = 1500000 maxbatches = 200 Total = 300M 71.65 83.64 54.17 74.62 80.16 94.40
batchsize = 100000 maxbatches = 1000 Total = 100M 66.19 82.86 49.53 74.93 74.30 94.65
batchsize = 75000 maxbatches = 600 Total = 45M 71.54 85.34 54.21 76.23 78.67 94.84
Arabidopsis thaliana
batchsize = 1500000 maxbatches = 200 Total = 300M 82.63 82.86 57.66 77.95 82.44 93.96
batchsize = 100000 maxbatches = 1000 Total = 100M 82.54 82.96 57.58 78.09 82.20 94.01
batchsize = 75000 maxbatches = 600 Total = 45M 82.64 83.08 57.64 78.14 82.12 94.08
Populus trichocarpa
batchsize = 1500000 maxbatches = 200 Total = 300M 77.93 88.46 63.15 80.83 85.48 94.50
batchsize = 100000 maxbatches = 1000 Total = 100M 78.16 88.53 63.29 81.38 85.57 94.69
batchsize = 75000 maxbatches = 600 Total = 45M 78.33 88.62 63.40 81.52 85.59 94.82
Medicago truncatula
batchsize = 1500000 maxbatches = 200 Total = 300M 50.28 72.97 50.28 65.27 77.79 88.07
batchsize = 100000 maxbatches = 1000 Total = 100M 50.53 72.91 50.53 65.26 77.79 88.22
batchsize = 75000 maxbatches = 600 Total = 45M 50.56 72.95 50.56 65.56 77.71 88.46
Parasteatoda tepidariorum
batchsize = 1500000 maxbatches = 200 Total = 300M 49.02 59.55 41.56 53.45 55.76 86.67
batchsize = 100000 maxbatches = 1000 Total = 100M 46.47 59.71 39.07 53.89 51.97 87.24
Danio rerio
batchsize = 1500000 maxbatches = 200 Total = 300M 56.67 69.55 35.42 66.57 66.07 91.98
batchsize = 100000 maxbatches = 1000 Total = 100M 56.79 70.72 35.36 67.76 65.34 92.60
batchsize = 75000 maxbatches = 600 Total = 45M 56.84 71.11 35.30 68.06 65.85 92.73
Drosophila Melanogaster
batchsize = 1500000 maxbatches = 200 Total = 300M 86.45 90.11 60.03 82.61 83.88 95.34
batchsize = 100000 maxbatches = 1000 Total = 100M 86.89 89.85 60.07 81.99 83.47 94.95
batchsize = 75000 maxbatches = 600 Total = 45M 87.85 90.31 61.19 83.11 84.16 95.43
Solanum lycopersicum
batchsize = 1500000 maxbatches = 200 Total = 300M 46.34 47.79 37.54 47.50 83.88 94.52
batchsize = 100000 maxbatches = 1000 Total = 100M 46.23 47.76 37.43 47.42 83.40 94.47

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.