reciprocalBlast

Protocol to perform a reciprocal blast for comparison of two de-novo gene annotations based on different genome assemblies.

Reciprocal Best Hits (RBH) BLAST is a common method for inferring putative orthologs.

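Requisites: NCBI BLAST+ (makeblastdb, blastn) and Python, plus the scripts used below (parse_genes.py, blast_forward.sh, blast_backward.sh, parse_xml_reciprocal_blast.py).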

Forward blastn

Create a database with genome2 gene sequences.

mkdir reciprocal_blast
cd ./reciprocal_blast

# DB of genes in genome2.
mkdir genome2_db
makeblastdb -in <path/to/genome2_seq.fasta> -dbtype nucl -out <path/to/genome2_db>

Create a directory with a fasta file for each gene in genome1.

mkdir genome1_genes

python parse_genes.py -f <path/to/genome1_seq.fasta> -d <path/to/genome1_genes>

ls <path/to/genome1_genes> | wc -l # check that there are as many FASTA files as there are genes in genome1.
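parse_genes.py itself is not shown here, so the following is only a minimal sketch of what such a splitting step could look like, not the script's actual code. The function name split_fasta and the choice of the first header token as the file name are assumptions made for illustration.

```python
from pathlib import Path


def split_fasta(fasta_path, out_dir):
    """Write each record of a multi-FASTA file to its own file.

    Hypothetical helper: returns the number of records written, which
    should match the number of genes in the input file.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    handle = None
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                if handle is not None:
                    handle.close()
                # Assumption: the first whitespace-delimited token of the
                # header is a unique gene ID.
                gene_id = line[1:].split()[0]
                # Replace characters that are awkward in file names.
                safe_id = gene_id.replace("/", "_").replace("|", "_")
                handle = open(out_dir / f"{safe_id}.fasta", "w")
                count += 1
            if handle is not None:
                handle.write(line)
    if handle is not None:
        handle.close()
    return count
```

The returned count can be compared with the number of header lines in the input as the same sanity check that `wc -l` provides.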

Blastn of genes in genome1 to genome2.

mkdir blast_forward
cd blast_forward/
nohup ../blast_forward.sh &
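The contents of blast_forward.sh are not shown, so the sketch below only illustrates one way the per-gene search could be driven. It assumes XML output (`-outfmt 5`), since the RBH analysis parses XML files, and one query file per gene. The function names, the output file naming, and the `max_hits` default are hypothetical.

```python
import subprocess
from pathlib import Path


def build_blastn_cmd(query_fasta, db_prefix, out_dir, max_hits=1):
    """Build a blastn command that writes XML for one query file."""
    query_fasta = Path(query_fasta)
    # Assumption: one XML result per gene, named after the query file.
    out_xml = Path(out_dir) / f"{query_fasta.stem}.xml"
    return [
        "blastn",
        "-query", str(query_fasta),
        "-db", str(db_prefix),
        "-out", str(out_xml),
        "-outfmt", "5",  # 5 = BLAST XML
        "-max_target_seqs", str(max_hits),
    ]


def run_all(genes_dir, db_prefix, out_dir):
    """Run blastn for every *.fasta file in genes_dir (requires BLAST+)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for query in sorted(Path(genes_dir).glob("*.fasta")):
        subprocess.run(build_blastn_cmd(query, db_prefix, out_dir), check=True)
```

Keeping command construction separate from execution makes it possible to inspect the command without BLAST+ installed.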

Backward blastn

Create a database with genome1 gene sequences.

# DB of genes in genome1 (run from reciprocal_blast/; `cd ..` first if you are still in blast_forward/).
mkdir genome1_db
makeblastdb -in <path/to/genome1_seq.fasta> -dbtype nucl -out <path/to/genome1_db>

Create a directory with a fasta file for each gene in genome2.

mkdir genome2_genes

python parse_genes.py -f <path/to/genome2_seq.fasta> -d <path/to/genome2_genes>

ls <path/to/genome2_genes> | wc -l # check that there are as many FASTA files as there are genes in genome2.

Blastn of genes in genome2 to genome1, keeping only one alignment per query, because only the best alignment partner in the genome needs to be recorded.

mkdir blast_backward
cd blast_backward/
nohup ../blast_backward.sh &
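Once a search has finished, the best alignment partner can be read from each result file. The sketch below assumes the results are BLAST XML (`-outfmt 5`) with one query per file, and that the first `Hit` element is the top-ranked hit. The helper names are hypothetical, and IDs are taken from the first token of the FASTA header as recorded in `Iteration_query-def` and `Hit_def`.

```python
import xml.etree.ElementTree as ET


def first_token(text):
    """Return the first whitespace-delimited token, or None if empty."""
    parts = (text or "").split()
    return parts[0] if parts else None


def best_hit_from_xml(xml_source):
    """Return (query_id, best_hit_id) from one BLAST XML result.

    xml_source is a file name or a file object. best_hit_id is None
    when the query has no hits.
    """
    root = ET.parse(xml_source).getroot()
    iteration = root.find("./BlastOutput_iterations/Iteration")
    if iteration is None:
        return (None, None)
    query_id = first_token(iteration.findtext("Iteration_query-def"))
    # Assumption: hits are listed in rank order, so the first is the best.
    hit = iteration.find("./Iteration_hits/Hit")
    best_id = first_token(hit.findtext("Hit_def")) if hit is not None else None
    return (query_id, best_id)
```

`Hit_def` is used rather than `Hit_id` because, for databases built without parsed sequence IDs, `Hit_id` may hold an internal ordinal instead of the original gene name.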

RBH analysis

Analysis of results from forward and backward blastn.

Parse the XML files from both blastn runs to get the correspondence between genes in the two genomes. A pair is called an RBH when the best hit of a genome1 gene in the forward blast (a genome2 gene) has that same genome1 gene as its best hit in the backward blast.

nohup python -u ../parse_xml_reciprocal_blast.py -blastf <path/to/blast_forward_xml_dir> -blastb <path/to/blast_backward_xml_dir> -o <path/to/output_dir> &
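The RBH criterion itself reduces to a lookup in two best-hit maps. The following is a minimal sketch of that logic and of writing a three-column tab-separated table. It is not the code of parse_xml_reciprocal_blast.py. The function names, the dictionary inputs, and the empty string for non-RBH rows are assumptions.

```python
import csv


def reciprocal_best_hits(forward, backward):
    """Flag reciprocal best hits.

    forward maps each genome1 gene to its best genome2 hit (or None);
    backward maps each genome2 gene to its best genome1 hit (or None).
    Returns rows of (genome1 gene, genome2 gene, flag).
    """
    rows = []
    for gene1 in sorted(forward):
        gene2 = forward[gene1]
        # RBH: the forward best hit points back to the same genome1 gene.
        is_rbh = gene2 is not None and backward.get(gene2) == gene1
        # Assumption: non-RBH rows get an empty flag.
        rows.append((gene1, gene2 or "", "RBH" if is_rbh else ""))
    return rows


def write_table(rows, handle):
    """Write the rows as a tab-separated table with a header line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Genome1 gene", "Corresponding genome2 gene", "RBH"])
    writer.writerows(rows)
```

For example, if two genome1 genes share the same best genome2 hit, only the one that the genome2 gene points back to is flagged.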

Output:

  • Tab-separated table with the following columns:

    Genome1 gene    Corresponding genome2 gene    RBH
    <str>           <str>                         <str>

The string "RBH" in the RBH column indicates that the genome1 gene and the gene listed in the "Corresponding genome2 gene" column are reciprocal best hits.

Contributors

miriammarins
