


GOWSH

Perl homology searcher based on web scraping and heuristic approaches. It is designed to look up HomoloGene, Ensembl and InParanoid after running the bidirectional best hit (BDBH) algorithm.
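The BDBH criterion keeps a pair of proteins only when each one is the other's highest-scoring BLAST hit. The sketch below is an illustration in Python, not gowsh's Perl code. It assumes BLAST+ tabular output (`-outfmt 6`, bitscore in column 12) for both search directions, and the function names are hypothetical.

```python
"""Minimal sketch of the bidirectional best hit (BDBH) criterion.

Assumes BLAST+ tabular output (-outfmt 6), whose twelve default columns are:
qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
"""
from typing import Dict, Iterable, List, Tuple


def best_hits(rows: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Map each query ID to its highest-bitscore subject ID."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in rows:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        # On ties, the first hit seen is kept.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def bdbh(forward: Iterable[str], reverse: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (model_id, target_id) pairs that are each other's best hit.

    forward: model proteins searched against the target proteome
    reverse: target proteins searched against the model proteome
    """
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    pairs = []
    for model_id, (target_id, _) in fwd.items():
        # Keep the pair only if the target's best hit points back to model_id.
        if target_id in rev and rev[target_id][0] == model_id:
            pairs.append((model_id, target_id))
    return sorted(pairs)
```

Pairs that fail the reciprocal test are exactly the cases where the extra database lookups add information.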

Getting Started

Clone the repository locally:

git clone https://github.com/carrascomj/gowsh

Add the script directory to your PATH (in your bash initialization file, e.g., ~/.bashrc):

export PATH=$PATH:"path/to/gowsh/bin"

The program requires additional packages, which can be installed with cpanm if they are not already present:

cpanm JSON Data::Dumper Bio::SeqIO LWP::Simple File::Basename Getopt::Long XML::Parser

Alternatively, WebAPIsGOWSH can be installed as a regular Perl package (from the 'gowsh/' directory):

perl Makefile.PL
make
make install

Finally, both formatdb and BLAST+ are required.
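For reference, a local protein search with BLAST+ takes two steps: index the subject proteome, then run blastp against it (formatdb is the legacy BLAST counterpart of makeblastdb). The Python sketch below only builds the command lines and runs them on request. The file names, the E-value cutoff and the helper names are assumptions, not gowsh's actual settings.

```python
"""Sketch: build BLAST+ command lines for a local protein search.

Assumption: makeblastdb and blastp are on PATH. File names, the E-value
cutoff and these helper names are hypothetical.
"""
import shutil
import subprocess
from typing import List


def makeblastdb_cmd(fasta: str, db_name: str) -> List[str]:
    # Index a protein multiFASTA as a BLAST database.
    return ["makeblastdb", "-in", fasta, "-dbtype", "prot", "-out", db_name]


def blastp_cmd(query: str, db_name: str, out: str, evalue: float = 1e-5) -> List[str]:
    # Tabular output (-outfmt 6) is simple to parse for best-hit selection.
    return ["blastp", "-query", query, "-db", db_name,
            "-outfmt", "6", "-evalue", str(evalue), "-out", out]


def run(cmd: List[str]) -> None:
    # Fail early with a clear message if the BLAST+ binary is missing.
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)
```

For a bidirectional search, both proteomes are indexed and blastp is run once in each direction.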

Usage

gowsh.pl is the main script. The program takes command-line arguments with the following options:

gowsh.pl --gfile|go|glist "path_to_file|GOid|list" --tfile|torg "path_to_file|organism"
    [--modfile|modorg] "path_to_file|organism" --out "outfile" --preserve

    --gfile path_to_file: input, genes as multiFASTA
    --go GOid: input, Gene Ontology ID (as in AmiGO)
    --glist list: input, blank-separated list of gene IDs
    --tfile path_to_file: multiFASTA containing the proteins of the target organism's genome
    --torg organism: target organism name (genus and species)
    --modfile path_to_file: optional, multiFASTA containing the proteins of the model organism's genome
    --modorg organism: optional, model organism name (genus and species)
    --out "outfile": optional, name of the output file; default "GOWSH_output.txt"
    --preserve: optional; if set, (nearly) all generated files are preserved.
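The option rules above can be summarized as one required input source, one required target source, and an optional model source. The sketch below restates them with Python's argparse. It is only an illustration: the real script is Perl (the dependency list includes Getopt::Long), and the "exactly one of each group" rule is an assumption read from the `|` alternatives in the usage line.

```python
"""Sketch of the documented gowsh.pl options using argparse.

Illustration only: assumes each "|" group in the usage line means
"choose exactly one". The actual Perl script may validate differently.
"""
import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="gowsh.pl")
    # Exactly one input source (assumed).
    src = p.add_mutually_exclusive_group(required=True)
    src.add_argument("--gfile", help="genes as multiFASTA")
    src.add_argument("--go", help="Gene Ontology ID, e.g. 0048507")
    src.add_argument("--glist", help="blank-separated list of gene IDs")
    # Exactly one target organism source (assumed).
    tgt = p.add_mutually_exclusive_group(required=True)
    tgt.add_argument("--tfile", help="target proteome as multiFASTA")
    tgt.add_argument("--torg", help="target organism (genus and species)")
    # The model organism is optional.
    mod = p.add_mutually_exclusive_group()
    mod.add_argument("--modfile", help="model proteome as multiFASTA")
    mod.add_argument("--modorg", help="model organism (genus and species)")
    p.add_argument("--out", default="GOWSH_output.txt")
    p.add_argument("--preserve", action="store_true")
    return p
```

Parsing the test command from this README yields `go="0048507"`, `torg="oryza sativa"` and the default output name.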

Running the test

The script can be tested with the following command:

gowsh.pl --go 0048507 --modorg "arabidopsis thaliana" --torg "oryza sativa"

You can compare the output with the file "t/GOWSH_outputq1.tsv".

The program will then parse the input file, download both genomes from NCBI and try to match homologues.
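Downloading sequences from NCBI typically goes through the Entrez E-utilities: esearch finds the records and, with `usehistory=y`, stores them on NCBI's history server; efetch then retrieves them as FASTA. The sketch below only builds the two URLs. The `[Organism]` query, the protein database and the function names are illustrative assumptions; gowsh's own queries may differ.

```python
"""Sketch: E-utilities URLs for fetching an organism's proteins from NCBI.

Assumptions: the "<organism>[Organism]" term and the protein database are
illustrative; these helper names are hypothetical.
"""
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(organism: str, db: str = "protein") -> str:
    # usehistory=y keeps the result set on the history server (WebEnv + QueryKey).
    params = {"db": db, "term": f"{organism}[Organism]",
              "usehistory": "y", "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_url(webenv: str, query_key: str, db: str = "protein") -> str:
    # rettype=fasta with retmode=text returns plain multiFASTA.
    params = {"db": db, "WebEnv": webenv, "query_key": query_key,
              "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"
```

The WebEnv and QueryKey values returned by esearch are passed to efetch, so the full record list never has to be sent back to NCBI.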

What I Learned

This code was developed as a project for one of the subjects of my BSc in Biotechnology (UPM). In summary, I learned the following:

  • Web scraping biological information using Perl and the mygene API.
  • Using NCBI's Entrez E-utilities API for programmatic access.
  • Using the Ensembl REST API.
  • Running BLAST locally with BLAST+.
  • Applying heuristic algorithms to infer homology.
  • Building a Perl package.
  • Writing a README.md.

Contributors

  • carrascomj
