Code Monkey home page Code Monkey logo

amdn / genoogle Goto Github PK

View Code? Open in Web Editor NEW

This project forked from felipealbrecht/genoogle

0.0 1.0 0.0 32.44 MB

Genoogle is a software for fast similar DNA sequence searching. It is a full functional similar genetics searching tool, having a text mode interface with simple scripting language, web interface, and web service interface. Genoogle is fast, easy and completely free. More info at https://www.evernote.com/pub/felipealbrecht/genoogle and http://arxiv.org/abs/1507.02987

License: Other

Shell 1.30% Protocol Buffer 0.30% Java 97.37% HTML 0.36% XSLT 0.67%

genoogle's Introduction

Genoogle

Fundamental Information

Genoogle is software for similar DNA sequences searching developed by Felipe Albrecht: home page, email contact.

Genoogle uses indexing and parallel processing techniques and it is developed in Java. Genoogle is free and open source. The name Genoogle comes from Genes + Google, the final world domination plan is to develop a software to locate genes likes Google is to locate information in the Web and Genoogle does not have any affiliation with Google Inc and I hope its name will not cause problem.

It is a Beta Version of Genoogle, it means: it lacks some features and have some know and a lot of unknow bugs. So, I hope that the users (YOU!) will inform me about bugs and features which you will like to have.

If you really want to develop something in to Genoogle, contact me.

Features

Actual Features:

  • Fast similar sequences searching.
  • Really good sensibility.
  • Text mode interfaces.
  • Web Services Interface.
  • Very Simple Web interface, but support for JSP.
  • Good memory requirements. (For a 4 gigabytes data bank, it is necessary not more than 4 gigabytes of RAM memory).
  • Working (and tested) at Windows and Linux.
  • Data banks with more than 8 gigabytes.
  • Console and batch interfaces.

Missing and planned features:

  • Better web interface.
  • RNA indexing and searching sequences.

Missing and not (for so soon) planned features:

  • Proteins indexing and searching. (It will be a big work to implement it, but it is possible)
  • Clusters implementation. (May be my Ph.D. project)

Intallation

Requirements

To run Genoogle it is necessary:

  • JRE >= 1.6 and the environment variable JAVA_HOME should inform where the JRE is, by example: JAVA_HOME="/usr/lib/jvm/java-6-sun"
  • Ram Memory: The Genoogle memory requirement is approximately 80% of the data bank size more approximately 40Mbs for Java run time.

Installation process

  • Download the package here (TBA)
  • Unpack
  • Copy the fasta files data banks into the fasta/files folder.
  • Configure the conf/genoogle.xml file and insert the copied files at the <genoogle:split-databanks> section as new <genoogle:databank>:

<genoogle:split-databanks name="RefSeq" path="files/fasta" mask="111010010100110111" number-of-sub-databanks="1" sub-sequence-length="11"> <genoogle:databank name="Cow" path="cow.rna.fna" /> <genoogle:databank name="Frog" path="frog.rna.fna" /> </genoogle:split-databanks> ```

  • Run the format_db.sh script.
  • Wait while the data bank is formatted and the inverted index processed.
  • Execute :
    • run_web.sh, for webservices, web page and col),
    • or run_standalone_web.sh, for web page that will access Genoogle by webservice,
    • or run_console.sh, for console only interface.
  • Have fun!

Searching

Genoogle has two interfaces: a very simple web page, text mode console, and WebServices interface.

To do the search using the web page is very simples. Open the address localhost:8080 at your browser and put the query sequence in the input box and click Search sequence button. Wait and the results will be shown. The console interface is much better!

To use the WebServices, please check their wiki.

Console interface

The console interface has the following commands:

  • search : does the search.
  • list : lists the data banks.
  • parameters : shows the search parameters and their values.
  • set = : set the parameter value.
  • gc : executes the java garbage collection.
  • prev or l : executes the last command.
  • batch : runs the commands listed in this batch file.
  • exit : finish Genoogle execution.

The search parameters are:

  • MaxSubSequenceDistance : maximum index entries distance to be considered in the same HSPs.
  • SequencesExtendDropoff : drop off for sequence extension.
  • MaxHitsResults : maximum quantity of returned results.
  • QuerySplitQuantity : how many slices the input query will be divided.
  • MinQuerySliceLength : minimum size of each input query slice.
  • MaxThreadsIndexSearch : quantity of threads which will be used to index search. ( Should be MaxThreadsIndexSearch <= QuerySplitQuantity * 2).
  • MaxThreadsExtendAlign : quantity of threads which will be used to extend and align the HSPs.
  • MatchScore : score when has a match at the alignment.
  • MismatchScore : score when has a mismatch at the alignment.

An example of search is shown bellow:

search Genomes_RefSeq BA000002 result_file QuerySplitQuantity=2 MaxThreadsIndexSearch=2 MaxHitsResults=20

This search, make a search at the Genomes_RefSeq databank, using as input the file BA000002 and the results will be saved at "result_file.xml" file. The input query will be split in to 2 parts and will be used 2 threads to do the search of the input query sub-sequences at the inverted index. At the end will be returned to the user, the 20 better scores.


Dependencies

Genoogle uses:

All these libraries are in the directory https://github.com/felipealbrecht/Genoogle/tree/master/lib

genoogle's People

Contributors

amdn avatar felipealbrecht avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.