HAlign-II

HAlign-II is a Java-based tool that aligns multiple nucleotide or protein sequences, either stand-alone or on a Hadoop cluster. The Hadoop parallel computing environment gives faster alignment. If a Hadoop cluster is not available, you can use stand-alone mode instead. For large sequence files (more than 1 GB), we recommend running on the Hadoop cluster to save time.

Home page: http://lab.malab.cn/soft/halign
Reference: Shixiang Wan and Quan Zou. HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing. Algorithms for Molecular Biology, 2017, 12:25.

Other implementations:

Development Environment

  • JDK 1.8
  • Hadoop 2.7.2
  • Spark 2.0.2
  • IntelliJ IDEA (Maven)

Usage

1. Stand-alone mode

# java -jar HAlign2.1.jar <mode> <input-file> <output-file> <algorithm>
  • mode: -localMSA, -localTree.
  • input-file: local fasta format file, required.
  • output-file: local fasta format file, just a file name, required.
  • algorithm: sequence alignment algorithm; required for -localMSA mode, not used in -localTree mode.
    • 0: suffix tree algorithm; the fastest, DNA/RNA only.
    • 1: KBand algorithm with the BLOSUM62 scoring matrix; protein only.
    • 2: KBand algorithm with affine gap penalty; DNA/RNA only.
    • 3: trie tree alignment algorithm; slower, DNA/RNA only.
    • 4: basic algorithm based on the similarity matrix; the slowest, DNA/RNA only, but the most accurate when sequence similarity is low.
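
The option rules above can be sketched as a small Python helper that checks the mode and algorithm code before assembling the stand-alone command line. This is only an illustration: the helper name `build_local_command`, the `ALGORITHMS` table, and the file names are hypothetical and not part of HAlign-II. The helper builds the argument list and does not run anything.

```python
# Algorithm codes for -localMSA, transcribed from the option list above.
# Each entry: code -> (short description, sequence type it applies to).
ALGORITHMS = {
    0: ("suffix tree (fastest)", "DNA/RNA"),
    1: ("KBand with BLOSUM62 scoring matrix", "protein"),
    2: ("KBand with affine gap penalty", "DNA/RNA"),
    3: ("trie tree (slower)", "DNA/RNA"),
    4: ("basic similarity-matrix algorithm (slowest)", "DNA/RNA"),
}


def build_local_command(mode, input_file, output_file, algorithm=None,
                        jar="HAlign2.1.jar"):
    """Return the argument list for a stand-alone run.

    Hypothetical helper: it only assembles the documented command line.
    """
    if mode not in ("-localMSA", "-localTree"):
        raise ValueError(f"unknown stand-alone mode: {mode}")
    command = ["java", "-jar", jar, mode, input_file, output_file]
    if mode == "-localMSA":
        # -localMSA requires one of the documented algorithm codes.
        if algorithm not in ALGORITHMS:
            raise ValueError(
                f"-localMSA needs an algorithm code in {sorted(ALGORITHMS)}")
        command.append(str(algorithm))
    elif algorithm is not None:
        # -localTree takes no algorithm argument.
        raise ValueError("-localTree takes no algorithm argument")
    return command
```

The returned list could be passed to `subprocess.run` if the jar is in the working directory; for example, `build_local_command("-localMSA", "in.fasta", "out.fasta", 0)` corresponds to the `java -jar` line shown at the top of this section.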

2. Hadoop cluster mode

# hadoop jar HAlign2.1.jar <mode> <input-file> <output-file> <algorithm>
  • mode: -hadoopMSA.
  • input-file/output-file/algorithm: same as stand-alone mode.
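
The introduction recommends the Hadoop cluster for inputs larger than 1 GB and stand-alone mode when no cluster is ready. A minimal sketch of that choice follows. The function name `recommend_msa_mode` is hypothetical, "1GB" is read here as 2**30 bytes, and keeping small inputs in stand-alone mode even when a cluster exists is an assumption of this sketch, not a rule stated by HAlign-II.

```python
ONE_GB = 1024 ** 3  # assumption: the "1GB" threshold is read as 2**30 bytes


def recommend_msa_mode(input_size_bytes, hadoop_available):
    """Suggest an MSA mode from the input size and cluster availability.

    Hypothetical policy: use -hadoopMSA only when a cluster is available
    and the input exceeds the threshold; otherwise fall back to -localMSA.
    """
    if input_size_bytes < 0:
        raise ValueError("input size must not be negative")
    if hadoop_available and input_size_bytes > ONE_GB:
        return "-hadoopMSA"
    return "-localMSA"
```

In practice the size could come from `os.path.getsize` on the input FASTA file. A large input with no cluster still maps to `-localMSA` here, which will work but may be slow.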

3. Spark cluster mode

# spark-submit --class main HAlign2.1.jar <mode> <input-file> <output-file> <algorithm>
  • mode: -sparkMSA, -sparkTree.
  • input-file: local fasta format file, required.
  • output-file: local fasta format file, just a file name, required.
  • algorithm: sequence alignment algorithm; required for -sparkMSA mode, not used in -sparkTree mode. 0 is the suffix tree algorithm, the fastest, for DNA/RNA only; 1 is the KBand algorithm with the BLOSUM62 scoring matrix, for protein only.
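
Because -sparkMSA accepts only code 0 (DNA/RNA) or code 1 (protein), the code can be chosen by inspecting the input. The sketch below is one possible heuristic, not something HAlign-II does itself: it treats a file as nucleotide data if every sequence uses only the letters in `NUCLEOTIDE_LETTERS`, which is an assumption and would misclassify a protein made up solely of those letters. All function names are hypothetical.

```python
# Assumption: any letter outside this set means the input is protein.
NUCLEOTIDE_LETTERS = set("ACGTUN-")


def read_fasta_sequences(lines):
    """Yield sequences from FASTA-formatted lines, skipping '>' headers."""
    current = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current:
                yield "".join(current)
                current = []
        elif line:
            current.append(line)
    if current:
        yield "".join(current)


def pick_spark_algorithm(lines):
    """Return 0 (suffix tree) for nucleotide input, 1 (KBand) for protein."""
    for sequence in read_fasta_sequences(lines):
        if set(sequence.upper()) - NUCLEOTIDE_LETTERS:
            return 1
    return 0


def build_spark_command(input_file, output_file, algorithm,
                        jar="HAlign2.1.jar"):
    """Assemble the documented spark-submit line for -sparkMSA."""
    if algorithm not in (0, 1):
        raise ValueError("-sparkMSA accepts only algorithm 0 or 1")
    return ["spark-submit", "--class", "main", jar,
            "-sparkMSA", input_file, output_file, str(algorithm)]
```

A typical use would open the FASTA file, pass the file object to `pick_spark_algorithm`, and hand the result to `build_spark_command`.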

Update

  • 2020-07-16, version 2.1.2:
    • bug fix
  • 2016-11-25, version 2.1.1:
    • Fix bugs in multi-threaded protein sequence alignment.
    • Fix bugs in file I/O.
  • 2016-11-14, version 2.1.0:
    • Add version: comment for English.
  • 2016-09-07, version 2.0.0:
    • Basic functions.
