Code Monkey home page Code Monkey logo

soapnuke's Introduction

Getting started

For 2.X:
make
./SOAPnuke

For 1.X:
# Make sure libraries like boost, zlib, log4cplus (1.0.3), openssl have been installed.
git clone https://github.com/BGI-flexlab/SOAPnuke.git
cd SOAPnuke/src
# Make sure those libraries required are accessible in ENV{LD_LIBRARY_PATH}
cmake .
make
./SOAPnuke

Introduction

SOAPnuke is a novel analysis tool developed for ultrafast quality control and preprocessing of high throughput sequencing (HTS) data. It consists of four modules: Filter, Filter-DGE, Filter-sRNA and Filter-meta. The first module is designed for general usage, while the rest three with suitable parameters or tailored functions are for datasets from specific experiments as their names suggest. All of these modules combines quality control and preprocessing to speed up the report on statistics graphs of raw datasets, preprocessed datasets and preprocessing status. Moreover, Hadoop MapReduce framework is introduced to provide great acceleration for parallel large datasets performing with good scalability, while it also supports standalone running.

All usages start with executable file SOAPnuke, and different modules are invoked with different sub-commands: filter for Filter, filterdge for Filter-DGE, filtersrna for Filter-sRNA and filtermeta for Filter-meta module.

Availability

SOAPnuke is released under GPLv3. The latest source code is freely available at github. Before compiling our program, please make sure that you have installed these following libraries: boost, zlib, log4cplus (1.0.3) and openssl. After you acquire the source code, modify the CmakeLists.txt according to locations of libraries mentioned above. Then, you can use cmake . and make to compile.

Frequently asked questions (FAQs)

  1. What types of data format does SOAPnuke work with?
  2. What types of data does SOAPnuke work with?
  3. Can I choose a part of quality-control statistics instead of all?

1. What types of data format does SOAPnuke work with?

SOAPnuke only works with sequence data of FASTQ format temporarily, but we are considering supporting SAM/BAM to provide operations on aligned datasets.

2. What types of data does SOAPnuke work with?

Filter module are designed to deal with sequences from most experiments. Compared to it, the other three modules with edited parameter-sets and tailored functions deal with specific data. These designs have refered to existing pipelines of data preprocessing.

Filter-DGE module deals with DGE datasets. DGE reads are all single-end so only one input file is available. Each read has two adapters at both ends and a ‘CATG’ segment neighboring targeted sequences, which is 17bp in length, in direction of 5’ ends. We also consider those features when designing parameters.

Filter-sRNA module deals with small-RNA reads. In sRNA experiment, mRNA sequences must be filtered out, so a parameter of mRNA_FilePath is available. In addition, functions are reduced with reference to existing pipelines.

Filter-meta module deals with metagenomics data. It does not have new functions since meta data do not have unique features compared to other data. However, we still modify its functions and parameters according to existing pipelines.

3. Can I choose a part of quality-control statistics instead of all?

We will realize this feature in the following versions, but all those statistics will be computed for now.

soapnuke's People

Contributors

youngchan919 avatar huangzhibo avatar lishengkang avatar zym1905 avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.