BSVF

Bisulfite Sequencing Virus integration Finder

Attention

For directional libraries only. PBAT and non-directional libraries are NOT supported.

Dependencies

bwa-meth 0.10 depends on

Install

Run pip install toolshed.

Run src/install.sh.

If EMBOSS fails to install, download the binary from the sites above and put the water executable from EMBOSS into ./bin. Alternatively, link water into ./bin.

Your bsIntegration/bin/ should look like this:

-rwxr-xr-x  398860 Feb 20 00:48 bwa
-rwxr-xr-x   21892 Sep  1 08:37 bwameth.py
-rwxr-xr-x   27040 Feb 20 01:14 water
-rwxr-xr-x  971772 Feb 20 00:48 samtools
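
The listing above can be checked with a few lines of Python before starting a run. This is a minimal sketch, not part of BSVF: the function name `missing_tools` is hypothetical, and the only thing taken from this README is the list of four file names.

```python
import os
from pathlib import Path

# Tool names taken from the directory listing above.
REQUIRED_TOOLS = ("bwa", "bwameth.py", "water", "samtools")


def missing_tools(bin_dir):
    """Return the required tools that are absent or not executable in bin_dir."""
    bin_path = Path(bin_dir)
    missing = []
    for name in REQUIRED_TOOLS:
        tool = bin_path / name
        # is_file() and os.access() both follow symlinks,
        # so a linked water is accepted as well.
        if not (tool.is_file() and os.access(tool, os.X_OK)):
            missing.append(name)
    return missing
```

Calling `missing_tools("./bin")` returns an empty list when all four tools are in place, and the names of the missing ones otherwise.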

Usage

./bsuit <command> <config_file>

./bsuit prepare prj.ini
./bsuit aln prj.ini
./bsuit grep prj.ini
./bsuit analyse prj.ini

Test Run

mkdir sim90 && cd sim90 && ./simVirusInserts.pl GRCh38_no_alt_analysis_set.fna.gz X04615.fa.gz s90 && cd ..
mkdir sim50 && cd sim50 && ./simVirusInserts.pl GRCh38_no_alt_analysis_set.fna.gz X04615.fa.gz s50 50 ../sim90/s90.ini && cd ..

./bsuit prepare sim90/s90.ini

./bsuit aln sim90/s90.ini
./run/s90_aln.sh
./bsuit grep sim90/s90.ini
./bsuit analyse sim90/s90.ini

./bsuit aln sim50/s50.ini
./run/s50_aln.sh
./bsuit grep sim50/s50.ini
./bsuit analyse sim50/s50.ini
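
The command sequence of the test run can be generated rather than typed by hand. The sketch below is an illustration only: `pipeline_commands` and `run_pipeline` are hypothetical helpers, and the name of the alignment script is an assumption inferred from the test run (sim90/s90.ini is followed by ./run/s90_aln.sh); it may actually derive from the ProjectID in the config file.

```python
import subprocess
from pathlib import Path


def pipeline_commands(config_file, prepare=True, bsuit="./bsuit", run_dir="./run"):
    """List the commands of one run, in the order used in the test run above."""
    # Assumption: the script produced by `aln` is <config stem>_aln.sh,
    # as in sim90/s90.ini -> ./run/s90_aln.sh.
    stem = Path(config_file).stem
    commands = []
    if prepare:
        # The sim50 run above skips this step, so it is optional here.
        commands.append([bsuit, "prepare", config_file])
    commands += [
        [bsuit, "aln", config_file],
        [f"{run_dir}/{stem}_aln.sh"],
        [bsuit, "grep", config_file],
        [bsuit, "analyse", config_file],
    ]
    return commands


def run_pipeline(config_file, **options):
    """Run each step in turn and stop at the first step that fails."""
    for command in pipeline_commands(config_file, **options):
        subprocess.run(command, check=True)
```

For example, `run_pipeline("sim50/s50.ini", prepare=False)` would reproduce the second block of the test run, provided the script-name assumption holds.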

Reference Files

Simulation

./simVirusInserts.pl GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz HBV.X04615.fa sim150 150

GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz

A gzipped file that contains FASTA format sequences for the following:

  1. chromosomes from the GRCh38 Primary Assembly unit.
    Note: the two PAR regions on chrY have been hard-masked with Ns.
    The chromosome Y sequence provided therefore has the same coordinates as the GenBank sequence but it is not identical to the GenBank sequence. Similarly, duplicate copies of centromeric arrays and WGS on chromosomes 5, 14, 19, 21 & 22 have been hard-masked with Ns (locations of the unmasked copies are given below).
  2. mitochondrial genome from the GRCh38 non-nuclear assembly unit.
  3. unlocalized scaffolds from the GRCh38 Primary Assembly unit.
  4. unplaced scaffolds from the GRCh38 Primary Assembly unit.
  5. Epstein-Barr virus (EBV) sequence
    Note: The EBV sequence is not part of the genome assembly but is included in the analysis set as a sink for alignment of reads that are often present in sequencing samples.

Format of config_file

An example

[RefFiles]
HostRef=/share/HomoGRCh38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
VirusRef=/share/work/bsvir/HBV.AJ507799.2.fa

[DataFiles]
780_T.1=/share/work/bsvir/F12HPCCCSZ0010_Upload/s00_C.bs_1.fq.gz
780_T.2=/share/work/bsvir/F12HPCCCSZ0010_Upload/s00_C.bs_2.fq.gz
s01_P.1=/share/work/bsvir/F12HPCCCSZ0010_Upload/s01_P.bs_1.fq.gz
s01_P.2=/share/work/bsvir/F12HPCCCSZ0010_Upload/s01_P.bs_2.fq.gz
;MultiLibExample.1=/test/Lib1/AAAA.1.fq.gz, /test/Lib2/AAAA.1.fq.gz, /test/Lib3/BBBB.1.fq.gz
;MultiLibExample.2=/test/Lib1/AAAA.2.fq.gz, /test/Lib2/AAAA_2.fq , /test/Lib3/BBBB.2.fq.gz
tSE_X.1=/share/work/bsvir/F12HPCCCSZ0010_Upload/s00_C.bs_1.fq.gz,/share/work/bsvir/F12HPCCCSZ0010_Upload/s01_P.bs_2.fq.gz

[InsertSizes]
780_T=200
780_T.SD=120
s01_P=200
s01_P.SD=30
;MultiLibExample=210
;MultiLibExample.SD=70
tSE_X=90
tSE_X.SD=1

[Output]
WorkDir=/share/work/bsvir/bsI
ProjectID=SZ2015

Build

You'll need cmake, autoconf, automake and the development libraries (devel-libs), as well as gcc and g++, to compile all sources.

On Mac OS X, install Homebrew first. Then:

xcode-select --install
brew install autoconf automake cmake python
brew install --without-multilib gcc

To build the binaries:

cd src
./download.sh
./install.sh

pip install toolshed

Details

  • For comment lines, use ; as the first character.

  • RefFiles Section

    • HostRef is Host genome.
    • VirusRef is Virus sequence.
  • DataFiles Section

    • Each sample needs a unique ID as its SampleID. Use SampleID.1 and SampleID.2 to specify paired-end sequencing data.
    • For samples with multiple PE sets, join the files with commas and keep them in the same order in both entries.
  • InsertSizes Section

    • For each sample, use SampleID to specify the average insert size and SampleID.SD to specify its standard deviation.
  • Output Section

    • WorkDir is the output directory.
    • ProjectID is a unique ID for the analysis defined in the config_file.
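
The rules above can be checked mechanically before a run. The following is a minimal sketch using Python's standard configparser, not the parser BSVF itself uses. The helper names `parse_config` and `check_samples` are hypothetical, and two points are assumptions drawn from the example config rather than stated rules: SampleID.2 is treated as optional (tSE_X has only a .1 entry), and when both entries exist they are expected to list the same number of files.

```python
import configparser


def parse_config(text):
    """Parse config_file text, keeping SampleIDs exactly as written."""
    parser = configparser.ConfigParser(
        delimiters=("=",),        # keys and values are separated by '='
        comment_prefixes=(";",),  # ';' as first character marks a comment line
        interpolation=None,       # treat '%' in paths literally
    )
    parser.optionxform = str      # do not lower-case keys such as 780_T.SD
    parser.read_string(text)
    return parser


def split_files(value):
    """Split a comma-joined file list and trim surrounding spaces."""
    return [item.strip() for item in value.split(",") if item.strip()]


def check_samples(parser):
    """Return a list of problems found in [DataFiles] and [InsertSizes]."""
    problems = []
    samples = {}
    for key, value in parser["DataFiles"].items():
        sample_id, dot, read = key.rpartition(".")
        if not dot or read not in ("1", "2"):
            problems.append(f"{key}: expected SampleID.1 or SampleID.2")
            continue
        samples.setdefault(sample_id, {})[read] = split_files(value)
    sizes = parser["InsertSizes"]
    for sample_id, reads in samples.items():
        if "1" not in reads:
            problems.append(f"{sample_id}: missing {sample_id}.1 in [DataFiles]")
        elif "2" in reads and len(reads["1"]) != len(reads["2"]):
            problems.append(f"{sample_id}: .1 and .2 list different numbers of files")
        for key in (sample_id, sample_id + ".SD"):
            if key not in sizes:
                problems.append(f"{sample_id}: missing {key} in [InsertSizes]")
    return problems
```

A typical call would be `check_samples(parse_config(open("prj.ini").read()))`; an empty list means no problem was detected by these particular checks.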

Description

BSuit is a suite to analyse xxx.

Formats

Virus integration result file

Chr	breakpoint	virus-start	virus-end	virus-strand	how-many-reads-support	cluster-name
Chr1	3000	200	300	+/-	20	cluster1
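
A record in this layout can be read into a typed structure as follows. This is a sketch under the assumption that every record has exactly the seven whitespace-separated columns shown above; the `Integration` type and its field names are illustrative and not defined by BSVF.

```python
from typing import NamedTuple


class Integration(NamedTuple):
    chrom: str
    breakpoint: int
    virus_start: int
    virus_end: int
    virus_strand: str
    support_reads: int
    cluster: str


def parse_integration(line):
    """Parse one record of the virus integration result file."""
    fields = line.split()  # accepts tabs or spaces between columns
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, got {len(fields)}: {line!r}")
    chrom, breakpoint, v_start, v_end, strand, reads, cluster = fields
    return Integration(
        chrom, int(breakpoint), int(v_start), int(v_end),
        strand, int(reads), cluster,
    )
```

The header line is not a record; skip it before calling `parse_integration`, since its text columns would fail the integer conversion.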

Intermediate contig information file

cluster-name	contig-number	chr-point	virus-integration
cluster1	contig1	chr1:3000	virus:+:200-300
cluster1	contig2	chr2:4000	virus:-:300-400
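
The two compound columns of the contig file split cleanly on `:` and `-`. The sketch below assumes the host column is always `chromosome:position` and the virus column is always `name:strand:start-end`, as in the two example rows; `parse_contig` is a hypothetical helper, not part of BSVF.

```python
def parse_contig(line):
    """Parse one record of the intermediate contig information file."""
    cluster, contig, host, virus = line.split()
    # Host side: chromosome name and breakpoint position.
    chrom, _, position = host.rpartition(":")
    # Virus side: sequence name, strand, then start-end.
    virus_name, strand, span = virus.split(":")
    start, _, end = span.partition("-")
    return {
        "cluster": cluster,
        "contig": contig,
        "chrom": chrom,
        "position": int(position),
        "virus": virus_name,
        "strand": strand,
        "virus_start": int(start),
        "virus_end": int(end),
    }
```

Splitting the virus column on `:` first keeps a `-` strand from being confused with the `-` that separates start and end.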

ToDo

Compare with ViralFusionSeq [VFS] and VirusFinder 2 on normal WGS data.

See also

Tool | Sequencing Type | Programme Language | 1st Alignment * | Assembler | 2nd Alignment # | Epub Date
VirusSeq | RNA-Seq, WGS | Perl | MOSAIK to Human | MOSAIK to Virus | MOSAIK to Hybrid | 2012 Nov 08
ViralFusionSeq | RNA-Seq, WGS | Perl | BWA-SW to Human | cap3, SSAKE | Blastall to Virus | 2013 Jan 12
VERSE (VirusFinder2) | WGS, RNA-Seq | Perl | Bowtie2 to Human, BLAT to Virus, BLASTN to Virus | Trinity | BWA-SW to Hybrid, SVDetect, CREST | 2015 Jan 20
Virus-Clip | RNA-seq | Perl | BWA-MEM to Virus | Virus-Clip | BLASTN to Human | 2015 May 19
Vy-PER | WGS, RNA-Seq | Python2 | BWA-SW to Human | Vy-PER | BLAT to Virus | 2015 Jul 13
seeksv | WGS | C++ | BWA to Hybrid | seeksv | seeksv to Hybrid | 2016 Sep 14
BSVF | WGBS, WGS | Perl, C, C++ | BWA-MEM to Hybrid | BSVF | water (EMBOSS) to Hybrid | N/A

* for virus-infected reads
# for integration information

Contributors

galaxy001, gaosjlucky
